ML (MachineLearning) 17

Classifying new data into categories with the DTree (Decision Tree) algorithm

Classification is a supervised learning task in machine learning. Four methods are compared: Logistic Regression, KNN (K nearest neighbor), SVM (Support Vector Machine), and DT (Decision Tree). The algorithm with the higher accuracy is the one selected for use.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
DT (Decision Tree): a decision tree is a flowchart that helps make decisions based on previous experience.
df
   User ID  Gender  Age  EstimatedSalary  Purchased
0  1562451..
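The flowchart idea can be sketched with scikit-learn's `DecisionTreeClassifier`. This is a minimal sketch rather than the post's own code: the eight rows are hypothetical toy values standing in for the Age and EstimatedSalary columns, and `max_depth=2` is an arbitrary choice.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (not the post's dataset): columns are Age, EstimatedSalary.
X = np.array([[19, 19000], [25, 30000], [30, 40000], [35, 20000],
              [45, 90000], [50, 110000], [55, 120000], [60, 100000]], dtype=float)
# Hypothetical labels: 1 = purchased, 0 = did not purchase.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Each internal node of the fitted tree is one yes/no question on a feature.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

train_accuracy = tree.score(X, y)          # fraction of training rows classified correctly
new_customer = np.array([[48.0, 95000.0]])  # a new, unseen row
prediction = tree.predict(new_customer)
print(train_accuracy, prediction)
```

Accuracy measured on the training rows is optimistic; a held-out test split gives a fairer number when comparing the four algorithms.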

Classifying new data into categories with the SVM (Support Vector Machine) algorithm

Classification is a supervised learning task in machine learning. Four methods are compared: Logistic Regression, KNN (K nearest neighbor), SVM (Support Vector Machine), and DT (Decision Tree). The algorithm with the higher accuracy is the one selected for use.
SVM (Support Vector Machine)
SVC (Support Vector Classifier): SVC is the variant of SVM used to solve classification problems. It finds the optimal separating hyperplane for classifying the given data; its goal is the hyperplane that best separates the boundary between classes.
SVR (Support Vector Regress..
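The separating hyperplane can be made concrete with scikit-learn's `SVC` and a linear kernel. This is a sketch under assumptions: the data is a hypothetical toy set, the features are standardized first because SVMs are sensitive to feature scale, and `C=1.0` is simply the library default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data (not the post's dataset): columns are Age, EstimatedSalary.
X = np.array([[19, 19000], [25, 30000], [30, 40000], [35, 20000],
              [45, 90000], [50, 110000], [55, 120000], [60, 100000]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Put both features on a comparable scale before fitting.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X_scaled, y)

# With a linear kernel the hyperplane is w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]

new_point = scaler.transform(np.array([[48.0, 95000.0]]))
signed_score = new_point @ w + b      # positive side of the hyperplane -> class 1
prediction = clf.predict(new_point)
print(w, b, signed_score, prediction)
```

The sign of `signed_score` tells which side of the hyperplane the new point falls on, which is exactly what `predict` reports.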

Classifying new data into categories with the KNN (K nearest neighbor) algorithm

Classification is a supervised learning task in machine learning. Four methods are compared: Logistic Regression, KNN (K nearest neighbor), SVM (Support Vector Machine), and DT (Decision Tree). The algorithm with the higher accuracy is the one selected for use.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df
    User ID  Gender  Age  EstimatedSalary  Purchased
0  15624510    Male   19            19000          0
1  15810944    Male   35            20000          0
2  15668575  Female   26  430..
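KNN itself is short enough to write by hand: a new point takes the majority label of its k closest training points. The sketch below uses only NumPy; the helper name `knn_predict`, the toy rows, and `k=3` are all illustrative assumptions, not taken from the post.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Return the majority label among the k training points nearest to x_new."""
    distances = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance to each row
    nearest = np.argsort(distances)[:k]                   # indices of the k closest rows
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical toy data (not the post's dataset): columns are Age, EstimatedSalary.
X = np.array([[19, 19000], [25, 30000], [30, 40000], [35, 20000],
              [45, 90000], [50, 110000], [55, 120000], [60, 100000]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Standardize so that salary (large numbers) does not dominate the distance.
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std

older_high_salary = (np.array([48.0, 95000.0]) - mean) / std
younger_low_salary = (np.array([22.0, 25000.0]) - mean) / std
pred_a = knn_predict(X_scaled, y, older_high_salary, k=3)
pred_b = knn_predict(X_scaled, y, younger_low_salary, k=3)
print(pred_a, pred_b)
```

Because KNN is distance-based, skipping the scaling step would let EstimatedSalary decide almost every neighbor on its own.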

Resampling the data when class imbalance occurs

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb
# A model that classifies diabetes
df
     Preg  Plas  Pres  skin  test  mass   pedi  age  class
0       6   148    72    35     0  33.6  0.627   50      1
1       1    85    66    29     0  26.6  0.351   31      0
2       8   183    64     0     0  23.3  0.672   32      1
3       1    89    66    23    94  28.1  0.167   21      0
4       0   137    40    35   168  43.1  2.288   33      1
...   ...   ...   ...   ...   ...   ...    ...  ...    ...
763    10   101    76    48   180  32.9  0.171   63      0
764     2   1..
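The excerpt does not show which resampling method the post uses, so the sketch below swaps in plain random oversampling as one simple option: minority-class rows are duplicated at random until every class is as large as the biggest one. The helper name `random_oversample` is hypothetical; the five rows are the ones visible in the df preview.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until all classes match the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        keep.append(cls_idx)                      # always keep the original rows
        if count < target:                        # top up smaller classes with resampled rows
            keep.append(rng.choice(cls_idx, size=target - count, replace=True))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

# Rows 0-4 of the df preview: Preg, Plas, Pres, skin, test, mass, pedi, age.
X = np.array([[6, 148, 72, 35, 0, 33.6, 0.627, 50],
              [1, 85, 66, 29, 0, 26.6, 0.351, 31],
              [8, 183, 64, 0, 0, 23.3, 0.672, 32],
              [1, 89, 66, 23, 94, 28.1, 0.167, 21],
              [0, 137, 40, 35, 168, 43.1, 2.288, 33]])
y = np.array([1, 0, 1, 0, 1])   # the "class" column for the same rows

X_res, y_res = random_oversample(X, y)
class_counts = np.bincount(y_res)   # rows per class after resampling
print(class_counts)
```

Resampling is normally applied to the training split only, so that duplicated rows do not leak into the test set.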

Logistic Regression

Classification is a supervised learning task in machine learning. Four methods are compared: Logistic Regression, KNN (K nearest neighbor), SVM (Support Vector Machine), and DT (Decision Tree). The algorithm with the higher accuracy is the one selected for use.
What is logistic regression?
- It aims to solve classification problems.
- Unlike linear regression, which predicts a continuous outcome, it does this by predicting a categorical outcome.
- The simplest case is binomial, with two possible outcomes, for example predicting whether a cancer is malignant or benign.
Confusion Matrix: in most cases, C is more important. In the case of a false detection, failing to find a correct..
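To illustrate the binomial case and the confusion matrix together, the sketch below turns linear scores into probabilities with the sigmoid, thresholds them at 0.5, and counts the four outcomes. The scores and labels are hypothetical, and the 2x2 layout (rows = actual, columns = predicted) is an assumption chosen to match scikit-learn's convention.

```python
import numpy as np

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    return np.array([[tn, fp], [fn, tp]])

# Hypothetical linear scores (w . x + b) for six samples, with their true labels.
scores = np.array([-2.0, -0.5, 0.3, 1.5, -1.0, 2.0])
y_true = np.array([0, 0, 0, 1, 1, 1])

probabilities = sigmoid(scores)
y_pred = (probabilities > 0.5).astype(int)   # categorical outcome: 0 or 1

cm = confusion_counts(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()           # correct predictions sit on the diagonal
print(cm, accuracy)
```

The off-diagonal cells separate the two kinds of error: a false positive (predicted 1, actually 0) and a false negative (predicted 0, actually 1).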

Using linear regression to return a data-driven prediction when new data is entered

Simple linear regression: uses a single variable, X -> y.
Multiple linear regression: uses several variables, X1, X2, X3 ... -> y.
Here we use multiple linear regression to obtain data-driven predictions.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
We want to predict Profit from several input columns.
df
   R&D Spend  Administration  Marketing Spend     State     Profit
0  165349.20       136897.80        471784.10  New York  192261.83
1  162597.70  151377..
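Multiple linear regression can be sketched with an ordinary least-squares solve in NumPy. The numbers below are hypothetical and generated from a known rule (y = 5 + 2*X1 + 3*X2) so the fit can be checked by eye; they are not the post's Profit data. A text column such as State would need to be converted to numbers (for example one-hot encoded) before a fit like this.

```python
import numpy as np

# Hypothetical inputs: two numeric features X1 and X2 per row.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
# Hypothetical target built from a known linear rule, with no noise.
y = 5.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, weights = coef[0], coef[1:]

# Predict y for a new, unseen row.
x_new = np.array([4.0, 5.0])
prediction = intercept + x_new @ weights
print(intercept, weights, prediction)
```

With a single feature column the same code reduces to simple linear regression.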

How to create a regressor (regression model) and compute the MSE (mean squared error)

Prediction - regressor (regression model)
A machine learning regressor is a supervised learning model: given past data as input, it returns predictions for it. Regression models are mainly used when the value to be predicted is continuous. They learn the patterns in a particular dataset and use those patterns to predict values for new inputs.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Problem) We want to find the Purchased value using machine learning.
df
   Country        Age        Salary  Purchased
0   France  44.000000  72000.000000         No
1    Spain  27.000000  48000.000000        Yes
2  Germany ..
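The MSE in the title is the average of the squared differences between actual and predicted values. A minimal NumPy sketch follows; the three value pairs are hypothetical and exist only to make the arithmetic easy to follow.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical actual values and a regressor's predictions for them.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4.
mse = mean_squared_error(y_true, y_pred)
rmse = mse ** 0.5   # square root puts the error back in the target's units
print(mse, rmse)
```

A lower MSE means predictions sit closer to the actual values; squaring makes large errors count much more than small ones.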
