A supervised learning task in machine learning:
Classification
- Logistic Regression
- KNN (K-Nearest Neighbors) algorithm
- SVC (Support Vector Classifier, the classification form of a Support Vector Machine) algorithm
- DT (Decision Tree) algorithm
Of these four methods, the algorithm that gives the highest accuracy is selected and used.
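As a minimal sketch of that selection step, the four classifiers can be fitted on the same split and compared by test accuracy. The data here is synthetic (generated with `make_classification`), and the names `candidates`, `scores`, and `best_name` are illustrative assumptions, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 400 samples, 2 numeric features.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1
)

# The four candidate classifiers, all with default settings.
candidates = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))

# Pick the candidate with the highest test accuracy.
best_name = max(scores, key=scores.get)
print(scores)
print("best:", best_name)
```

Which model wins depends on the data, so the ranking printed for the synthetic set says nothing about the purchase dataset used below.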
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
DT(Decision Tree)
A decision tree is a flowchart-like model that helps make a decision based on previous experience, that is, on patterns learned from past data.
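To make the flowchart idea concrete, here is a small sketch that fits a tree on a handful of made-up rows and prints its learned rules with `export_text`. The `toy` DataFrame and its values are assumptions for illustration only.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny made-up sample (assumption): older customers purchased, younger did not.
toy = pd.DataFrame({
    "Age": [19, 22, 25, 30, 45, 48, 52, 60],
    "EstimatedSalary": [20000, 35000, 50000, 40000, 30000, 90000, 60000, 45000],
    "Purchased": [0, 0, 0, 0, 1, 1, 1, 1],
})

tree = DecisionTreeClassifier(random_state=1)
tree.fit(toy[["Age", "EstimatedSalary"]], toy["Purchased"])

# export_text renders the learned splits as nested if/else rules.
rules = export_text(tree, feature_names=["Age", "EstimatedSalary"])
print(rules)
```

Each printed line is one question in the flowchart; following the answers from the top leads to a predicted class at a leaf.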
df
  | User ID | Gender | Age | EstimatedSalary | Purchased |
0 | 15624510 | Male | 19 | 19000 | 0 |
1 | 15810944 | Male | 35 | 20000 | 0 |
2 | 15668575 | Female | 26 | 43000 | 0 |
3 | 15603246 | Female | 27 | 57000 | 0 |
4 | 15804002 | Male | 19 | 76000 | 0 |
... | ... | ... | ... | ... | ... |
395 | 15691863 | Female | 46 | 41000 | 1 |
396 | 15706071 | Male | 51 | 23000 | 1 |
397 | 15654296 | Female | 50 | 20000 | 1 |
398 | 15755018 | Male | 36 | 33000 | 0 |
399 | 15594041 | Female | 49 | 36000 | 1 |
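Purchased: 1
Did not purchase: 0
Categorize each customer by which of the two classes they are closer to
Splitting into feature columns and the target column
The feature columns (X) hold the explanatory variables for each observation in the dataset.
The target column (y) is the column that contains the value to be predicted.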
y = df['Purchased']
X = df.loc[ : , 'Age' : 'EstimatedSalary']
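One detail worth seeing in isolation: a label-based slice in `.loc` includes both endpoints, so `'Age' : 'EstimatedSalary'` selects both columns. The sketch below uses a two-row made-up frame (`sample` is a hypothetical name) with the same column layout.

```python
import pandas as pd

# Two made-up rows (assumption) with the same column layout as df.
sample = pd.DataFrame({
    "User ID": [1, 2],
    "Gender": ["Male", "Female"],
    "Age": [19, 35],
    "EstimatedSalary": [19000, 20000],
    "Purchased": [0, 1],
})

# Label-based slices in .loc include BOTH endpoints, unlike positional slices.
X_sample = sample.loc[:, "Age":"EstimatedSalary"]
y_sample = sample["Purchased"]

print(list(X_sample.columns))
print(y_sample.tolist())
```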
Feature scaling
from sklearn.preprocessing import StandardScaler
X_scaler = StandardScaler()
X = X_scaler.fit_transform(X)
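What `StandardScaler` computes can be checked by hand: each column is shifted by its mean and divided by its (population) standard deviation. A short sketch on made-up values, where `raw` and `manual` are illustrative names:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up Age / EstimatedSalary values (assumption), on very different scales.
raw = np.array([[19, 19000], [35, 20000], [26, 43000], [27, 57000]], dtype=float)

scaled = StandardScaler().fit_transform(raw)

# The same result by hand: z = (x - column mean) / column standard deviation.
manual = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates just because its raw numbers are larger.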
Splitting into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
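The notebook scales the full dataset before splitting, which lets test-set statistics influence the scaler. A common alternative ordering, sketched here on synthetic data, is to split first and fit the scaler on the training rows only. All names and values in this block are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 400-row dataset (assumption).
rng = np.random.default_rng(1)
X_raw = np.column_stack(
    [rng.integers(18, 61, 400), rng.integers(15000, 150001, 400)]
).astype(float)
y_raw = rng.integers(0, 2, 400)

# Split the unscaled data first: 25% of 400 rows gives 100 test rows.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.25, random_state=1
)

# Fit the scaler on the training rows only, then reuse it for the test rows.
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

For a decision tree the difference is usually negligible, since its splits do not depend on the scale of each feature, but the ordering matters more for distance-based models such as KNN and SVC.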
Modeling
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(random_state=1)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
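Because the model was trained on scaled features, any new observation has to pass through the same fitted scaler before `predict`. A minimal sketch on made-up rows (the `train`, `labels`, and `new_customer` values are hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Made-up training rows (assumption) shaped like the Age / EstimatedSalary features.
train = pd.DataFrame({
    "Age": [19, 22, 25, 30, 45, 48, 52, 60],
    "EstimatedSalary": [20000, 35000, 50000, 40000, 30000, 90000, 60000, 45000],
})
labels = [0, 0, 0, 0, 1, 1, 1, 1]

X_scaler = StandardScaler()
classifier = DecisionTreeClassifier(random_state=1)
classifier.fit(X_scaler.fit_transform(train), labels)

# A new, unseen customer must go through the SAME fitted scaler before predict.
new_customer = pd.DataFrame({"Age": [50], "EstimatedSalary": [40000]})
prediction = classifier.predict(X_scaler.transform(new_customer))
print(prediction)
```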
confusion matrix
from sklearn.metrics import confusion_matrix, accuracy_score
confusion_matrix(y_test, y_pred)
array([[50, 8],
[ 8, 34]], dtype=int64)
accuracy_score(y_test, y_pred)
0.84
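The accuracy follows directly from the confusion matrix: the diagonal holds the correct predictions, so (50 + 34) / 100 = 0.84. The sketch below redoes that arithmetic and also derives precision and recall for the "purchased" class, which the notebook does not report; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported above: rows are true labels, columns are predictions.
cm = np.array([[50, 8],
               [8, 34]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()   # correct predictions / all predictions
precision = tp / (tp + fp)        # of predicted purchases, how many were real
recall = tp / (tp + fn)           # of real purchases, how many were found

print(accuracy, round(precision, 3), round(recall, 3))
```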