ML (Machine Learning)

Classifying New Data into Categories with the DTree (Decision Tree) Algorithm

567Rabbit 2024. 4. 15. 17:25

 

Part of supervised learning in machine learning:

 

 

Classification

- Logistic Regression

- KNN (K-Nearest Neighbors)

- SVC (Support Vector Classifier, the classification form of a Support Vector Machine)

- DT (Decision Tree)

 

Of these four methods, the one that gives the highest accuracy on the data is selected and used.

 

 

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

 

 

DT (Decision Tree)

A decision tree is a flowchart that helps make a decision based on previous experience, which here means the labeled training examples.
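To make the flowchart idea concrete, a tree can be read as a chain of nested yes/no questions. The sketch below uses two inputs, age and estimated salary (the features used later in this post). It is only an illustration: the function name `predict_purchase` and both thresholds (age 42, salary 90000) are made-up values, not splits learned from the dataset.

```python
def predict_purchase(age: int, estimated_salary: int) -> int:
    """Walk a hand-written, two-level decision flowchart.

    Returns 1 (purchases) or 0 (does not purchase).
    The thresholds are hypothetical, chosen only for illustration.
    """
    if age > 42:                      # first question: is the user older than 42?
        return 1
    if estimated_salary > 90000:      # second question: is the salary high?
        return 1
    return 0


print(predict_purchase(19, 19000))    # young user with a low salary
print(predict_purchase(51, 23000))    # older user
```

A trained DecisionTreeClassifier has the same shape, except that the questions and thresholds are chosen from the training data rather than written by hand.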

 

 

df

      User ID  Gender  Age  EstimatedSalary  Purchased
0    15624510    Male   19            19000          0
1    15810944    Male   35            20000          0
2    15668575  Female   26            43000          0
3    15603246  Female   27            57000          0
4    15804002    Male   19            76000          0
..        ...     ...  ...              ...        ...
395  15691863  Female   46            41000          1
396  15706071    Male   51            23000          1
397  15654296  Female   50            20000          1
398  15755018    Male   36            33000          0
399  15594041  Female   49            36000          1

 

Purchases: 1

Does not purchase: 0

 

 

The goal is to classify each new user into whichever category (1 or 0) they are closer to.

 

 

 

 

Splitting into feature columns and a target column

 

The feature columns (X) hold the explanatory variables for each observation in the dataset.
The target column (y) holds the value to be predicted.

 

y = df['Purchased']

 

X = df.loc[ : , 'Age' : 'EstimatedSalary']

 

 

 

 

 

Feature scaling

 

from sklearn.preprocessing import StandardScaler

 

X_scaler = StandardScaler()

 

X = X_scaler.fit_transform(X)
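StandardScaler rescales each column to zero mean and unit variance with z = (x - mean) / std, using the population standard deviation. A minimal pure-Python sketch of that formula, applied to the five Age values visible at the top of the df preview (the variable names are illustrative):

```python
from statistics import mean, pstdev

ages = [19, 35, 26, 27, 19]          # Age values from rows 0-4 of the preview

mu = mean(ages)                      # column mean
sigma = pstdev(ages)                 # population standard deviation (ddof=0)

# z-score: how many standard deviations each value lies from the mean
scaled = [(a - mu) / sigma for a in ages]
print([round(z, 2) for z in scaled])
```

A tree splits on thresholds, so rescaling a column does not change which rows fall on each side of a split; scaling matters mainly for distance-based models such as KNN and SVC. In a stricter workflow the scaler would be fitted on the training split only, so that test-set statistics do not leak into preprocessing.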

 

 

 

 

Splitting into train and test sets

 

from sklearn.model_selection import train_test_split

 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
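With 400 rows and test_size=0.25, the split holds out 100 rows for testing and keeps 300 for training; random_state fixes the shuffle so the split is reproducible. The sketch below mimics that idea with the standard library only. It is not scikit-learn's implementation and will not pick the same rows; `simple_split` is a hypothetical helper.

```python
import math
import random


def simple_split(rows, test_size=0.25, seed=1):
    """Shuffle row indices with a fixed seed, then cut off a test portion."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)        # same seed -> same order every run
    n_test = math.ceil(len(rows) * test_size)   # number of held-out rows
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


rows = list(range(400))                         # stand-in for the 400 rows of df
train, test = simple_split(rows)
print(len(train), len(test))
```

The 100 held-out rows match the confusion matrix reported at the end of the post, whose four cells sum to 100.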

 

 

 

 

Modeling

from sklearn.tree import DecisionTreeClassifier

 

classifier = DecisionTreeClassifier(random_state=1)

 

classifier.fit(X_train, y_train)

 

 

y_pred = classifier.predict(X_test)
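DecisionTreeClassifier grows the tree by repeatedly choosing the threshold that makes the resulting groups as pure as possible; its default purity measure is Gini impurity. The sketch below searches for the best single split on Age using only the ten rows visible in the df preview. It is a simplified illustration, not scikit-learn's implementation, and `gini` and `best_split` are hypothetical helper names.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - sum(p_k ** 2)."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(values, labels):
    """Try midpoints between sorted unique values; return (threshold, impurity)."""
    uniq = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(uniq, uniq[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        # weighted average impurity of the two child nodes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (thr, score)
    return best


# Age and Purchased from the ten rows shown in the preview
ages      = [19, 35, 26, 27, 19, 46, 51, 50, 36, 49]
purchased = [ 0,  0,  0,  0,  0,  1,  1,  1,  0,  1]

print(best_split(ages, purchased))
```

On these ten rows a single cut at Age 41 separates buyers from non-buyers with zero impurity. The full dataset is not expected to separate this cleanly, so the fitted tree stacks many such questions on both Age and EstimatedSalary.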

 

 

 

 

 

Confusion matrix

 

from sklearn.metrics import confusion_matrix, accuracy_score

 

confusion_matrix(y_test, y_pred)

array([[50,  8],
       [ 8, 34]], dtype=int64)

 

 

accuracy_score(y_test, y_pred)

0.84
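The accuracy can be checked by hand from the confusion matrix above: rows are the true classes and columns the predicted classes, so the diagonal holds the correct predictions. The short calculation below also derives precision and recall for class 1, which the post does not report; the variable names are illustrative.

```python
cm = [[50, 8],
      [8, 34]]          # confusion matrix from the post: rows = true, cols = predicted

tn, fp = cm[0]          # true class 0: predicted 0, predicted 1
fn, tp = cm[1]          # true class 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total       # share of all test rows predicted correctly
precision = tp / (tp + fp)         # of predicted buyers, how many really bought
recall = tp / (tp + fn)            # of real buyers, how many were found

print(accuracy, round(precision, 3), round(recall, 3))
```

Here (50 + 34) / 100 reproduces the 0.84 returned by accuracy_score. Because the two off-diagonal cells are both 8, precision and recall for class 1 happen to be equal.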