Classifying New Data into Categories with the DTree (Decision Tree) Algorithm


567Rabbit 2024. 4. 15. 17:25

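Classification is a supervised machine-learning task. Common algorithms for it include:

- Logistic Regression

- KNN (K-nearest neighbors)

- SVC (Support Vector Classifier, the classification form of the Support Vector Machine)

- DT (Decision Tree)

Of these four methods, try each one and use whichever gives the highest accuracy on your data.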
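Choosing the most accurate of the four can be sketched as below. This is only an illustration: it uses synthetic data from make_classification as a stand-in (an assumption, not this post's dataset), default hyperparameters, and a single train/test split, so the ranking it produces says nothing about the real data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the dataset used in this post).
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "LogisticRegression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=1),
}

# Test-set accuracy for each model, then the name of the best one.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
best = max(scores, key=scores.get)

for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
print("best:", best)
```

With a dataset this small, differences of a few points between models may simply be noise from the split, so cross-validation would be a more reliable basis for the choice.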

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

DT(Decision Tree)

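A decision tree is a flowchart-like model that helps make decisions based on past experience: each internal node tests one feature, and each leaf gives a predicted class, with the tests learned from training data.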
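To make the flowchart idea concrete, the sketch below fits a tree on six rows taken from the dataset shown in this post and prints the learned rules with scikit-learn's export_text. The six-row subset is chosen only for illustration, so the printed thresholds are not those of the full model.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Six [Age, EstimatedSalary] rows and their Purchased labels
# (an illustrative subset, not the full 400-row dataset).
X = [[19, 19000], [26, 43000], [35, 20000],
     [46, 41000], [50, 20000], [51, 23000]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(random_state=1).fit(X, y)

# export_text renders the fitted tree as nested if/else style rules.
rules = export_text(tree, feature_names=["Age", "EstimatedSalary"])
print(rules)
```

Each indented line of the output is one test in the flowchart, and each "class:" line is a leaf.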

	User ID	Gender	Age	EstimatedSalary	Purchased
0	15624510	Male	19	19000	0
1	15810944	Male	35	20000	0
2	15668575	Female	26	43000	0
3	15603246	Female	27	57000	0
4	15804002	Male	19	76000	0
...	...	...	...	...	...
395	15691863	Female	46	41000	1
396	15706071	Male	51	23000	1
397	15654296	Female	50	20000	1
398	15755018	Male	36	33000	0
399	15594041	Female	49	36000	1

Purchased: 1

Not purchased: 0

The goal is to classify which of these two categories a new data point is closer to.

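Splitting into feature columns and a target column

The feature columns (X) hold the explanatory variables for each observation in the dataset.
The target column (y) holds the value we want to predict.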

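# Target: whether the user purchased (1) or not (0)
y = df['Purchased']

# Features: every column from 'Age' through 'EstimatedSalary' (both included)
X = df.loc[ : , 'Age' : 'EstimatedSalary']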
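The split can be checked on a small DataFrame. The sketch below rebuilds four rows from the table shown earlier (the full dataset has 400 rows) and confirms that the label-based slice keeps only Age and EstimatedSalary.

```python
import pandas as pd

# Four rows copied from the table in this post, for illustration only.
df = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575, 15691863],
    "Gender": ["Male", "Male", "Female", "Female"],
    "Age": [19, 35, 26, 46],
    "EstimatedSalary": [19000, 20000, 43000, 41000],
    "Purchased": [0, 0, 0, 1],
})

y = df["Purchased"]
# Unlike positional slices, .loc label slices include both endpoints.
X = df.loc[:, "Age":"EstimatedSalary"]

print(list(X.columns))
print(X.shape, y.shape)
```

User ID and Gender are left out of X here, which matches the slice used in the post.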

Feature scaling

from sklearn.preprocessing import StandardScaler

X_scaler = StandardScaler()

X = X_scaler.fit_transform(X)
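As a rough check of what StandardScaler does, the sketch below scales the Age and EstimatedSalary values of the first five rows of the table and compares the result with the formula (x - mean) / std computed by hand. The five-row subset is an assumption made for brevity.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# [Age, EstimatedSalary] for the first five rows of the table.
X = np.array([[19, 19000], [35, 20000], [26, 43000],
              [27, 57000], [19, 76000]], dtype=float)

X_scaler = StandardScaler()
X_scaled = X_scaler.fit_transform(X)

# Manual version: subtract each column's mean, divide by its
# population standard deviation (ddof=0, which is NumPy's default).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
print(np.allclose(X_scaled, manual))
print(col_means.round(6), col_stds.round(6))
```

Two points worth keeping in mind: a decision tree splits on thresholds, so scaling generally does not change its predictions and mainly matters when the same data is also fed to KNN or SVC; and fitting the scaler on the training split only (then calling transform on the test split) avoids leaking test-set statistics into training.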

Splitting into train and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
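With 400 rows and test_size=0.25, the split should give 300 training rows and 100 test rows. The sketch below checks this on placeholder arrays of the same shape (the values are made up) and also shows that a fixed random_state makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the post's data (values are made up).
X = np.arange(800).reshape(400, 2)
y = np.arange(400) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Same random_state, same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.25, random_state=1)

print(X_train.shape, X_test.shape)
print(np.array_equal(X_test, X_test2))
```

If the classes are imbalanced, passing stratify=y to train_test_split keeps the class ratio similar in both splits.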

Modeling

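from sklearn.tree import DecisionTreeClassifier

# random_state fixes the tie-breaking so the fitted tree is reproducible
classifier = DecisionTreeClassifier(random_state=1)

# Learn the splitting rules from the training set
classifier.fit(X_train, y_train)

# Predict the category (0 or 1) for each test row
y_pred = classifier.predict(X_test)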
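Classifying a genuinely new data point follows the same path: scale it with the already-fitted scaler, then call predict. The sketch below assumes a tiny training set of six rows from the table and a made-up new customer (age 48, salary 30000); neither is the post's actual model.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Six [Age, EstimatedSalary] rows from the table and their Purchased labels
# (an illustrative subset, not the full dataset).
X_raw = np.array([[19, 19000], [26, 43000], [35, 20000],
                  [46, 41000], [50, 20000], [51, 23000]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

X_scaler = StandardScaler()
X = X_scaler.fit_transform(X_raw)
classifier = DecisionTreeClassifier(random_state=1).fit(X, y)

# A hypothetical new customer. Use transform (not fit_transform) so the
# new point is scaled with the statistics learned from the training data.
new_customer = np.array([[48, 30000]], dtype=float)
new_pred = classifier.predict(X_scaler.transform(new_customer))
print(new_pred)
```

The input to predict must be 2D (one row per sample), which is why the single customer is wrapped in a nested list.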

Confusion matrix

from sklearn.metrics import confusion_matrix, accuracy_score

confusion_matrix(y_test, y_pred)


array([[50,  8],
       [ 8, 34]], dtype=int64)

accuracy_score(y_test, y_pred)


0.84
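The accuracy can be recomputed from the confusion matrix itself. In scikit-learn's layout, rows are true labels and columns are predicted labels, so the diagonal holds the correct predictions. The sketch below redoes that arithmetic; precision and recall are extra metrics added here for illustration and are not reported in the post.

```python
import numpy as np

# Confusion matrix from the output above: rows = true, columns = predicted.
cm = np.array([[50, 8],
               [8, 34]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()   # correct predictions / all predictions
precision = tp / (tp + fp)        # of predicted buyers, how many bought
recall = tp / (tp + fn)           # of actual buyers, how many were found

print(accuracy)  # 0.84
print(round(precision, 3), round(recall, 3))
```

So 84 of the 100 test rows were classified correctly, with 8 false positives and 8 false negatives.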
