Classifying New Data into Categories with the KNN (K-Nearest Neighbors) Algorithm

567Rabbit 2024. 4. 15. 15:27

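Classification is a supervised machine learning task. Common classification algorithms include:

- Logistic Regression

- KNN (K-Nearest Neighbors)

- SVM (Support Vector Machine), available in scikit-learn as SVC

- DT (Decision Tree)

Of these four, pick whichever one gives the highest accuracy on your data.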
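A minimal sketch of that comparison, assuming synthetic data from `make_classification` in place of a real dataset. The variable names (`candidates`, `scores`, `best_name`) are illustrative, and accuracy alone may not be the right criterion when classes are imbalanced.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data standing in for a real dataset (assumption).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1
)

candidates = {
    "LogisticRegression": LogisticRegression(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVC": SVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=1),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {
    name: model.fit(X_tr, y_tr).score(X_te, y_te)
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

print(scores)
print("best:", best_name)
```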

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

      User ID  Gender  Age  EstimatedSalary  Purchased
0    15624510    Male   19            19000          0
1    15810944    Male   35            20000          0
2    15668575  Female   26            43000          0
3    15603246  Female   27            57000          0
4    15804002    Male   19            76000          0
..        ...     ...  ...              ...        ...
395  15691863  Female   46            41000          1
396  15706071    Male   51            23000          1
397  15654296  Female   50            20000          1
398  15755018    Male   36            33000          0
399  15594041  Female   49            36000          1

Purchased : 1

Not purchased : 0

The goal is to classify each customer into whichever category they are closer to.

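Splitting into feature columns and a target column

The feature columns (X) hold the explanatory variables for each observation in the dataset.
The target column (y) holds the value we want to predict.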

y = df['Purchased']

X = df.loc[ : , 'Age' : 'EstimatedSalary']
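One detail worth checking is that label-based slicing with `.loc` includes both endpoints, so `'Age' : 'EstimatedSalary'` selects both columns. A small sketch, assuming a toy frame (`df_demo`, a hypothetical name) built from the first three rows of the preview:

```python
import pandas as pd

# Toy frame with the same column names as the post; values are copied
# from the first three rows of the dataset preview.
df_demo = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575],
    "Gender": ["Male", "Male", "Female"],
    "Age": [19, 35, 26],
    "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 0, 0],
})

# Label slices with .loc are inclusive on BOTH ends,
# unlike positional slices with .iloc.
X_demo = df_demo.loc[:, "Age":"EstimatedSalary"]
y_demo = df_demo["Purchased"]

print(list(X_demo.columns))
print(X_demo.shape, y_demo.shape)
```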

Feature scaling

from sklearn.preprocessing import StandardScaler

X_scaler = StandardScaler()

X = X_scaler.fit_transform(X)
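Scaling matters for KNN because it compares points by distance, and a salary in the tens of thousands would otherwise dominate an age in the tens. The sketch below checks by hand what `StandardScaler` computes, using a few age and salary values from the preview (`X_raw` and `X_manual` are illustrative names):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: age, salary (values taken from the first rows of the preview).
X_raw = np.array(
    [[19, 19000], [35, 20000], [26, 43000], [27, 57000]], dtype=float
)

X_scaled = StandardScaler().fit_transform(X_raw)

# Same result by hand: subtract each column's mean and divide by its
# standard deviation (population std, ddof=0, as StandardScaler uses).
X_manual = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

print(np.allclose(X_scaled, X_manual))
```

After scaling, each column has mean 0 and standard deviation 1, so both features contribute on a comparable scale.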

Splitting into train and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
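With 400 rows and `test_size=0.25`, the split yields 300 training rows and 100 test rows. A common variation, shown here as a sketch and not as the method used in this post, is to split first and fit the scaler on the training rows only, so the test set does not influence the scaling statistics. The data below is random and only stands in for the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Random stand-in for the 400-row dataset: age and salary columns (assumption).
X_raw = np.column_stack([
    rng.integers(18, 61, size=400),
    rng.integers(15000, 150001, size=400),
]).astype(float)
y_raw = rng.integers(0, 2, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X_raw, y_raw, test_size=0.25, random_state=1
)
print(X_train.shape, X_test.shape)

# Fit the scaler on the training rows only, then reuse it for the test rows.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
```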

Modeling

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=3)

classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
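To make the idea behind `n_neighbors=3` concrete, here is a from-scratch sketch of the prediction rule: find the 3 training points closest to the new point and take a majority vote. It assumes Euclidean distance and unweighted votes, which match the `KNeighborsClassifier` defaults. `knn_predict` and the toy arrays are hypothetical names, not part of scikit-learn.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Most frequent label among those k neighbors.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters: class 0 near (0, 0), class 1 near (1, 1).
X_toy = np.array(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
)
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.15, 0.15])))  # 0
print(knn_predict(X_toy, y_toy, np.array([0.95, 1.0])))   # 1
```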

Computing the confusion matrix

from sklearn.metrics import confusion_matrix, accuracy_score

confusion_matrix(y_test, y_pred)

array([[50,  8],
       [ 3, 39]], dtype=int64)
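In scikit-learn's layout, rows are the true labels and columns are the predicted labels, so this matrix reads as 50 true negatives, 8 false positives, 3 false negatives, and 39 true positives. As a sketch, the matrix above can be unpacked and used for precision and recall like this (the matrix is retyped by hand here, not recomputed):

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([[50, 8], [3, 39]])

# Row = true label, column = predicted label, labels ordered 0 then 1.
tn, fp, fn, tp = (int(v) for v in cm.ravel())

precision = tp / (tp + fp)  # of predicted buyers, how many actually bought
recall = tp / (tp + fn)     # of actual buyers, how many were found

print(tn, fp, fn, tp)
print(round(precision, 3), round(recall, 3))
```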

Computing the accuracy

cm = confusion_matrix(y_test, y_pred)

(50+39) / cm.sum()

0.89
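The same number can be obtained without typing the diagonal values by hand. A sketch of two equivalent ways, assuming the matrix above; the label arrays `y_true` and `y_hat` are reconstructed only so that they reproduce that matrix and are not the post's actual `y_test` and `y_pred`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Confusion matrix copied from the output above.
cm = np.array([[50, 8], [3, 39]])

# Correct predictions sit on the diagonal, so accuracy = trace / total.
accuracy_from_cm = np.trace(cm) / cm.sum()
print(accuracy_from_cm)  # 0.89

# Label arrays rebuilt to match the matrix (illustrative only).
y_true = np.array([0] * 58 + [1] * 42)
y_hat = np.array([0] * 50 + [1] * 8 + [0] * 3 + [1] * 39)
print(accuracy_score(y_true, y_hat))  # 0.89
```

With the real arrays, `accuracy_score(y_test, y_pred)` gives the same value directly, which is why it is imported alongside `confusion_matrix`.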
