
A Joyful AI Research Journey🌳😊

[10] 241112 DL, K-Nearest Neighbor (KNN), Decision Tree [Goorm All-In-One Pass! AI Project Master - 4th Session, Day 10]



yjyuwisely 2024. 11. 12. 13:40

241112 Tue 10th class

Here are the key points I want to remember from today's class.


pandas is used a lot in LLM work.

Automating tasks with pandas.

You need to know the theory and the papers.

Statistics at the third- and fourth-year undergraduate level; engineering graduate school.

KAIST graduate school: papers, technical terminology.


Titanic: the Kaggle competition for beginners

https://www.kaggle.com/competitions/titanic (Titanic - Machine Learning from Disaster)

 


https://rowan-sail-868.notion.site/46295e260bdf4588b6841eabcde0d01c (Machine Learning/Deep Learning Basics and Practice | Notion)


https://ldjwj.github.io/ML_Basic_Class/part03_ml/ch02_01_01B_knn_code_pratice_2205.html (ch02_01_01_knn_code_pratice)

http://localhost:8888/notebooks/Documents%2FICT4th%2FML_Code%2F20241112_Lib01_Pandas.ipynb

01 Data preparation

import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
dat = pd.read_csv("./data/titanic/train.csv")
dat

dat.info()

X = dat[ ['Pclass' , 'SibSp'] ]
y = dat['Survived']

Selecting and splitting the data

  • test_size : fraction of the data held out as the test set
  • random_state : seed for the shuffle, so the same rows are selected on every run
# 90% for training, 10% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.1,
                                                    random_state=0)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

Result)

((801, 2), (90, 2), (801,), (90,))
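
The roles of test_size and random_state can be sketched in plain Python. This is only an illustration of the idea, not scikit-learn's implementation: the helper name split_indices is hypothetical, and rounding the test count up is an assumption chosen because it reproduces the 801/90 split shown above.

```python
import math
import random

def split_indices(n_rows, test_size, random_state):
    """Shuffle row indices with a fixed seed, then cut off the test portion."""
    indices = list(range(n_rows))
    random.Random(random_state).shuffle(indices)  # same seed -> same order
    n_test = math.ceil(n_rows * test_size)        # assumption: round the test count up
    return indices[n_test:], indices[:n_test]     # (train, test)

train_idx, test_idx = split_indices(891, test_size=0.1, random_state=0)
print(len(train_idx), len(test_idx))

# Re-running with the same seed reproduces the split exactly.
again_train, again_test = split_indices(891, test_size=0.1, random_state=0)
print(test_idx == again_test)
```

Changing random_state gives a different but equally valid split, which is why accuracy numbers can move a little from seed to seed.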


Binary classification (also called binomial classification)


Hyperparameter tuning improves performance.

# pred holds the model's predictions for X_test (the fitting step is not shown in these notes)
(pred == y_test).sum() / len(pred)

Result)

0.5666666666666667
print("Test set accuracy : {:.2f}".format(np.mean(pred == y_test)))

Result)

Test set accuracy : 0.57
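
Both expressions above compute the same thing: the fraction of test rows where the prediction equals the true label. A minimal plain-Python sketch of that calculation follows; the function name accuracy and the toy lists are made up for illustration.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal the true labels."""
    matches = sum(1 for p, t in zip(predictions, labels) if p == t)
    return matches / len(labels)

# Toy data: 3 of 4 predictions are correct.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75

# 51 correct out of 90 test rows gives the 0.57 reported above.
print("{:.2f}".format(51 / 90))  # 0.57
```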

Data preprocessing before modeling


Label encoding converts categorical values stored as strings into numeric category codes. More generally, encoding transforms data into a specific format for storage or transmission, and decoding is the reverse process that restores the original form so the data is readable again.
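
A minimal sketch of encoding and decoding with a plain dict, assuming a made-up list of labels. The integer codes are arbitrary; any distinct numbers would work.

```python
labels = ["male", "female", "female", "male"]

# Encoding: string category -> integer code (the codes themselves are arbitrary).
encode = {"male": 1, "female": 2}
codes = [encode[label] for label in labels]
print(codes)  # [1, 2, 2, 1]

# Decoding: invert the mapping to recover the original strings.
decode = {code: label for label, code in encode.items()}
print([decode[code] for code in codes] == labels)  # True
```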

dat.info()

Handling missing values and label encoding

dat.head()

The map function

  • [Series].map(function or mapping) : applies a function to each value of a Series, or substitutes values
  • The mapping can be a dict or a Series.
  • https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html
mapping = { "male":1, 'female':2 }
dat['Sex_num'] = dat['Sex'].map(mapping)
dat.head()

# [].fillna( ) : fills in missing values.
val_mean = dat['Age'].mean()
dat['Age'] = dat['Age'].fillna( val_mean )
dat.info()
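
What mean imputation does can be shown without pandas. This is a sketch on a made-up list in which None stands in for a missing age; like the pandas mean(), it averages only the observed values.

```python
ages = [22.0, None, 38.0, None, 30.0]

# Mean of the observed values only (missing entries are skipped).
observed = [a for a in ages if a is not None]
mean_age = sum(observed) / len(observed)

# Replace each missing entry with that mean.
filled = [mean_age if a is None else a for a in ages]
print(filled)  # [22.0, 30.0, 38.0, 30.0, 30.0]
```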


Splitting into training and test sets

X = dat[ ['Pclass' , 'SibSp', 'Sex_num', 'Age'] ]
y = dat['Survived']

# 90% for training, 10% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.1,
                                                    random_state=0)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

Result)

((801, 4), (90, 4), (801,), (90,))
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=2)
model.fit(X_train, y_train)
### Predict
pred = model.predict(X_test)
print("Test set accuracy : {:.2f}".format(np.mean(pred == y_test)))

Result)

Test set accuracy : 0.76

https://ldjwj.github.io/ML_Basic_Class/part03_ml/part03_ch02_01_knn_linear_ppt/ch02_knn_%ED%9A%8C%EA%B7%80_v115_202205.pdf

A KNN regression (K-nearest neighbors regression) model predicts a continuous value from the K nearest neighboring samples. It works on the same principle as KNN classification, but is used when the target is continuous rather than discrete. For example, the prediction for an input sample can be the mean of the target values of its K nearest neighbors.
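
The averaging step described above can be sketched from scratch. This is an illustration under simple assumptions (Euclidean distance, unweighted mean, made-up one-feature data), not the library's implementation; knn_regress is a hypothetical helper name.

```python
import math

def knn_regress(train_points, train_targets, query, k):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest = order[:k]
    return sum(train_targets[i] for i in nearest) / k

points = [(1.0,), (2.0,), (3.0,), (10.0,)]
targets = [10.0, 20.0, 30.0, 100.0]

# The three nearest neighbors of 2.2 are 2.0, 3.0 and 1.0, so the far-away
# point at 10.0 has no influence on the prediction.
print(knn_regress(points, targets, query=(2.2,), k=3))  # 20.0
```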

Choose an odd k, so that a two-class vote cannot end in a tie.

KNN also works with three-dimensional (or higher-dimensional) data.
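
Both points can be seen in a small from-scratch classifier. The sketch below assumes Euclidean distance and a simple majority vote, uses made-up points with three features each, and the name knn_classify is hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k):
    """Return the majority label among the k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Three features per sample, so each point lives in 3-D space.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5), (6, 5, 5)]
labels = [0, 0, 0, 1, 1]

# With k=3 and two classes the vote is always 2-1 or 3-0, never tied.
print(knn_classify(points, labels, query=(4, 4, 4), k=3))  # 1
```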

tr_acc = []
test_acc = []
k_nums = range(1, 22, 2)  # odd k values: 1, 3, 5, ..., 21

for n in k_nums:
    # Select and train the model
    model = KNeighborsClassifier(n_neighbors=n)
    model.fit(X_train, y_train)

    # Compute the accuracy
    acc_tr = model.score(X_train, y_train)
    acc_test = model.score(X_test, y_test)

    # Store the accuracy values
    tr_acc.append(acc_tr)
    test_acc.append(acc_test)

    print("k : ", n)
    print("Training set accuracy {:.3f}".format(acc_tr) )
    print("Test set accuracy {:.3f}".format(acc_test) )

Result)

k :  1
Training set accuracy 0.878
Test set accuracy 0.733
k :  3
Training set accuracy 0.850
Test set accuracy 0.744
k :  5
Training set accuracy 0.846
Test set accuracy 0.756
k :  7
Training set accuracy 0.821
Test set accuracy 0.733
k :  9
Training set accuracy 0.821
Test set accuracy 0.722
k :  11
Training set accuracy 0.831
Test set accuracy 0.733
k :  13
Training set accuracy 0.811
Test set accuracy 0.767
k :  15
Training set accuracy 0.803
Test set accuracy 0.744
k :  17
Training set accuracy 0.805
Test set accuracy 0.767
k :  19
Training set accuracy 0.785
Test set accuracy 0.711
k :  21
Training set accuracy 0.787
Test set accuracy 0.744
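
One way to read the output above is to pick the k with the highest test accuracy. The sketch below simply copies the printed test accuracies into a dict (the variable names are my own) and finds the best value.

```python
# Test accuracies copied from the output above, keyed by k.
test_acc_by_k = {1: 0.733, 3: 0.744, 5: 0.756, 7: 0.733, 9: 0.722, 11: 0.733,
                 13: 0.767, 15: 0.744, 17: 0.767, 19: 0.711, 21: 0.744}

best_acc = max(test_acc_by_k.values())
best_ks = [k for k, acc in test_acc_by_k.items() if acc == best_acc]
print(best_ks, best_acc)  # [13, 17] 0.767
```

With only 90 test rows, one row is worth about 0.011 of accuracy, so the gap between 0.756 and 0.767 is a single passenger. Treat the ranking as a rough guide rather than a firm result.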

https://colab.research.google.com/drive/1a4i2igOdXeMqh5U1aBbuI7_istHlzeX6#scrollTo=Y4jzbDfKJiR_ (Google Colab Notebook)


https://heytech.tistory.com/362 ([Deep Learning] Mean Squared Error (MSE): Concept and Characteristics)

!pip install graphviz
!pip install mglearn  # Install the mglearn library
import matplotlib.pyplot as plt
import mglearn
plt.figure(figsize=(10,10))
mglearn.plots.plot_animal_tree()

https://ldjwj.github.io/ML_Basic_Class/part03_ml/part03_ch02_02_decisiontree/ch02_03_01_decisiontree_v11_2205.pdf

  • (a) The tree uses three features.
    • 'Has feathers?'
    • 'Can fly?'
    • 'Has fins?'
  • (b) This machine learning problem builds a model that distinguishes four classes.
    • The four classes: hawk, penguin, dolphin, bear
  • (c) Node types
    • Topmost node: root node
    • Bottom node with no children: leaf node. Any node that has children is an internal node.
    • A leaf node represents the final classification result for the data that reaches it.
    • A leaf node whose samples all share a single target value (class) is a pure node.
  • (d) Node splitting (at each node)
    • A categorical feature is split by a question that separates the data into groups.
    • A continuous feature is split by a question of the form "is feature i greater than a?"
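
The ideas of a pure node and a threshold question can be made concrete with Gini impurity, the default split criterion of scikit-learn's DecisionTreeClassifier. The sketch below is a from-scratch illustration on made-up data, not the library's code; gini and split_impurity are hypothetical helper names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after asking 'is the feature > threshold?'."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return sum(len(part) / n * gini(part) for part in (left, right) if part)

feature = [1.0, 2.0, 3.0, 8.0, 9.0]
target = ["bear", "bear", "bear", "hawk", "hawk"]

print(gini(target))                          # mixed node, impurity about 0.48
print(split_impurity(feature, target, 5.0))  # both children are pure: 0.0
```

A tree learner tries many candidate thresholds and keeps the one with the lowest weighted impurity; here the threshold 5.0 separates the two classes perfectly.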

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import seaborn as sns
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    stratify=cancer.target, 
                                                    test_size = 0.3,
                                                    random_state=77)
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_train, y_train)

print("Training set accuracy : {:.3f}".format(tree.score(X_train, y_train)))
print("Test set accuracy : {:.3f}".format(tree.score(X_test, y_test)))

Result)

Training set accuracy : 0.972
Test set accuracy : 0.912
for i in range(1, 7):
    tree = DecisionTreeClassifier(max_depth=i, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth : {i}")
    print("Training set accuracy : {:.3f}".format(tree.score(X_train, y_train)))
    print("Test set accuracy : {:.3f}".format(tree.score(X_test, y_test)))

Result)

max_depth : 1
Training set accuracy : 0.932
Test set accuracy : 0.883
max_depth : 2
Training set accuracy : 0.972
Test set accuracy : 0.912
max_depth : 3
Training set accuracy : 0.982
Test set accuracy : 0.906
max_depth : 4
Training set accuracy : 0.985
Test set accuracy : 0.906
max_depth : 5
Training set accuracy : 0.992
Test set accuracy : 0.889
max_depth : 6
Training set accuracy : 0.997
Test set accuracy : 0.901
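from sklearn.tree import export_graphviz
import graphviz
# tree is whichever model was fitted last (max_depth=6 if the loop above was run)
export_graphviz(tree,
                out_file="tree.dot",
                class_names=['malignant', 'benign'],
                feature_names = cancer.feature_names,
                impurity = False,  # do not show the Gini impurity in each node
                filled=True)       # color each node by its majority class
with open("tree.dot") as f:
    dot_graph = f.read()
display(graphviz.Source(dot_graph))

Result) [decision tree diagram]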


import torch  

# Check the PyTorch version
print(torch.__version__)

# Check whether CUDA is available
print(torch.cuda.is_available())

# Check the number of available GPU devices
print(torch.cuda.device_count())

Result)

2.5.0+cu121
True
1

https://ldjwj.github.io/DL_Basic/part04_01_dl_start/ch01_03_NNet_Titanic_pytorch_V01_2411.html (ch01_03_NNet_Titanic_pytorch_V01_2411)

import numpy as np  
import pandas as pd  
import torch  
import torch.nn as nn  
import torch.optim as optim  
from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler  
from sklearn.impute import SimpleImputer

# Fix the random seeds
torch.manual_seed(42)
np.random.seed(42)

# 1. Data preparation
# Load the data
data = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')
data.shape

Result)

(891, 12)
# Select the features to use
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
target = 'Survived'

# Encode sex as a number
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})

# Handle missing values (fill each column with its median)
imputer = SimpleImputer(strategy='median')
X = imputer.fit_transform(data[features])
y = data[target].values
# Confirm that no missing values remain
print("Missing values after imputation:")
print(pd.DataFrame(X, columns=features).isnull().sum())

Result)

Missing values after imputation:
Pclass    0
Sex       0
Age       0
SibSp     0
Parch     0
Fare      0
dtype: int64
# Scaling
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Convert NumPy arrays to PyTorch tensors
# torch.FloatTensor turns the NumPy array X_train into a tensor of 32-bit floats.
# This prepares the data so the PyTorch model can process it.
# unsqueeze(1) reshapes the labels from (N,) to (N, 1) to match the model output.
X_train = torch.FloatTensor(X_train)  
X_test = torch.FloatTensor(X_test)  
y_train = torch.FloatTensor(y_train).unsqueeze(1)  
y_test = torch.FloatTensor(y_test).unsqueeze(1)
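
Standard scaling subtracts a feature's mean and divides by its standard deviation. The sketch below does this from scratch on made-up fares. Note one deliberate difference from the notes above: it learns the statistics from the training rows only and reuses them for the test rows, a common way to keep test data out of preprocessing. The helper names are hypothetical.

```python
import statistics

def fit_standardizer(column):
    """Learn the mean and (population) standard deviation of one feature."""
    return statistics.fmean(column), statistics.pstdev(column)

def standardize(column, mean, std):
    """Shift to zero mean and scale to unit variance."""
    return [(x - mean) / std for x in column]

train_fares = [10.0, 20.0, 30.0]
test_fares = [40.0]

mean, std = fit_standardizer(train_fares)         # statistics from training rows only
scaled_train = standardize(train_fares, mean, std)
scaled_test = standardize(test_fares, mean, std)  # reuse the training statistics
print(scaled_train)
print(scaled_test)
```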
# 2. Define the neural network model
model = nn.Sequential(  
    nn.Linear(6, 16),  
    nn.ReLU(),  
    nn.Dropout(0.3),  
    nn.Linear(16, 8),  
    nn.ReLU(),  
    nn.Dropout(0.3),  
    nn.Linear(8, 1),  
    nn.Sigmoid()  
)
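
As a quick size check on the network above, the trainable parameters can be counted by hand. Only the three nn.Linear layers have weights; ReLU, Dropout and Sigmoid have none. The variable names below are my own.

```python
# (inputs, outputs) of each nn.Linear layer in the model above.
layer_shapes = [(6, 16), (16, 8), (8, 1)]

# Each linear layer has inputs*outputs weights plus one bias per output.
params_per_layer = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
print(params_per_layer)       # [112, 136, 9]
print(sum(params_per_layer))  # 257
```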
# 3. Train the model
# Define the loss function and optimizer
criterion = nn.BCELoss()  # binary cross-entropy loss for binary classification
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Run training
epochs = 100
for epoch in range(epochs):
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print the loss every 20 epochs
    if (epoch + 1) % 20 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

Result)

Epoch [20/100], Loss: 0.7141
Epoch [40/100], Loss: 0.7019
Epoch [60/100], Loss: 0.6749
Epoch [80/100], Loss: 0.6433
Epoch [100/100], Loss: 0.5905
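
To put these loss values in context, binary cross-entropy can be computed by hand. The sketch below is a plain-Python version of the formula on made-up probabilities, not PyTorch's implementation. A model that always outputs 0.5 scores ln 2, about 0.693, so the logged loss falling from 0.71 to 0.59 means the network has moved from roughly chance level to something better.

```python
import math

def bce(probabilities, labels):
    """Mean binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)], averaged."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for p, y in zip(probabilities, labels)]
    return sum(losses) / len(losses)

labels = [1, 0, 1, 0]
print(round(bce([0.5, 0.5, 0.5, 0.5], labels), 4))  # 0.6931, i.e. ln 2

# Confident, correct predictions give a lower loss than always guessing 0.5.
print(bce([0.9, 0.1, 0.8, 0.2], labels) < bce([0.5, 0.5, 0.5, 0.5], labels))  # True
```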
# 4. Evaluate the model
model.eval()  # evaluation mode (disables dropout)
with torch.no_grad():
    test_outputs = model(X_test)
    predicted = (test_outputs > 0.5).float()
    accuracy = (predicted == y_test).float().mean()
    print(f'Test Accuracy: {accuracy.item():.4f}')

Result)

Test Accuracy: 0.7877

https://snaiws.notion.site/DL-with-logit-a857e414f2ed4dfab2c51ad738388d84 (DL with logit | Notion)

