Array joining
- Joining means putting the contents of two or more arrays into a single array. In NumPy, arrays are joined along an axis.
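1) stack(): the same as concatenation, except that stacking is done along a new axis. If axis is not passed explicitly, it is taken as 0.
2) hstack(): helper function that stacks arrays horizontally (side by side; along axis 1 for 2-D arrays)
3) vstack(): helper function that stacks arrays vertically (one on top of another; along axis 0)
4) dstack(): helper function that stacks arrays along the third axis, the height, which is the same as the depth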
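To make the difference between concatenation and stacking concrete, here is a small sketch using two 1-D arrays (the names `a` and `b` are illustrative). np.concatenate keeps the number of dimensions, while np.stack adds a new axis, which defaults to 0.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# concatenate joins along an EXISTING axis (axis 0 here): the result stays 1-D
joined = np.concatenate((a, b))
print(joined)        # [1 2 3 4 5 6]
print(joined.shape)  # (6,)

# stack joins along a NEW axis; axis defaults to 0, so each input becomes a row
stacked = np.stack((a, b))
print(stacked)
# [[1 2 3]
#  [4 5 6]]
print(stacked.shape)  # (2, 3)
```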
stack(): joins the arrays along a new axis
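import numpy as np
ver1 = np.array([1, 2, 3])
ver2 = np.array([4, 5, 6])
# axis=1: the new axis comes last, so ver1 and ver2 become the columns
verti = np.stack((ver1, ver2), axis=1)
print(verti)
# [[1 4]
#  [2 5]
#  [3 6]]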
hstack(): stacks arrays horizontally (side by side)
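A minimal sketch of hstack() with illustrative arrays: for 1-D inputs it simply concatenates them end to end, and for 2-D inputs it joins them along axis 1, so the row counts must match.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# 1-D inputs: hstack concatenates them end to end
h1 = np.hstack((a, b))
print(h1)  # [1 2 3 4 5 6]

# 2-D inputs: hstack places them side by side (joins along axis 1, the columns)
m = np.array([[1, 2], [3, 4]])
n = np.array([[5], [6]])
h2 = np.hstack((m, n))
print(h2)
# [[1 2 5]
#  [3 4 6]]
```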
vstack(): stacks the given arrays vertically (concatenates them row-wise) into a single array
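A short sketch of vstack() with illustrative arrays: each 1-D input is treated as one row, and the rows are stacked top to bottom along axis 0.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# vstack treats each 1-D input as a row and stacks the rows top to bottom (axis 0)
v = np.vstack((a, b))
print(v)
# [[1 2 3]
#  [4 5 6]]
print(v.shape)  # (2, 3)
```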
dstack(): stacks arrays along the height (depth), i.e. the third axis; similar to stack()
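A minimal sketch of dstack() with illustrative arrays. 1-D inputs of shape (N,) are first treated as shape (1, N, 1) and then joined along the third axis, so corresponding elements end up paired in the depth dimension.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# dstack joins along the third axis (depth); each 1-D input of shape (3,)
# is first viewed as shape (1, 3, 1)
d = np.dstack((a, b))
print(d)
# [[[1 4]
#   [2 5]
#   [3 6]]]
print(d.shape)  # (1, 3, 2)
```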
concatenating
# If the column names are the same, several DataFrames can be combined into one. More than two DataFrames can be passed at once.
df1
|   | A  | B  | C  | D  |
|---|----|----|----|----|
| 0 | A0 | B0 | C0 | D0 |
| 1 | A1 | B1 | C1 | D1 |
| 2 | A2 | B2 | C2 | D2 |
| 3 | A3 | B3 | C3 | D3 |
df2
|   | A  | B  | C  | D  |
|---|----|----|----|----|
| 4 | A4 | B4 | C4 | D4 |
| 5 | A5 | B5 | C5 | D5 |
| 6 | A6 | B6 | C6 | D6 |
| 7 | A7 | B7 | C7 | D7 |
df3
|    | A   | B   | C   | D   |
|----|-----|-----|-----|-----|
| 8  | A8  | B8  | C8  | D8  |
| 9  | A9  | B9  | C9  | D9  |
| 10 | A10 | B10 | C10 | D10 |
| 11 | A11 | B11 | C11 | D11 |
pd.concat([df1, df2, df3])  # concatenate the rows (axis=0 by default)
|    | A   | B   | C   | D   |
|----|-----|-----|-----|-----|
| 0  | A0  | B0  | C0  | D0  |
| 1  | A1  | B1  | C1  | D1  |
| 2  | A2  | B2  | C2  | D2  |
| 3  | A3  | B3  | C3  | D3  |
| 4  | A4  | B4  | C4  | D4  |
| 5  | A5  | B5  | C5  | D5  |
| 6  | A6  | B6  | C6  | D6  |
| 7  | A7  | B7  | C7  | D7  |
| 8  | A8  | B8  | C8  | D8  |
| 9  | A9  | B9  | C9  | D9  |
| 10 | A10 | B10 | C10 | D10 |
| 11 | A11 | B11 | C11 | D11 |
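The notes do not show how df1, df2 and df3 were created, so here is a sketch that rebuilds frames with the same layout and concatenates them. The helper `make_frame` is hypothetical, written only for this illustration.

```python
import pandas as pd

# Hypothetical helper (not from the original notes): builds a frame shaped like
# df1/df2/df3 above, with columns A-D, cells like "A0", and a given index range
def make_frame(start, stop):
    rows = range(start, stop)
    return pd.DataFrame(
        {col: [f"{col}{i}" for i in rows] for col in "ABCD"},
        index=list(rows),
    )

df1 = make_frame(0, 4)
df2 = make_frame(4, 8)
df3 = make_frame(8, 12)

# Same column names, so concat simply stacks the rows (axis=0 by default)
result = pd.concat([df1, df2, df3])
print(result)
print(result.shape)  # (12, 4)
```

Because each frame already carries its own index (0-3, 4-7, 8-11), the concatenated index runs 0 to 11 without gaps, matching the output table.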
merge
# Use merge when the DataFrames have different columns but share a key column to match rows on
# pd.merge joins exactly two DataFrames per call
df_all
|   | Employee ID | first name | last name |
|---|-------------|------------|-----------|
| 0 | 1           | Diana      | Bouchard  |
| 1 | 2           | Cynthia    | Ali       |
| 2 | 3           | Shep       | Rob       |
| 3 | 4           | Ryan       | Mitch     |
| 4 | 5           | Allen      | Steve     |
| 5 | 6           | Bill       | Christian |
| 6 | 7           | Dina       | Mo        |
| 7 | 8           | Sarah      | Steve     |
| 8 | 9           | Heather    | Bob       |
| 9 | 10          | Holly      | Michelle  |
df_salary  # Employee ID 6 is missing
|   | Employee ID | Salary [$/hour] |
|---|-------------|-----------------|
| 0 | 1           | 25.0            |
| 1 | 2           | 35.0            |
| 2 | 3           | 45.0            |
| 3 | 4           | 48.0            |
| 4 | 5           | 49.0            |
| 5 | 7           | 32.0            |
| 6 | 8           | 33.0            |
| 7 | 9           | 34.0            |
| 8 | 10          | 23.0            |
# how='left' keeps every row of the left DataFrame; rows with no match in the right DataFrame get NaN
# on='<shared column>' names the key column used to match rows
pd.merge(df_all, df_salary, on='Employee ID', how='left')
|   | Employee ID | first name | last name | Salary [$/hour] |
|---|-------------|------------|-----------|-----------------|
| 0 | 1           | Diana      | Bouchard  | 25.0            |
| 1 | 2           | Cynthia    | Ali       | 35.0            |
| 2 | 3           | Shep       | Rob       | 45.0            |
| 3 | 4           | Ryan       | Mitch     | 48.0            |
| 4 | 5           | Allen      | Steve     | 49.0            |
| 5 | 6           | Bill       | Christian | NaN             |
| 6 | 7           | Dina       | Mo        | 32.0            |
| 7 | 8           | Sarah      | Steve     | 33.0            |
| 8 | 9           | Heather    | Bob       | 34.0            |
| 9 | 10          | Holly      | Michelle  | 23.0            |
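A runnable sketch that rebuilds df_all and df_salary from the tables above and compares how='left' with the default how='inner'. The variable names `left` and `inner` are only for illustration.

```python
import pandas as pd

# Rebuild the two frames shown above
df_all = pd.DataFrame({
    "Employee ID": list(range(1, 11)),
    "first name": ["Diana", "Cynthia", "Shep", "Ryan", "Allen",
                   "Bill", "Dina", "Sarah", "Heather", "Holly"],
    "last name": ["Bouchard", "Ali", "Rob", "Mitch", "Steve",
                  "Christian", "Mo", "Steve", "Bob", "Michelle"],
})
df_salary = pd.DataFrame({
    "Employee ID": [1, 2, 3, 4, 5, 7, 8, 9, 10],  # ID 6 is missing
    "Salary [$/hour]": [25.0, 35.0, 45.0, 48.0, 49.0, 32.0, 33.0, 34.0, 23.0],
})

# how='left' keeps every row of df_all; ID 6 has no salary, so it gets NaN
left = pd.merge(df_all, df_salary, on="Employee ID", how="left")

# how='inner' (the default) keeps only IDs present in both frames, so ID 6 is dropped
inner = pd.merge(df_all, df_salary, on="Employee ID")

print(left.shape)   # (10, 4)
print(inner.shape)  # (9, 4)
```

The left merge preserves the row order of df_all, which is why the NaN appears at index 5 in the output table.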
Array splitting
- Joining merges several arrays into one; splitting breaks one array into several arrays.
- array_split()
import numpy as np
spl = np.array([1, 2, 3, 4, 5, 6])
newspl = np.array_split(spl, 3)  # split into 3 parts
print(newspl)  # [array([1, 2]), array([3, 4]), array([5, 6])]
# split a 2-D array into three 2-D arrays (along axis 0, i.e. by rows, by default)
three = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newthree = np.array_split(three, 3)
print(newthree)
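A sketch of two array_split behaviours the examples above do not exercise. When the array cannot be divided evenly, array_split makes the leading parts one element longer (np.split, by contrast, raises an error in that case). For 2-D arrays, axis=1 splits by columns instead of rows. The array values are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# 6 elements into 4 parts cannot be equal: the first parts get one extra element
parts = np.array_split(arr, 4)
print([p.tolist() for p in parts])  # [[1, 2], [3, 4], [5], [6]]

# For a 2-D array, axis=1 splits by columns instead of rows
grid = np.array([[1, 2, 3], [4, 5, 6]])
cols = np.array_split(grid, 3, axis=1)
print([c.tolist() for c in cols])  # [[[1], [4]], [[2], [5]], [[3], [6]]]
```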