Studying pandas

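pandas.factorize()

Most machine learning algorithms work with numeric data, so categories have to be converted from text to numbers. The factorize() function does this by mapping each category to a distinct integer.

pandas.factorize() and Series.factorize()

Parameters:
values : 1D sequence.
sort : [bool, default False] Sort uniques and shuffle labels so the mapping is preserved.
na_sentinel : [int, default -1] Value used to mark missing values ('not found'); deprecated in pandas 1.5 and removed in 2.0 in favor of use_na_sentinel.

Return: codes (the integer representation of the array) and uniques (the distinct values).

How the factorize() method works

    import numpy as np
    import pandas as pd
    from pandas.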
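Because the excerpt above is cut off before its example, here is a minimal sketch of the mapping it describes. The `categories` Series and its values are made up for illustration and are not from the original post. The sketch relies on the default missing-value sentinel of -1 rather than passing `na_sentinel`, since that keyword is not available in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the original post).
categories = pd.Series(["cat", "dog", "cat", np.nan, "bird"])

# Default behaviour: integers are assigned in order of first appearance,
# and missing values receive the sentinel -1.
codes, uniques = pd.factorize(categories)
print(codes.tolist())   # [0, 1, 0, -1, 2]
print(list(uniques))    # ['cat', 'dog', 'bird']

# sort=True orders the unique values first, then renumbers the codes
# so each code still points at the same category.
sorted_codes, sorted_uniques = pd.factorize(categories, sort=True)
print(sorted_codes.tolist())   # [1, 2, 1, -1, 0]
print(list(sorted_uniques))    # ['bird', 'cat', 'dog']
```

Indexing `uniques` with a non-negative code recovers the original label, which is why the two return values are usually kept together.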

JOIN function

A function that merges DataFrames on a key value.

    import pandas as pd

    d1 = {'Asset_Allocation': [1, 2, 3, 4, 5, 6],
          'stock': ['IDEXX', 'Zoetis', 'Freshpet', 'Chewy', 'Trupanion', 'WOOF']}
    df1 = pd.DataFrame(d1)

    d2 = {'Asset_Allocation': [2, 3, 6, 8],
          'Analyze': ['Buy', 'Hold', 'Sell', 'None']}
    df2 = pd.DataFrame(d2)

    df1
    df2

Perform an inner join.

    inner_join_result = pd.merge(df1, df2, on='Asset_Allocation', how='inner')
    inner_join_result
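The rendered tables did not survive in this excerpt, so the following sketch rebuilds the same two frames and shows which rows the inner join keeps. The left and outer joins are added here only for comparison and are an assumption about where the post goes next; they are not taken from the original.

```python
import pandas as pd

# Same data as in the post.
df1 = pd.DataFrame({
    "Asset_Allocation": [1, 2, 3, 4, 5, 6],
    "stock": ["IDEXX", "Zoetis", "Freshpet", "Chewy", "Trupanion", "WOOF"],
})
df2 = pd.DataFrame({
    "Asset_Allocation": [2, 3, 6, 8],
    "Analyze": ["Buy", "Hold", "Sell", "None"],
})

# Inner join: only keys present in both frames (2, 3, 6) survive.
inner_join_result = pd.merge(df1, df2, on="Asset_Allocation", how="inner")
print(inner_join_result)

# For comparison (not in the original excerpt):
# left join keeps every row of df1, filling Analyze with NaN where df2 has no match.
left_join_result = pd.merge(df1, df2, on="Asset_Allocation", how="left")

# Outer join keeps every key from either frame, including key 8 from df2.
outer_join_result = pd.merge(df1, df2, on="Asset_Allocation", how="outer")

print(len(inner_join_result), len(left_join_result), len(outer_join_result))
```

Note that `'None'` in `df2` is an ordinary string, so it is not treated as a missing value by the merge.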