How to use pandas to assign values to newly added columns based on the original data

Posted 2024-04-20 01:34:53


[Image: my picture]

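As shown in the picture, the same "RID" has several different "VISCODE2" categories. The "BSXSYMNO" codes correspond to 29 empty columns. I want a single row to represent all the "BSXSYMNO" values that belong to the same "VISCODE2". In each of those empty columns, a symptom that is present should be coded as 1 and every other symptom as 0. I use a dictionary to store the code-to-column mapping: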

ph_dict = {1:'AXNAUSEA', 2:'AXVOMIT', 3:'AXDIARRH', 4:'AXCONSTP',5:'AXABDOMN', 6:'AXSWEATN', 7:'AXDIZZY',
               8:'AXENERGY', 9:'AXDROWSY',10:'AXVISION',11:'AXHDACHE', 12:'AXDRYMTH', 13:'AXBREATH', 14:'AXCOUGH',
               15:'AXPALPIT', 16:'AXCHEST',17:'AXURNDIS', 18:'AXURNFRQ',19:'AXANKLE', 20:'AXMUSCLE', 21:'AXRASH',
               22:'AXINSOMN',23:'AXDPMOOD', 24:'AXCRYING', 25:'AXELMOOD', 26:'AXWANDER', 27:'AXFALL',28:'AXOTHER',
               29:'AXSPECIF',}
for index, row in df2.iterrows():
    vis = row['VISCODE2']
    num = row['BSXSYMNO']
    if 0 < num <= 20:
        # .loc expects the column label itself; df2[ph_dict[num]] would pass a whole Series
        df2.loc[index, ph_dict[num]] = 1

The result I want looks like this: [Image: my picture 2]
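Judging from the question text (the screenshot itself is not reproduced here), the target seems to be one row per RID/VISCODE2 pair with 1 for each reported symptom and 0 otherwise. A minimal sketch of that reshaping uses `pd.crosstab`, which is a different technique from the answer's approach. The toy rows and the three-entry `ph_dict` are made up for illustration, and the `0 < code <= 20` filter is copied from the question's loop.

```python
import pandas as pd

# Hypothetical toy data in the question's layout; the real file has 29 codes.
ph_dict = {14: 'AXCOUGH', 18: 'AXURNFRQ', 20: 'AXMUSCLE'}
df = pd.DataFrame({
    'RID': [2, 2, 2, 2],
    'VISCODE2': ['bl', 'bl', 'bl', 'm06'],
    'BSXSYMNO': [14.0, 18.0, 20.0, 14.0],
})

# Same filter as the question: 0 < code <= 20
mask = df['BSXSYMNO'].gt(0) & df['BSXSYMNO'].le(20)
valid = df[mask]
names = valid['BSXSYMNO'].map(ph_dict)  # code -> column name

# Count each symptom per (RID, VISCODE2), then cap counts at 1 for a 0/1 flag.
wide = (pd.crosstab([valid['RID'], valid['VISCODE2']], names)
          .clip(upper=1)
          .reindex(columns=list(ph_dict.values()), fill_value=0)  # absent symptoms -> 0
          .rename_axis(columns=None)
          .reset_index())
print(wide)
```

Each visit ends up as a single row; symptoms never reported at that visit get 0 through the `reindex` fill value.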


1 answer
Anonymous user
Reply #1 · posted 2024-04-20 01:34:53

Use:

ph_dict = {1:'AXNAUSEA', 2:'AXVOMIT', 3:'AXDIARRH', 4:'AXCONSTP',5:'AXABDOMN', 6:'AXSWEATN', 7:'AXDIZZY',
               8:'AXENERGY', 9:'AXDROWSY',10:'AXVISION',11:'AXHDACHE', 12:'AXDRYMTH', 13:'AXBREATH', 14:'AXCOUGH',
               15:'AXPALPIT', 16:'AXCHEST',17:'AXURNDIS', 18:'AXURNFRQ',19:'AXANKLE', 20:'AXMUSCLE', 21:'AXRASH',
               22:'AXINSOMN',23:'AXDPMOOD', 24:'AXCRYING', 25:'AXELMOOD', 26:'AXWANDER', 27:'AXFALL',28:'AXOTHER',
               29:'AXSPECIF'}

import pandas as pd

# columns that are not filled with NaNs
need = ['Phase','RID','VISCODE2','BSXSYMNO']
df = pd.read_csv('RECDE.csv')
print(df.head(3))
   Phase  RID VISCODE2  BSXSYMNO  AXNAUSEA  AXVOMIT  AXDIARRH  AXCONSTP  \
0  ADNI1    2       bl      14.0       NaN      NaN       NaN       NaN   
1  ADNI1    2       bl      18.0       NaN      NaN       NaN       NaN   
2  ADNI1    2       bl      20.0       NaN      NaN       NaN       NaN   

   AXABDOMN  AXSWEATN  ...  AXMUSCLE  AXRASH  AXINSOMN  AXDPMOOD  AXCRYING  \
0       NaN       NaN  ...       NaN     NaN       NaN       NaN       NaN   
1       NaN       NaN  ...       NaN     NaN       NaN       NaN       NaN   
2       NaN       NaN  ...       NaN     NaN       NaN       NaN       NaN   

   AXELMOOD  AXWANDER  AXFALL  AXOTHER  AXSPECIF  
0       NaN       NaN     NaN      NaN       NaN  
1       NaN       NaN     NaN      NaN       NaN  
2       NaN       NaN     NaN      NaN       NaN  

[3 rows x 33 columns]

# rows whose BSXSYMNO is greater than 0 and less than or equal to 20
mask = df['BSXSYMNO'].gt(0) & df['BSXSYMNO'].le(20)

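# create indicator (dummy) columns only for the filtered BSXSYMNO values, mapped to column names by the dict;
# dtype=int keeps 0/1 values instead of True/False on pandas >= 2.0
df1 = pd.get_dummies(df.loc[mask, 'BSXSYMNO'].map(ph_dict), dtype=int)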
# for rows that did not match, repeat the original BSXSYMNO value across the indicator columns
df2 = (df.loc[~mask, ['BSXSYMNO']]
         .reindex(['BSXSYMNO'] + df1.columns.tolist(), axis=1)
         .ffill(axis=1)
         .drop('BSXSYMNO', axis=1))
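# combine both parts, restore the original row order, join the original columns back,
# and add indicator columns that never occur in the data as 0
df = (pd.concat([df1, df2], sort=True)
        .sort_index()
        .join(df[need])
        .reindex(df.columns, axis=1, fill_value=0))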

print(df.head(3))
   Phase  RID VISCODE2  BSXSYMNO  AXNAUSEA  AXVOMIT  AXDIARRH  AXCONSTP  \
0  ADNI1    2       bl      14.0       0.0      0.0       0.0       0.0   
1  ADNI1    2       bl      18.0       0.0      0.0       0.0       0.0   
2  ADNI1    2       bl      20.0       0.0      0.0       0.0       0.0   

   AXABDOMN  AXSWEATN  ...  AXMUSCLE  AXRASH  AXINSOMN  AXDPMOOD  AXCRYING  \
0       0.0       0.0  ...       0.0       0         0         0         0   
1       0.0       0.0  ...       0.0       0         0         0         0   
2       0.0       0.0  ...       1.0       0         0         0         0   

   AXELMOOD  AXWANDER  AXFALL  AXOTHER  AXSPECIF  
0         0         0       0        0         0  
1         0         0       0        0         0  
2         0         0       0        0         0  

[3 rows x 33 columns]
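To see what each step of the answer does without the RECDE.csv file, here is a minimal sketch that runs the same mask / `get_dummies` / reindex-and-`ffill` / concat sequence on a hypothetical three-row frame. The toy values, the three-entry `ph_dict`, and the use of -4 as an out-of-range code are assumptions made for illustration. `dtype=int` is passed so that pandas 2.x yields 0/1 rather than True/False.

```python
import pandas as pd

# Hypothetical toy data; -4 stands in for an out-of-range code (an assumption).
ph_dict = {14: 'AXCOUGH', 18: 'AXURNFRQ', 20: 'AXMUSCLE'}
df = pd.DataFrame({
    'RID': [2, 2, 2],
    'VISCODE2': ['bl', 'bl', 'm06'],
    'BSXSYMNO': [14.0, -4.0, 20.0],
})
cols = list(ph_dict.values())

mask = df['BSXSYMNO'].gt(0) & df['BSXSYMNO'].le(20)

# 0/1 indicator columns, only for codes that actually occur among the masked rows
dummies = pd.get_dummies(df.loc[mask, 'BSXSYMNO'].map(ph_dict), dtype=int)

# Unmatched rows: copy the original code into each indicator column via ffill
other = (df.loc[~mask, ['BSXSYMNO']]
           .reindex(['BSXSYMNO'] + dummies.columns.tolist(), axis=1)
           .ffill(axis=1)
           .drop('BSXSYMNO', axis=1))

# Stack, restore row order, add never-seen symptom columns as 0, re-attach keys
out = (pd.concat([dummies, other])
         .sort_index()
         .reindex(columns=cols, fill_value=0))
out = df[['RID', 'VISCODE2', 'BSXSYMNO']].join(out)
print(out)
```

Note that this keeps one row per original record, like the answer does; the unmatched row carries -4 in the columns that `get_dummies` created and 0 in the column added later by `reindex`.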
