Pandas: how to unstack by groups of columns while keeping the columns paired

import pandas as pd

# Input: one row per phone number, with several phones per relative
df_have = pd.DataFrame.from_dict({'ID': {0: '100', 1: '100', 2: '100', 3: '100', 4: '100', 5: '200', 6: '200', 7: '200', 8: '200', 9: '200'}, 'ID_RELATIVE': {0: '100', 1: '100', 2: '150', 3: '150', 4: '190', 5: '200', 6: '200', 7: '250', 8: '290', 9: '290'}, 'RELATIVE_ROLE': {0: 'self', 1: 'self', 2: 'father', 3: 'father', 4: 'mother', 5: 'self', 6: 'self', 7: 'father', 8: 'mother', 9: 'mother'}, 'PHONE': {0: '111111', 1: '222222', 2: '333333', 3: '444444', 4: '555555', 5: '123456', 6: '456789', 7: '987654', 8: '778899', 9: '909090'}})

# Desired result: one row per ID, with each relative's columns kept together
df_want = pd.DataFrame.from_dict({'ID': {0: '100', 1: '200'}, 'ID_RELATIVE_1': {0: '100', 1: '200'}, 'RELATIVE_ROLE_1': {0: 'self', 1: 'self'}, 'PHONE_1_1': {0: '111111', 1: '123456'}, 'PHONE_1_2': {0: '222222', 1: '456789'}, 'ID_RELATIVE_2': {0: '150', 1: '250'}, 'RELATIVE_ROLE_2': {0: 'father', 1: 'father'}, 'PHONE_2_1': {0: '333333', 1: '987654'}, 'PHONE_2_2': {0: '444444', 1: 'nan'}, 'ID_RELATIVE_3': {0: '190', 1: '290'}, 'RELATIVE_ROLE_3': {0: 'mother', 1: 'mother'}, 'PHONE_3_1': {0: '555555', 1: '778899'}, 'PHONE_3_2': {0: 'nan', 1: '909090'}})

1 answer

#1 · posted 2024-04-20 11:12:47

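The solution has two main steps. First, group by every column except PHONE so that each relative's phone numbers stay paired with that relative, and unstack them into PHONE_1, PHONE_2, ... columns. Then convert the column names to an ordered categorical so that they sort in their original order, and group by ID to unstack the relatives: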

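c = ['ID', 'ID_RELATIVE', 'RELATIVE_ROLE']

# Step 1: number the phones within each (ID, ID_RELATIVE, RELATIVE_ROLE) group
# and unstack that counter into PHONE_1, PHONE_2, ... columns
df = df_have.set_index(c + [df_have.groupby(c).cumcount().add(1)])['PHONE']
df = df.unstack().add_prefix('PHONE_').reset_index()

# Step 2: number the relatives within each ID and use the counter as a second index level
df = df.set_index(['ID', df.groupby('ID').cumcount().add(1)])

# ordered categorical column names keep their current order when sorted
df.columns = pd.CategoricalIndex(df.columns, categories=df.columns.tolist(), ordered=True)

# unstack the relative counter, then sort the columns by it (level 1)
df = df.unstack().sort_index(axis=1, level=1)

# flatten the MultiIndex columns to '<name>_<relative number>'
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)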
    ID ID_RELATIVE_1 RELATIVE_ROLE_1 PHONE_1_1 PHONE_2_1 ID_RELATIVE_2  \
0  100           100            self    111111    222222           150   
1  200           200            self    123456    456789           250   

  RELATIVE_ROLE_2 PHONE_1_2 PHONE_2_2 ID_RELATIVE_3 RELATIVE_ROLE_3 PHONE_1_3  \
0          father    333333    444444           190          mother    555555   
1          father    987654       NaN           290          mother    778899   

  PHONE_2_3  
0       NaN  
1    909090
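
As a rough illustration of the first step, the sketch below applies the same cumcount-then-unstack pattern to a made-up three-row frame. The names `small`, `keys`, `phone_no` and `wide` are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical miniature of df_have: one ID with two relatives.
small = pd.DataFrame({
    "ID": ["100", "100", "100"],
    "ID_RELATIVE": ["100", "100", "150"],
    "RELATIVE_ROLE": ["self", "self", "father"],
    "PHONE": ["111111", "222222", "333333"],
})

keys = ["ID", "ID_RELATIVE", "RELATIVE_ROLE"]

# cumcount() numbers the rows inside each group from 0; add(1) makes it 1-based.
phone_no = small.groupby(keys).cumcount().add(1)

# Use the counter as an extra index level, then unstack it into columns.
wide = (
    small.set_index(keys + [phone_no])["PHONE"]
    .unstack()
    .add_prefix("PHONE_")
    .reset_index()
)
print(wide)
```

For this input the (100, 100, self) row gets both PHONE_1 and PHONE_2, while the (100, 150, father) row has NaN in PHONE_2, because that relative has only one phone.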

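To put the relative number before the phone number in the PHONE column names, flatten the columns with the following instead of the plain f'{a}_{b}' renaming above:

# 'PHONE_1' paired with relative number 2 becomes 'PHONE_2_1'; other columns are renamed as before
df.columns = [f'{a.split("_")[0]}_{b}_{a.split("_")[1]}'
              if 'PHONE' in a
              else f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)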
    ID ID_RELATIVE_1 RELATIVE_ROLE_1 PHONE_1_1 PHONE_1_2 ID_RELATIVE_2  \
0  100           100            self    111111    222222           150   
1  200           200            self    123456    456789           250   

  RELATIVE_ROLE_2 PHONE_2_1 PHONE_2_2 ID_RELATIVE_3 RELATIVE_ROLE_3 PHONE_3_1  \
0          father    333333    444444           190          mother    555555   
1          father    987654       NaN           290          mother    778899   

  PHONE_3_2  
0       NaN  
1    909090
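
To see why the column names are converted to an ordered categorical before sorting, here is a minimal sketch that compares the two behaviours on made-up two-level columns. The names `names`, `plain`, `ordered`, `plain_order` and `ordered_order` are assumptions for this illustration only.

```python
import pandas as pd

# Hypothetical two-level columns of the shape produced by unstack():
# (column name, relative number).
names = ["ID_RELATIVE", "RELATIVE_ROLE", "PHONE_1"]
tuples = [(name, n) for name in names for n in (1, 2)]

# Plain string level: within each relative number, names sort alphabetically.
plain = pd.DataFrame([range(6)], columns=pd.MultiIndex.from_tuples(tuples))
plain_order = plain.sort_index(axis=1, level=1).columns.tolist()

# Ordered categorical level: within each relative number, names follow
# the declared category order instead.
cat = pd.CategoricalIndex(names, categories=names, ordered=True)
ordered = pd.DataFrame(
    [range(6)], columns=pd.MultiIndex.from_product([cat, [1, 2]])
)
ordered_order = ordered.sort_index(axis=1, level=1).columns.tolist()

print([name for name, n in plain_order if n == 1])
print([name for name, n in ordered_order if n == 1])
```

With plain strings the columns for each relative come out as ID_RELATIVE, PHONE_1, RELATIVE_ROLE, which would split the pair apart. With the ordered categorical they stay as ID_RELATIVE, RELATIVE_ROLE, PHONE_1.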
