Pandas: how to unstack by groups of columns while keeping the columns paired

Posted 2024-04-20 11:12:47


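I need to unstack a contact list (ID, relatives, phone numbers, ...) so that the columns keep a specific order.

Given an index, DataFrame.unstack works by unstacking single columns one at a time, even when it is applied to a pair of columns.

Data I have: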

import pandas as pd

df_have=pd.DataFrame.from_dict({'ID': {0: '100',   1: '100',  2: '100',  3: '100',  4: '100',  5: '200',  6: '200',  7: '200',  8: '200',  9: '200'},
 'ID_RELATIVE': {0: '100',  1: '100',  2: '150',  3: '150',  4: '190',  5: '200',  6: '200',  7: '250',  8: '290',  9: '290'},
 'RELATIVE_ROLE': {0: 'self',  1: 'self',  2: 'father',  3: 'father',  4: 'mother',  5: 'self',  6: 'self',  7: 'father',  8: 'mother',  9: 'mother'},
 'PHONE': {0: '111111',  1: '222222',  2: '333333',  3: '444444',  4: '555555',   5: '123456',  6: '456789',  7: '987654',  8: '778899',  9: '909090'}})

Data I want:

df_want=pd.DataFrame.from_dict({'ID': {0: '100', 1: '200'},
 'ID_RELATIVE_1': {0: '100', 1: '200'},
 'RELATIVE_ROLE_1': {0: 'self', 1: 'self'},
 'PHONE_1_1': {0: '111111', 1: '123456'},
 'PHONE_1_2': {0: '222222', 1: '456789'},
 'ID_RELATIVE_2': {0: '150', 1: '250'},
 'RELATIVE_ROLE_2': {0: 'father', 1: 'father'},
 'PHONE_2_1': {0: '333333', 1: '987654'},
 'PHONE_2_2': {0: '444444', 1: 'nan'},
 'ID_RELATIVE_3': {0: '190', 1: '290'},
 'RELATIVE_ROLE_3': {0: 'mother', 1: 'mother'},
 'PHONE_3_1': {0: '555555', 1: '778899'},
 'PHONE_3_2': {0: 'nan', 1: '909090'}})

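So, in the end, I need ID as the index, with the other columns unstacked so that they become attributes of each ID.

The usual unstack approach gives the "correct" values, but in the wrong shape.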

df2 = df_have.groupby(['ID'])[['ID_RELATIVE','RELATIVE_ROLE','PHONE']].apply(lambda x: x.reset_index(drop=True)).unstack()

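This would require reordering the columns, removing some duplicates (by column, not by row), and a for loop. I would like to avoid that approach, because I am looking for a more "elegant" way to reach the desired result through groupby/stack/unstack/pivot and the like.

Many thanks.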


1 answer
User
#1 · posted 2024-04-20 11:12:47

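The solution has two main steps. First, group by all columns except PHONE, so that the phones of each (ID, relative) pair are numbered and spread into columns. Then convert the column names to an ordered categorical so that they sort in their original order, and group by ID to number and unstack the relatives: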

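c = ['ID','ID_RELATIVE','RELATIVE_ROLE']
# Step 1: number the phones within each (ID, relative) pair and spread them into PHONE_1, PHONE_2, ...
df = df_have.set_index(c + [df_have.groupby(c).cumcount().add(1)])['PHONE']
df = df.unstack().add_prefix('PHONE_').reset_index()

# Step 2: number the relatives within each ID
df = df.set_index(['ID', df.groupby('ID').cumcount().add(1)])

# An ordered categorical makes the columns sort in their current order rather than alphabetically
df.columns = pd.CategoricalIndex(df.columns, categories=df.columns.tolist(), ordered=True)

# Unstack the relative number, then sort by it so each relative's columns stay together
df = df.unstack().sort_index(axis=1, level=1)

# Flatten the MultiIndex columns to names such as ID_RELATIVE_1
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)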
    ID ID_RELATIVE_1 RELATIVE_ROLE_1 PHONE_1_1 PHONE_2_1 ID_RELATIVE_2  \
0  100           100            self    111111    222222           150   
1  200           200            self    123456    456789           250   

  RELATIVE_ROLE_2 PHONE_1_2 PHONE_2_2 ID_RELATIVE_3 RELATIVE_ROLE_3 PHONE_1_3  \
0          father    333333    444444           190          mother    555555   
1          father    987654       NaN           290          mother    778899   

  PHONE_2_3  
0       NaN  
1    909090  
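The numbering in the first step comes from cumcount, which counts rows within each group starting at 0. The following is a minimal sketch on a cut-down frame (the names demo, phone_no and wide are illustrative only, and RELATIVE_ROLE is left out for brevity) showing how that counter becomes the PHONE_n column suffix:

```python
import pandas as pd

# Cut-down version of the question's data: one ID with three relatives.
demo = pd.DataFrame({
    'ID': ['100', '100', '100', '100', '100'],
    'ID_RELATIVE': ['100', '100', '150', '150', '190'],
    'PHONE': ['111111', '222222', '333333', '444444', '555555'],
})

# Position of each phone within its (ID, ID_RELATIVE) pair, starting at 1.
phone_no = demo.groupby(['ID', 'ID_RELATIVE']).cumcount().add(1)
print(phone_no.tolist())  # [1, 2, 1, 2, 1]

# Using the counter as the innermost index level lets unstack turn it into columns.
wide = demo.set_index(['ID', 'ID_RELATIVE', phone_no])['PHONE'].unstack().add_prefix('PHONE_')
print(wide.columns.tolist())  # ['PHONE_1', 'PHONE_2']
```

The relative with a single phone (190) gets NaN in PHONE_2, which is where the NaN cells in the output above come from.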

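If you need to swap the order of the numbers in the PHONE column names, replace the line that flattens the columns with: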

df.columns = [f'{a.split("_")[0]}_{b}_{a.split("_")[1]}' 
                 if 'PHONE' in a 
                 else f'{a}_{b}' for a, b in df.columns]    
df = df.reset_index()
print (df)
    ID ID_RELATIVE_1 RELATIVE_ROLE_1 PHONE_1_1 PHONE_1_2 ID_RELATIVE_2  \
0  100           100            self    111111    222222           150   
1  200           200            self    123456    456789           250   

  RELATIVE_ROLE_2 PHONE_2_1 PHONE_2_2 ID_RELATIVE_3 RELATIVE_ROLE_3 PHONE_3_1  \
0          father    333333    444444           190          mother    555555   
1          father    987654       NaN           290          mother    778899   

  PHONE_3_2  
0       NaN  
1    909090  
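For convenience, the steps above can be collected into one function. This is only a sketch: the function name unstack_contacts is hypothetical, the data is the question's df_have rebuilt from plain lists, and it assumes the relatives of each ID should be numbered in the sorted order of ID_RELATIVE, which is what the first unstack produces.

```python
import pandas as pd

def unstack_contacts(df_have):
    """Hypothetical wrapper around the two-step unstack shown in the answer."""
    c = ['ID', 'ID_RELATIVE', 'RELATIVE_ROLE']
    # Step 1: number phones per (ID, relative) pair and spread them into columns.
    phone_no = df_have.groupby(c).cumcount().add(1)
    df = (df_have.set_index(c + [phone_no])['PHONE']
                 .unstack().add_prefix('PHONE_').reset_index())
    # Step 2: number relatives per ID and use that number as a second index level.
    df = df.set_index(['ID', df.groupby('ID').cumcount().add(1)])
    # Ordered categorical: keep the current column order when sorting.
    df.columns = pd.CategoricalIndex(df.columns, categories=df.columns.tolist(), ordered=True)
    df = df.unstack().sort_index(axis=1, level=1)
    # Flatten to ID_RELATIVE_<relative> and PHONE_<relative>_<phone>.
    df.columns = [f'PHONE_{b}_{a.split("_")[1]}' if a.startswith('PHONE_') else f'{a}_{b}'
                  for a, b in df.columns]
    return df.reset_index()

df_have = pd.DataFrame({
    'ID': ['100'] * 5 + ['200'] * 5,
    'ID_RELATIVE': ['100', '100', '150', '150', '190', '200', '200', '250', '290', '290'],
    'RELATIVE_ROLE': ['self', 'self', 'father', 'father', 'mother',
                      'self', 'self', 'father', 'mother', 'mother'],
    'PHONE': ['111111', '222222', '333333', '444444', '555555',
              '123456', '456789', '987654', '778899', '909090'],
})

result = unstack_contacts(df_have)
print(result.columns.tolist())
```

Note that missing phones come out as real NaN values, whereas df_want in the question stores the string 'nan', so a direct equality check against df_want would need that difference handled first.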
