How do I unstack these particular rows in pandas?

Posted 2024-04-29 19:15:27


Consider the following DataFrame:

import pandas as pd

df_dict = {'name': {0: '  john',
  1: '  john',
  4: ' daphne '},
 'address': {0: 'johns address',
  1: 'johns address',
  4: 'daphne address'},
 'phonenum1': {0: 7870395,
  1: 7870450,
  4: 7373209},
 'phonenum2': {0: None, 1: 123450 , 4: None},
 'phonenum3': {0: None, 1: 123456, 4: None}
}

df = pd.DataFrame(df_dict)

     name         address  phonenum1  phonenum2  phonenum3
0    john   johns address    7870395        NaN        NaN
1    john   johns address    7870450   123450.0   123456.0
4  daphne  daphne address    7373209        NaN        NaN

How can I unstack the phonenum data so that, whenever entries share the same name and address, the output looks like this?


     name         address  phonenum1  phonenum2  phonenum3  phonenum4
0    john   johns address    7870395    7870450   123450.0   123456.0
4  daphne  daphne address    7373209        NaN        NaN        NaN


2 answers

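You can use set_index and stack, then groupby.cumcount per name and address to get the numbers for the later column names, then unstack, followed by add_prefix, reset_index and rename_axis for cosmetics.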

df_ = (df.set_index(['name', 'address'])
         .stack()
         .reset_index(level=-1)
         .assign(cc=lambda x: x.groupby(level=['name', 'address']).cumcount()+1)
         .set_index('cc', append=True)
         [0].unstack()
         .add_prefix('phonenum')
         .reset_index()
         .rename_axis(columns=None)
      )
print(df_)
       name         address  phonenum1  phonenum2  phonenum3  phonenum4
0      john   johns address  7870395.0  7870450.0   123450.0   123456.0
1   daphne   daphne address  7373209.0        NaN        NaN        NaN

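Because of the way the code is written, you can comment out everything from the second line down to the last line before the closing parenthesis, then uncomment the lines one at a time to see what each step does.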
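To make that step-by-step inspection concrete, here is a minimal sketch that splits the chain into named intermediate results you can print one by one. The names step1, step2, step3 and the column name 'phone' are my own and not part of the answer above. The explicit dropna() is also an assumption on my side: as far as I know, newer pandas versions may keep NaN entries when stacking, so dropping them explicitly should keep the numbering the same either way.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['  john', '  john', ' daphne '],
        'address': ['johns address', 'johns address', 'daphne address'],
        'phonenum1': [7870395, 7870450, 7373209],
        'phonenum2': [None, 123450, None],
        'phonenum3': [None, 123456, None],
    },
    index=[0, 1, 4],
)

# Step 1: move the key columns into the index and stack the phone columns
# into one long Series; missing numbers are dropped explicitly (assumption).
step1 = df.set_index(['name', 'address']).stack().dropna()

# Step 2: discard the old column labels and number the remaining values
# 1, 2, 3, ... within each (name, address) group.
step2 = step1.reset_index(level=-1, drop=True).to_frame('phone')
step2['cc'] = step2.groupby(level=['name', 'address']).cumcount() + 1

# Step 3: turn the running number into columns again.
step3 = step2.set_index('cc', append=True)['phone'].unstack()

# Cosmetics: column names, flat index, no leftover axis name.
result = step3.add_prefix('phonenum').reset_index().rename_axis(columns=None)

for label, obj in [('step1', step1), ('step2', step2), ('step3', step3), ('result', result)]:
    print(label)
    print(obj)
    print()
```

Printing each intermediate this way shows where the long format appears (step1), where the new column numbers come from (step2), and where the wide shape is rebuilt (step3).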

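I believe the code below does what you are trying to accomplish. It should also cope with more than 4 phone numbers in the result, just in case.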

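# Work on strings so the numbers can be joined; NaN becomes the text 'nan'
df = df.astype(str)
df['joined'] = df[['phonenum1','phonenum2','phonenum3']].agg(','.join,axis=1)
# Remove the 'nan' placeholders left by missing numbers
df['joined'] = df['joined'].str.replace(',nan','',regex=False)
# Concatenate the numbers of all rows that share a name and address
df['joined'] = df.groupby(['name','address'])['joined'].transform(lambda x: ','.join(x))
# Keep one row per person (deduplicating on 'joined' alone could merge
# different people who happen to have identical numbers)
df = df.drop_duplicates(subset=['name','address'])
# One output column per number in the longest list
columns = ['phonenum'+str(num+1) for num in range(df['joined'].str.count(',').max()+1)]
split = df['joined'].str.split(',',expand=True)
split.columns = columns
df = df[['name','address']]
# Note: the phone numbers in the result are strings, not numbers
result = pd.concat([df,split],axis=1)
print(result)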
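The approach above goes through strings and names the three input phone columns explicitly. As a possible variation (not taken from either answer), here is a rough sketch that keeps the values numeric and picks up any column whose name starts with 'phonenum'. The helper widen_phones and its parameters keys and prefix are hypothetical names I made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['  john', '  john', ' daphne '],
        'address': ['johns address', 'johns address', 'daphne address'],
        'phonenum1': [7870395, 7870450, 7373209],
        'phonenum2': [None, 123450, None],
        'phonenum3': [None, 123456, None],
    },
    index=[0, 1, 4],
)


def widen_phones(frame, keys=('name', 'address'), prefix='phonenum'):
    """Hypothetical helper: one row per key combination, one column per number."""
    keys = list(keys)
    # Assumption: every phone column is named with the given prefix.
    phone_cols = [c for c in frame.columns if c.startswith(prefix)]
    records = []
    # sort=False keeps the groups in order of first appearance.
    for key, group in frame.groupby(keys, sort=False):
        # Read the group's phone cells row by row and skip missing values.
        values = group[phone_cols].to_numpy().ravel()
        numbers = [v for v in values if pd.notna(v)]
        record = dict(zip(keys, key))
        for i, number in enumerate(numbers, start=1):
            record[f'{prefix}{i}'] = number
        records.append(record)
    # Groups with fewer numbers get NaN in the columns they lack.
    return pd.DataFrame(records)


wide = widen_phones(df)
print(wide)
```

The explicit loop is slower than the vectorised chains on large frames, but it makes the intended order obvious: numbers are taken row by row within each name and address.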
