How can I create new columns to store the data from a repeated ID column?

   ID key key2 key3
0   1   A    B  NaN
1   2   C  NaN  NaN
2   3   D    E    E

# ID 3 is repeated three times. The key of the second repeat, "E",
# is stored under the "key2" column, and the key of the third repeat, "E",
# is stored in the new column "key3".

2 Answers

User

#1 · Edited 2024-04-25 06:26:01

You can use cumcount together with pivot_table:

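# Label each row by its position within its ID group: key0, key1, key2, ...
df['cols'] = 'key' + df.groupby('ID').cumcount().astype(str)
# Pivot the labels into columns; ''.join combines the values that land in one cell (a single value here).
print(df.pivot_table(index='ID', columns='cols', values='key', aggfunc=''.join))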
cols key0  key1  key2
ID                   
1       A     B  None
2       C  None  None
3       D     E     E
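
For anyone who wants to run this end to end, here is a minimal sketch. The question does not show the input frame, so the long-format `df` below is an assumption reconstructed from the desired output. It also swaps `pivot_table` for plain `pivot`, which should be enough here because each (ID, label) pair occurs only once and nothing needs aggregating.

```python
import pandas as pd

# Assumed input (not shown in the question): one row per (ID, key) occurrence.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
})

# Number the rows inside each ID group (0, 1, 2, ...) and turn that into a label.
df["cols"] = "key" + df.groupby("ID").cumcount().astype(str)

# Each (ID, cols) pair is unique, so pivot() can reshape without an aggfunc.
wide = df.pivot(index="ID", columns="cols", values="key")
print(wide)
```

IDs with fewer repeats than the longest group end up with missing values in the trailing columns.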

User

#2 · Edited 2024-04-25 06:26:01

Take a look at groupby and apply; both are covered in the pandas documentation. You can then unstack the extra level of the MultiIndex that this creates.

df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])
).unstack(-1)

Output

   key_0 key_1 key_2
ID                  
1      A     B  None
2      C  None  None
3      D     E     E
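
To make the intermediate step visible, here is a small runnable sketch of the same idea. The input `df` is again an assumption, since the question does not show it; the names `stacked` and `wide` are only for illustration.

```python
import pandas as pd

# Assumed input (not shown in the question): one row per (ID, key) occurrence.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
})

# For each ID, relabel its keys as key_0, key_1, ...; the result is a Series
# indexed by two levels: ID and the new key_i label.
stacked = df.groupby("ID")["key"].apply(
    lambda s: pd.Series(s.values, index=[f"key_{i}" for i in range(len(s))])
)
print(stacked)

# Move the inner (key_i) level into columns; shorter groups get missing values.
wide = stacked.unstack(-1)
print(wide)
```

Printing `stacked` before `unstack` shows the two-level index that the answer refers to.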

If you want ID as a column, call reset_index on this DataFrame.
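
A short sketch of that last step follows. The `wide` frame is rebuilt by hand here so the snippet runs on its own, and the rename to `key`, `key2`, `key3` is an optional extra (not part of the answer) for matching the column labels used in the question.

```python
import pandas as pd

# Hand-built stand-in for the wide result above, so this snippet is self-contained.
wide = pd.DataFrame(
    {
        "key_0": ["A", "C", "D"],
        "key_1": ["B", None, "E"],
        "key_2": [None, None, "E"],
    },
    index=pd.Index([1, 2, 3], name="ID"),
)

# Turn the ID index back into an ordinary column.
flat = wide.reset_index()

# Optional: rename to the labels used in the question.
flat = flat.rename(columns={"key_0": "key", "key_1": "key2", "key_2": "key3"})
print(flat)
```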
