Group by multiple columns and keep multiple columns

Posted 2024-06-16 12:31:40

I have a DataFrame:

    action  person_id                       frame_no        path
0   boxing  person12_boxing_d2_uncomp.avi   image_0128.jpg  ../../../datasets/kth/train/boxing/person12_bo...
1   boxing  person12_boxing_d2_uncomp.avi   image_0129.jpg  ../../../datasets/kth/train/boxing/person12_bo...
2   walking person13_boxing_d2_uncomp.avi   image_0130.jpg  ../../../datasets/kth/train/walking/person13_b...
3   walking person13_boxing_d2_uncomp.avi   image_0131.jpg  ../../../datasets/kth/train/walking/person13_b...
4   running person13_boxing_d2_uncomp.avi   image_0132.jpg  ../../../datasets/kth/train/running/person13_b.

I'm trying to merge rows that share the same person_id. Rows with the same person_id are guaranteed to have the same action. This is what I have so far:

import pandas as pd

df = pd.DataFrame(data_filtered, columns=["action","person_id","frame_no","path"])
#df = pd.DataFrame(df.groupby(["action","person_id"])['frame_no'].apply(list)).reset_index()
df.head()

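But the resulting DataFrame loses the path column. I don't know how to tell pandas to carry the remaining columns through the grouping as well, and searching Google didn't help because I don't even know what to search for. Apologies if this is a duplicate.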

@Aditya

I tried:

df = pd.DataFrame(df.groupby(["action","person_id"])[['frame_no', 'path']].apply(list)).reset_index()

But this is what I get:

    action  person_id                       0
0   boxing  person12_boxing_d2_uncomp.avi   [frame_no, path]
1   running person13_boxing_d2_uncomp.avi   [frame_no, path]
2   walking person13_boxing_d2_uncomp.avi   [frame_no, path]
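The `[frame_no, path]` result comes from how `GroupBy.apply` works: each group is passed to the function as a sub-DataFrame, and calling `list()` on a DataFrame iterates over its column labels, not its values. A minimal sketch reproducing this with small hypothetical sample data (the IDs and paths are shortened placeholders, not the question's real values):

```python
import pandas as pd

# Hypothetical sample data; person IDs and paths are shortened placeholders
df = pd.DataFrame({
    "action": ["boxing", "boxing", "walking"],
    "person_id": ["p12", "p12", "p13"],
    "frame_no": ["image_0128.jpg", "image_0129.jpg", "image_0130.jpg"],
    "path": ["p12_a", "p12_b", "p13_a"],
})

# Each group reaches apply() as a DataFrame; list(DataFrame) yields column labels
labels = df.groupby(["action", "person_id"])[["frame_no", "path"]].apply(list)
print(labels)

# agg(list) instead aggregates each column separately, collecting its values
values = df.groupby(["action", "person_id"])[["frame_no", "path"]].agg(list)
print(values)
```

The difference is that `apply` hands the whole group to the function at once, while `agg` calls the function once per column.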

2 answers

Just change `apply` to `agg` so that each column is converted to a list:

print (df)
    action                      person_id        frame_no         path
0   boxing  person12_boxing_d2_uncomp.avi  image_0128.jpg  person12_bo
1   boxing  person12_boxing_d2_uncomp.avi  image_0129.jpg  person12_bo
2  walking  person13_boxing_d2_uncomp.avi  image_0130.jpg   person13_b
3  walking  person13_boxing_d2_uncomp.avi  image_0131.jpg   person13_b
4  running  person13_boxing_d2_uncomp.avi  image_0132.jpg   person13_b

df = df.groupby(["action","person_id"])[['frame_no', 'path']].agg(list)
print (df)
                                                               frame_no  \
action  person_id                                                         
boxing  person12_boxing_d2_uncomp.avi  [image_0128.jpg, image_0129.jpg]   
running person13_boxing_d2_uncomp.avi                  [image_0132.jpg]   
walking person13_boxing_d2_uncomp.avi  [image_0130.jpg, image_0131.jpg]   

                                                             path  
action  person_id                                                  
boxing  person12_boxing_d2_uncomp.avi  [person12_bo, person12_bo]  
running person13_boxing_d2_uncomp.avi                [person13_b]  
walking person13_boxing_d2_uncomp.avi    [person13_b, person13_b]  
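If the goal is the flat layout the question's `reset_index()` attempt was aiming for, here is a sketch assuming a recent pandas version. Grouping without selecting any columns makes `agg(list)` aggregate every non-key column, so the remaining columns don't need to be listed by name, and `reset_index()` turns the group keys back into ordinary columns. The sample values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame({
    "action": ["boxing", "boxing", "walking", "walking", "running"],
    "person_id": ["p12", "p12", "p13", "p13", "p13"],
    "frame_no": ["image_0128.jpg", "image_0129.jpg", "image_0130.jpg",
                 "image_0131.jpg", "image_0132.jpg"],
    "path": ["p12_a", "p12_b", "p13_a", "p13_b", "p13_c"],
})

# Every column not used as a key (here frame_no and path) is collected into a list;
# reset_index() moves action/person_id from the index back into regular columns
merged = df.groupby(["action", "person_id"]).agg(list).reset_index()
print(merged)
```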

# pd.__version__ == 0.25.1
import pandas as pd
d=[['hello',1,'GOOD','long.kw'],
   ['chipotle',2,'GOOD','bingo'],
   ['hello',3,"BAD", "lm"]]
t=pd.DataFrame(data=d, columns=['A','B','C','D'])

Grouping by `A` and turning each of `B` and `C` into a list gives:

t.groupby('A')[['B','C']].agg(lambda x: tuple(x)).applymap(list)
               B            C
A
chipotle     [2]       [GOOD]
hello     [1, 3]  [GOOD, BAD]
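A related option is named aggregation (available since pandas 0.25), which gives each output column an explicit name and source column. This sketch reuses the `t` frame from the answer above; the output names `Bs` and `Cs` are hypothetical.

```python
import pandas as pd

d = [['hello', 1, 'GOOD', 'long.kw'],
     ['chipotle', 2, 'GOOD', 'bingo'],
     ['hello', 3, "BAD", "lm"]]
t = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# Named aggregation: output_name=(source_column, aggfunc)
out = t.groupby('A').agg(Bs=('B', list), Cs=('C', list))
print(out)
```

This avoids the intermediate tuple step and makes it explicit which columns end up in the result.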
