Grouping DataFrame rows by index

Posted 2024-05-07 23:56:11


I have a Pandas DataFrame, and I am trying to group its rows by column values and merge some of the rows into lists. Let me explain in detail:

The DataFrame I have looks like this:

industry     index     entities
cars         0         ['Norway', 'it']
cars         0         ['Mercedes', 'they']
cars         0         ['it', 'EV', 'its']
nature       1         ['fox', 'it']
nature       1         ['them', 'rabbits']
nature       2         ['whale', 'it']

The desired DataFrame should look like this:

industry     index     entities
cars         0         [ ['Norway', 'it'], ['Mercedes', 'they'], ['it', 'EV', 'its'] ]
nature       1         [ ['fox', 'it'], ['them', 'rabbits'] ]
nature       2         ['whale', 'it']

In other words, I want to group the rows by industry and index, and merge the values of the entities column into a list for each group.

This is what I tried:

df.groupby('industry')['index'].apply(list)

But it gives me a completely different result.

How can I get the output I want? Thanks a lot.


2 Answers

After the groupby, select entities instead of index so that the entities column is the one being processed, and group by the list ['industry', 'index']:

# Assign to a new name so the original df stays available for the variant below
out = df.groupby(['industry', 'index'])['entities'].apply(list).reset_index()
print (out)
  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                    [[whale, it]]
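
To try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question, and the name `grouped` is only an illustrative choice, not something from the original post.

```python
import pandas as pd

# Sample data copied from the question; each entities cell is a Python list.
df = pd.DataFrame({
    "industry": ["cars", "cars", "cars", "nature", "nature", "nature"],
    "index": [0, 0, 0, 1, 1, 2],
    "entities": [
        ["Norway", "it"],
        ["Mercedes", "they"],
        ["it", "EV", "its"],
        ["fox", "it"],
        ["them", "rabbits"],
        ["whale", "it"],
    ],
})

# Group on both key columns and collect each group's lists into one outer list.
# `grouped` is a hypothetical name; the source frame df is left untouched.
grouped = (
    df.groupby(["industry", "index"])["entities"]
      .apply(list)
      .reset_index()
)
print(grouped)
```

Note that every group ends up as a list of lists here, including groups that contain only one row.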

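If a group that contains only one value should keep that value un-nested, as in the last row of the desired output, use a lambda function with an if-else: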

df1 = (df.groupby(['industry', 'index'])['entities']
         .apply(lambda x: x.tolist() if len(x) != 1 else x.iat[0])
         .reset_index())
print (df1)
  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                      [whale, it]
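
The same idea can be written with a named function instead of a lambda, which may be easier to read and test. This is only a sketch: the helper name `collect` is hypothetical, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "industry": ["cars", "cars", "cars", "nature", "nature", "nature"],
    "index": [0, 0, 0, 1, 1, 2],
    "entities": [
        ["Norway", "it"],
        ["Mercedes", "they"],
        ["it", "EV", "its"],
        ["fox", "it"],
        ["them", "rabbits"],
        ["whale", "it"],
    ],
})


def collect(group):
    """Return a single-row group's list as-is, otherwise a list of the group's lists."""
    values = group.tolist()
    return values[0] if len(values) == 1 else values


# `collect` is an illustrative helper, not part of pandas.
df1 = (
    df.groupby(["industry", "index"])["entities"]
      .apply(collect)
      .reset_index()
)
print(df1)
```

One thing to keep in mind with this variant: the entities column then mixes flat lists and nested lists, so downstream code has to handle both shapes.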

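EDIT:

If the entities column contains only string representations of lists, you can convert the values to lists with the ast module before applying the solutions above: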

print (type(df['entities'].iat[0]))
<class 'str'>

import ast
df['entities'] = df['entities'].apply(ast.literal_eval)

print (type(df['entities'].iat[0]))
<class 'list'>
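
A small runnable sketch of that conversion, assuming the column was read in as strings (for example from a CSV file). The frame `raw` and its two rows are made up for illustration.

```python
import ast

import pandas as pd

# Hypothetical frame where each entities cell is a string, not a list.
raw = pd.DataFrame({
    "industry": ["cars", "nature"],
    "index": [0, 2],
    "entities": ["['Norway', 'it']", "['whale', 'it']"],
})
print(type(raw["entities"].iat[0]))

# ast.literal_eval safely parses Python literals (lists, strings, numbers)
# without executing arbitrary code, unlike eval.
raw["entities"] = raw["entities"].apply(ast.literal_eval)
print(type(raw["entities"].iat[0]))
```

`ast.literal_eval` raises an exception on values that are not valid Python literals, so malformed cells would surface as errors rather than being converted silently.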

Assuming the elements of entities are lists:

df.groupby(['industry', 'index'])['entities'].apply(lambda x: [l for l in x]).reset_index()

Output:

  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                    [[whale, it]]
