Grouping DataFrame rows by index

industry  index  entities
cars      0      ['Norway', 'it']
cars      0      ['Mercedes', 'they']
cars      0      ['it', 'EV', 'its']
nature    1      ['fox', 'it']
nature    1      ['them', 'rabbits']
nature    2      ['whale', 'it']

industry  index  entities
cars      0      [['Norway', 'it'], ['Mercedes', 'they'], ['it', 'EV', 'its']]
nature    1      [['fox', 'it'], ['them', 'rabbits']]
nature    2      ['whale', 'it']

2 answers

User

#1 · edited 2024-05-19 17:03:37

After the groupby, select the entities column instead of index so that it is the entities column that gets aggregated, and group by the list ['industry', 'index'] in the groupby call:

# Store the result under a new name so the original df stays available for the snippets below
out = df.groupby(['industry', 'index'])['entities'].apply(list).reset_index()
print (out)
  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                    [[whale, it]]
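For readers who want to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question's table, and it assumes every value in entities is already a real Python list; the variable name grouped is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; 'entities' holds real lists (an assumption)
df = pd.DataFrame({
    "industry": ["cars", "cars", "cars", "nature", "nature", "nature"],
    "index": [0, 0, 0, 1, 1, 2],
    "entities": [
        ["Norway", "it"],
        ["Mercedes", "they"],
        ["it", "EV", "its"],
        ["fox", "it"],
        ["them", "rabbits"],
        ["whale", "it"],
    ],
})

# Group on both key columns, then collect each group's lists into one outer list
grouped = df.groupby(["industry", "index"])["entities"].apply(list).reset_index()
print(grouped)
```

Because the grouping keys come back as ordinary columns after reset_index, the result has the same three columns as the input, with one row per (industry, index) pair.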

If a group that contains only one value should keep that value as it is, without wrapping it in an outer list, use a lambda function with if-else:

df1 = (df.groupby(['industry', 'index'])['entities']
         .apply(lambda x: x.tolist() if len(x) != 1 else x.iat[0])
         .reset_index())
print (df1)
  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                      [whale, it]
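The same idea can be written with a named function, which may be easier to read and test than the inline lambda. This is only a sketch: the helper name collect and the shortened sample frame are made up for illustration.

```python
import pandas as pd

# Shortened sample: one group with two rows, one group with a single row
df = pd.DataFrame({
    "industry": ["cars", "cars", "nature"],
    "index": [0, 0, 2],
    "entities": [["Norway", "it"], ["Mercedes", "they"], ["whale", "it"]],
})

def collect(values):
    """Hypothetical helper: nest multi-row groups, leave single-row groups unnested."""
    if len(values) != 1:
        return values.tolist()
    return values.iat[0]

df1 = df.groupby(["industry", "index"])["entities"].apply(collect).reset_index()
print(df1)
```

Note that the resulting column then mixes two nesting depths, so any later code that iterates over entities has to handle both a list of lists and a flat list.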

Edit:

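If the entities column contains only string representations of lists, convert the values to real lists with the ast module before applying the solution above: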

print (type(df['entities'].iat[0]))
<class 'str'>

import ast
df['entities'] = df['entities'].apply(ast.literal_eval)

print (type(df['entities'].iat[0]))
<class 'list'>
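If it is unclear whether every value is a string, a small guard avoids passing real lists to ast.literal_eval. The sketch below uses only the standard library; the helper name to_list and the two sample values are assumptions for illustration.

```python
import ast

def to_list(value):
    """Hypothetical helper: parse a stringified list, pass real lists through unchanged."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value

raw_values = ["['Norway', 'it']", ["fox", "it"]]  # one string, one real list
parsed = [to_list(v) for v in raw_values]
print(parsed)  # [['Norway', 'it'], ['fox', 'it']]
```

On a DataFrame it could be applied in the same way as above, for example df['entities'] = df['entities'].apply(to_list).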

User

#2 · edited 2024-05-19 17:03:37

Assuming the elements in entities are lists:

df.groupby(['industry', 'index'])['entities'].apply(lambda x: [l for l in x]).reset_index()

Output:

  industry  index                                         entities
0     cars      0  [[Norway, it], [Mercedes, they], [it, EV, its]]
1   nature      1                     [[fox, it], [them, rabbits]]
2   nature      2                                    [[whale, it]]
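The list comprehension here simply copies each group's values into a new list, so it should give the same result as passing list directly. A quick check on a small hand-made frame (the sample data and variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "industry": ["cars", "cars", "nature"],
    "index": [0, 0, 1],
    "entities": [["Norway", "it"], ["Mercedes", "they"], ["fox", "it"]],
})

keys = ["industry", "index"]
via_comprehension = (
    df.groupby(keys)["entities"].apply(lambda x: [l for l in x]).reset_index()
)
via_list = df.groupby(keys)["entities"].apply(list).reset_index()

# Compare the plain Python values rather than relying on DataFrame.equals
same = via_comprehension["entities"].tolist() == via_list["entities"].tolist()
print(same)
```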
