pandas groupby filter: drop some groups

2 answers

User

Answer 1 · edited 2024-04-20 10:09:33

Assuming the last line of your code should be > 5 rather than > 20, you can do something like this:

grouped.filter(lambda x: (x.time > 5).any())

As you correctly figured out, x is actually a DataFrame: it holds the rows whose name column matches the key k from your for loop.

So you want to keep a group if any value in its time column is greater than 5, and you can test that with (x.time > 5).any(), as shown above.
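A minimal, self-contained sketch of this filter, assuming the sample DataFrame used in the second answer (the variable name result is illustrative):

```python
import pandas as pd

# Sample data, assumed to match the second answer's DataFrame
df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})

grouped = df.groupby('name')

# filter() calls the lambda once per group with that group's sub-DataFrame
# and keeps every row of the groups for which it returns True
result = grouped.filter(lambda x: (x.time > 5).any())
print(result)
```

Rows keep their original labels (2 to 5), and only bar and foobar survive, because the largest time in foo is exactly 5.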

User

Answer 2 · edited 2024-04-20 10:09:33

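I'm not yet comfortable with Python, NumPy and pandas, but I was working on a solution to a similar problem, so let me share my answers using this question as the example.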

import pandas as pd

df = pd.DataFrame()
df['name'] = ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar']
df['time'] = [5, 2, 5, 6, 20, 1]

grouped = df.groupby('name')
for k, group in grouped:
    print(group)
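Iterating over the GroupBy object yields (key, sub-DataFrame) pairs, with keys sorted by default. This sketch checks that each group holds only the rows for its key, which is the kind of per-group DataFrame that filter() passes to its lambda (keys is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

keys = []
for k, group in grouped:
    # Each group is a DataFrame containing only the rows whose 'name' equals k
    assert isinstance(group, pd.DataFrame)
    assert (group['name'] == k).all()
    keys.append(k)

# groupby sorts the keys by default
print(keys)
```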

My answer 1:

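A minimal sketch of an approach that works on row labels instead of group names, assuming indexes_should_drop (the name used in the notes further down) holds the row labels of every group whose largest time is 5 or less. This is an illustration, not necessarily the author's exact code:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

# Assumption: collect the row labels of every group whose max time is <= 5
indexes_should_drop = grouped.filter(lambda x: x['time'].max() <= 5).index

# Drop those rows by label; no group names are involved
result1 = df.drop(indexes_should_drop)
print(result1)
```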

My answer 2:

filter_time_max = grouped['time'].max() > 5
groups_should_keep = filter_time_max.loc[filter_time_max].index
result2 = df.loc[df['name'].isin(groups_should_keep)]

My answer 3:

filter_time_max = grouped['time'].max() <= 5
groups_should_drop = filter_time_max.loc[filter_time_max].index
result3 = df.drop(df[df['name'].isin(groups_should_drop)].index)
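A related variant that is not one of the three answers: transform('max') broadcasts each group's maximum back onto that group's rows, so a plain boolean mask keeps the wanted groups without collecting group names first. The names group_max and result4 are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})

# One value per row: the max 'time' of the group that row belongs to
group_max = df.groupby('name')['time'].transform('max')

# Keep rows whose group max exceeds 5, which drops group 'foo'
result4 = df[group_max > 5]
print(result4)
```

Because the mask is aligned with df row by row, the original row labels are preserved without an isin() lookup.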

Result:

    name    time
2   bar     5
3   bar     6
4   foobar  20
5   foobar  1

Notes

My answer 1 does not use the group names to drop the groups. If you need the group names, you can get them with df.loc[indexes_should_drop].name.unique().
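A short sketch of that lookup, assuming indexes_should_drop is built with filter() as the row labels of the groups whose largest time is 5 or less (that construction is an assumption; names_dropped is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

# Assumed construction: row labels of groups whose max time is <= 5
indexes_should_drop = grouped.filter(lambda x: x['time'].max() <= 5).index

# Map those row labels back to their distinct group names
names_dropped = df.loc[indexes_should_drop].name.unique()
print(names_dropped)
```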

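grouped['time'].max() <= 5 and grouped.apply(lambda x: x['time'].max() <= 5) return the same per-group boolean values (a Series indexed by group name); appending .index to either one gives the group names.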
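A quick check of that equivalence on the sample data. Selecting [['time']] before apply() keeps the grouping column out of each x, which avoids the deprecation warning that recent pandas versions (2.2 and later) emit when apply() operates on grouping columns; via_max and via_apply are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})
grouped = df.groupby('name')

# Reduce the column first, then compare
via_max = grouped['time'].max() <= 5

# Same test written as a per-group function
via_apply = grouped[['time']].apply(lambda x: x['time'].max() <= 5)

print(via_max.tolist(), via_apply.tolist())
```

The values and the group-name index agree; only the Series name differs (time for the first, none for the second).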

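The index of filter_time_max consists of group names, not row labels of df, so it cannot be passed to drop() as-is. For answer 3, filter_time_max is: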

name
foo        True
bar       False
foobar    False
Name: time, dtype: bool
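To see why, here is a sketch that passes those group names straight to df.drop (it raises KeyError, because the row labels of df are 0 to 5) and then one workaround that is not from the answers: moving name into the index so the group names become row labels. The names raised and result_by_name are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['foo', 'foo', 'bar', 'bar', 'foobar', 'foobar'],
    'time': [5, 2, 5, 6, 20, 1],
})

# Same construction as in answer 3: group names whose max time is <= 5
filter_time_max = df.groupby('name')['time'].max() <= 5
groups_should_drop = filter_time_max.loc[filter_time_max].index

# Row labels of df are 0..5, so the group name 'foo' is not a valid label
raised = False
try:
    df.drop(groups_should_drop)
except KeyError:
    raised = True
print('KeyError raised:', raised)

# Workaround: index by 'name' so group names become row labels;
# drop() then removes every row labelled 'foo'
result_by_name = df.set_index('name').drop(groups_should_drop)
print(result_by_name)
```

The trade-off is that the original row labels 0 to 5 are replaced by the names; call reset_index() afterwards if name should be a column again.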
