Pandas Groupby with bins and aggregation

Published 2024-06-17 12:15:56


I have a similar question.

I have a pandas DataFrame that looks like this, showing the ages at which different users received awards:

^{tb1}$

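I want to compute the total awards per age interval, i.e. 0 (ages 0-8), 1 (ages 9-17), 2 (ages 18-26), 3 (ages 27-35), 4 (ages 36-44), and so on. Each person should have as many age intervals as possible.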

How can I group the rows by id and by 9-year age interval to get a result like the following:

^{tb2}$

2 Answers

You can define the bins and cuts as follows:

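# Bin edges 0, 9, 18, ... extending one step past the largest age
bins = [9 * i for i in range(0, df['age'].max() // 9 + 2)]
# right=False makes each bin left-closed: [0, 9), [9, 18), ...
cuts = pd.cut(df['age'], bins, right=False)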

print(cuts)

0    [18, 27)
1    [18, 27)
2    [54, 63)
3    [27, 36)
4    [45, 54)
Name: age, dtype: category
Categories (7, interval[int64, left]): [[0, 9) < [9, 18) < [18, 27) < [27, 36) < [36, 45) < [45, 54) < [54, 63)]
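
If only the interval number is needed rather than the Interval itself, `pd.cut` can return it directly through `labels=False`. The following is a minimal sketch, not part of the original answer; the ages are hypothetical values picked for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical ages, chosen only for illustration (not the asker's data).
ages = pd.Series([18, 26, 54, 30, 45], name="age")

# Left-closed bin edges 0, 9, 18, ... extending one step past the largest age.
bins = [9 * i for i in range(0, ages.max() // 9 + 2)]

# labels=False returns the integer position of each bin instead of an Interval.
interval_index = pd.cut(ages, bins, right=False, labels=False)

print(interval_index.tolist())  # -> [2, 2, 6, 3, 5]
```

With 9-year bins starting at 0, these positions match the interval numbering in the question (0 for ages 0-8, 1 for ages 9-17, and so on).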

Then group by id and cuts and sum awards to get total_awards. Create age_interval via ^{}; the code below does this with groupby('id').cumcount().

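# observed=False keeps empty intervals, which the cumcount numbering below relies on
df_out = (df.groupby(['id', cuts], observed=False)
            .agg(total_awards=('awards', 'sum'))
            .reset_index(level=0)
            .reset_index(drop=True)
         )
# Number the intervals 0, 1, 2, ... within each id
df_out['age_interval'] = df_out.groupby('id').cumcount()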

Result:

print(df_out)

    id  total_awards  age_interval
0    1             0             0
1    1             0             1
2    1           250             2
3    1             0             3
4    1             0             4
5    1             0             5
6    1            50             6
7    2             0             0
8    2             0             1
9    2             0             2
10   2           193             3
11   2             0             4
12   2           209             5
13   2             0             6
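
As an alternative to numbering the rows after the fact, the interval number can be computed up front with integer division, and the empty intervals filled in by reindexing. This is a sketch of a different technique, not the answer's method, and the DataFrame below is hypothetical sample data (the original table is not reproduced here); the name `df_alt` is also made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "age": [18, 26, 54, 30, 45],
    "awards": [100, 150, 50, 193, 209],
})

# Integer division maps ages 0-8 -> 0, 9-17 -> 1, 18-26 -> 2, ...
df["age_interval"] = df["age"] // 9

# Sum awards for the (id, interval) pairs that actually occur.
totals = df.groupby(["id", "age_interval"])["awards"].sum()

# Build every (id, interval) pair so that empty intervals appear with 0.
full_index = pd.MultiIndex.from_product(
    [df["id"].unique(), range(df["age_interval"].max() + 1)],
    names=["id", "age_interval"],
)
df_alt = totals.reindex(full_index, fill_value=0).reset_index(name="total_awards")

print(df_alt)
```

Because the interval number comes from the age itself, it does not depend on the row order or on every interval being present in the grouped result.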

This should cover what you are looking for:

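df = pd.read_clipboard()
# Fixed bin edges 0, 9, ..., 99; pd.cut defaults to right=True, giving (0, 9], (9, 18], ...
bins = list(range(0, 100, 9))
# observed=False keeps the empty intervals in the result
results = df.groupby(['id', pd.cut(df.age, bins)], observed=False)['awards'].sum().reset_index()
print(results)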
    id  age         awards
0   1   (0, 9]      NaN
1   1   (9, 18]     NaN
2   1   (18, 27]    250.0
3   1   (27, 36]    NaN
4   1   (36, 45]    NaN
5   1   (45, 54]    50.0
6   1   (54, 63]    NaN
7   1   (63, 72]    NaN
8   1   (72, 81]    NaN
9   1   (81, 90]    NaN
10  1   (90, 99]    NaN
11  2   (0, 9]      NaN
12  2   (9, 18]     NaN
13  2   (18, 27]    NaN
14  2   (27, 36]    193.0
15  2   (36, 45]    NaN
16  2   (45, 54]    209.0
17  2   (54, 63]    NaN
18  2   (63, 72]    NaN
19  2   (72, 81]    NaN
20  2   (81, 90]    NaN
21  2   (90, 99]    NaN
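
Note that the intervals here are right-closed, while the first answer uses `right=False`. That is why the 50-award row lands in (45, 54] here but in [54, 63) above. The sketch below shows the difference on a few hypothetical boundary ages; these values are assumptions picked to sit exactly on bin edges.

```python
import pandas as pd

# Hypothetical boundary ages, chosen to sit exactly on bin edges.
ages = pd.Series([0, 9, 54], name="age")
bins = list(range(0, 100, 9))

# Default right=True: bins are (0, 9], (9, 18], ... so age 0 falls outside every bin.
right_closed = pd.cut(ages, bins)

# right=False: bins are [0, 9), [9, 18), ... matching the 0-8, 9-17, ... ranges.
left_closed = pd.cut(ages, bins, right=False)

print(right_closed)
print(left_closed)
```

For the age ranges described in the question (0-8, 9-17, ...), the left-closed variant is the one that matches.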
