Posted 2024-06-17 12:15:56
User
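I have a similar problem.
I have a pandas DataFrame that looks like this; it shows the age at which different users received awards.
I am interested in computing the total awards per age bracket, i.e. 0 (ages 0-8), 1 (ages 9-17), 2 (ages 18-26), 3 (ages 27-35), 4 (ages 36-44), and so on. Each person should have every age interval, including ones with no awards.
How can I group by id and by 9-year age intervals to get a result like the following:
You can define the bins and cuts as follows: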
    bins = [9 * i for i in range(0, df['age'].max() // 9 + 2)]
    cuts = pd.cut(df['age'], bins, right=False)
    print(cuts)

    0    [18, 27)
    1    [18, 27)
    2    [54, 63)
    3    [27, 36)
    4    [45, 54)
    Name: age, dtype: category
    Categories (7, interval[int64, left]): [[0, 9) < [9, 18) < [18, 27) < [27, 36) < [36, 45) < [45, 54) < [54, 63)]
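To make the bin arithmetic concrete, here is a minimal sketch on a small made-up DataFrame. The question's actual data is not shown, so the ages and award values below are assumptions chosen only to land in the same intervals as the printed output. The `+ 2` is there because `max // 9` is the index of the bin that holds the oldest age, one more edge is needed to close that bin, and `range` excludes its stop value.

```python
import pandas as pd

# Hypothetical sample data (not from the original question).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "age": [20, 25, 54, 30, 50],
    "awards": [100, 150, 50, 193, 209],
})

# Bin edges 0, 9, 18, ... up to the first edge above the maximum age.
bins = [9 * i for i in range(0, df["age"].max() // 9 + 2)]

# right=False makes each interval closed on the left: [0, 9), [9, 18), ...
cuts = pd.cut(df["age"], bins, right=False)

print(bins)
print(cuts.cat.codes.tolist())  # position of each row's interval among the categories
```

The category codes double as the 0, 1, 2, ... bracket numbers the question asks for.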
Then group by id and cuts and sum awards within each cut to get total_awards. Create age_interval with groupby('id').cumcount():
    df_out = (df.groupby(['id', cuts])
                .agg(total_awards=('awards', 'sum'))
                .reset_index(level=0)
                .reset_index(drop=True)
             )
    df_out['age_interval'] = df_out.groupby('id').cumcount()
Result:

    print(df_out)

        id  total_awards  age_interval
    0    1             0             0
    1    1             0             1
    2    1           250             2
    3    1             0             3
    4    1             0             4
    5    1             0             5
    6    1            50             6
    7    2             0             0
    8    2             0             1
    9    2             0             2
    10   2           193             3
    11   2             0             4
    12   2           209             5
    13   2             0             6
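For anyone who wants to run the whole thing end to end, here is a self-contained sketch of the same approach. The sample data is made up (an assumption, since the question's table is not shown), and `observed=False` is spelled out because the zero rows for empty brackets depend on it, and the default may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample data (not from the original question).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "age": [20, 25, 54, 30, 50],
    "awards": [100, 150, 50, 193, 209],
})

bins = [9 * i for i in range(0, df["age"].max() // 9 + 2)]
cuts = pd.cut(df["age"], bins, right=False)

df_out = (
    # observed=False keeps every (id, interval) pair, including empty ones.
    df.groupby(["id", cuts], observed=False)
      .agg(total_awards=("awards", "sum"))
      .reset_index(level=0)      # move id back into a column
      .reset_index(drop=True)    # drop the interval index
)

# Number the intervals 0, 1, 2, ... within each id.
df_out["age_interval"] = df_out.groupby("id").cumcount()

print(df_out)
```

Numbering with cumcount works here only because every id gets a row for every interval; if empty brackets were dropped, the counter would no longer line up with the bracket number.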
This should cover what you are looking for:

    import pandas as pd

    df = pd.read_clipboard()
    bins = [i for i in range(0, 100, 9)]
    results = df.groupby(['id', pd.cut(df.age, bins)])['awards'].sum().reset_index()
    print(results)

        id       age  awards
    0    1    (0, 9]     NaN
    1    1   (9, 18]     NaN
    2    1  (18, 27]   250.0
    3    1  (27, 36]     NaN
    4    1  (36, 45]     NaN
    5    1  (45, 54]    50.0
    6    1  (54, 63]     NaN
    7    1  (63, 72]     NaN
    8    1  (72, 81]     NaN
    9    1  (81, 90]     NaN
    10   1  (90, 99]     NaN
    11   2    (0, 9]     NaN
    12   2   (9, 18]     NaN
    13   2  (18, 27]     NaN
    14   2  (27, 36]   193.0
    15   2  (36, 45]     NaN
    16   2  (45, 54]   209.0
    17   2  (54, 63]     NaN
    18   2  (63, 72]     NaN
    19   2  (72, 81]     NaN
    20   2  (81, 90]     NaN
    21   2  (90, 99]     NaN
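One thing to watch with this version: pd.cut closes intervals on the right by default, so the brackets are (0, 9], (9, 18], ... rather than the 0-8, 9-17, ... brackets in the question. A small sketch of the difference, using a few boundary ages picked purely for illustration:

```python
import pandas as pd

# Illustrative boundary ages (not from the original question).
ages = pd.Series([0, 8, 9, 54])
bins = list(range(0, 100, 9))

right_closed = pd.cut(ages, bins)              # default: (0, 9], (9, 18], ...
left_closed = pd.cut(ages, bins, right=False)  # [0, 9), [9, 18), ...

# A code of -1 means the value fell outside every interval.
print(right_closed.cat.codes.tolist())
print(left_closed.cat.codes.tolist())
```

With the default, age 0 falls outside every bin and age 9 lands in the first bracket; with right=False, age 9 starts the second bracket, which matches the 0-8 / 9-17 split asked for.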