Group rows by preceding zeros in Pandas
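I have a requirement to group rows that are preceded by 0s in Column A and to sum their Column B values. Rows that are not preceded by a 0 should appear in the output as-is. How can I do this with a pandas DataFrame? The attached screenshot shows example groupings and the expected output. The grouping must still apply even if there are 20 preceding 0 rows.
Reproducible input: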
import pandas as pd

df = pd.DataFrame({'Column A': [0,0,1,1,1,0,-1,-1,0,0,0,1,0,0,0,0,-1],
                   'Column B': [5,3,4,2,1,3,7,5,10,2,3,4,5,5,5,5,5]
                  })
1 Answer
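If I understand correctly, you can use a reversed cumsum to form the groups, then compute groupby.transform('sum'), and finally use mask to blank out every row except the last one of each group: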
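# Walk Column A bottom-up: each non-zero value opens a new group that also
# covers the run of zeros directly above it.
group = df.loc[::-1, 'Column A'].ne(0).cumsum()[::-1]
# True for every row except the last row of its group (i.e. the leading zeros).
m = group.duplicated(keep='last')
out = df.join(df.groupby(group, as_index=False).transform('sum')
                .add_suffix(' output').mask(m))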
Output:
    Column A  Column B  Column A output  Column B output
0          0         5              NaN              NaN
1          0         3              NaN              NaN
2          1         4              1.0             12.0
3          1         2              1.0              2.0
4          1         1              1.0              1.0
5          0         3              NaN              NaN
6         -1         7             -1.0             10.0
7         -1         5             -1.0              5.0
8          0        10              NaN              NaN
9          0         2              NaN              NaN
10         0         3              NaN              NaN
11         1         4              1.0             19.0
12         0         5              NaN              NaN
13         0         5              NaN              NaN
14         0         5              NaN              NaN
15         0         5              NaN              NaN
16        -1         5             -1.0             25.0
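To see why the reversed cumsum groups each non-zero row with the zeros above it, it may help to print the intermediate values side by side. The following is only a sketch built from the input in the question; the `steps` frame and its `group` and `masked` column names are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Column A': [0, 0, 1, 1, 1, 0, -1, -1, 0, 0, 0, 1, 0, 0, 0, 0, -1],
    'Column B': [5, 3, 4, 2, 1, 3, 7, 5, 10, 2, 3, 4, 5, 5, 5, 5, 5],
})

# Reverse the column, count the non-zero values seen so far, then restore the
# original order. Zeros inherit the id of the next non-zero row below them.
group = df.loc[::-1, 'Column A'].ne(0).cumsum()[::-1]

# Flag every row that is not the last row of its group.
m = group.duplicated(keep='last')

# Illustrative helper frame for inspection only.
steps = pd.DataFrame({'Column A': df['Column A'], 'group': group, 'masked': m})
print(steps)

print(group.tolist())
# [7, 7, 7, 6, 5, 4, 4, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1]
```

Rows 0 to 2 share group 7, so their Column B values (5 + 3 + 4) add up to the 12 shown on row 2, while rows 0 and 1 are masked. One thing to keep in mind: if Column A ended with a run of zeros, those rows would form a group of their own and the last of them would stay unmasked; the question does not say how that case should be handled.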
Alternatively, if you only want the sum of B as output:
group = df.loc[::-1, 'Column A'].ne(0).cumsum()[::-1]
m = group.duplicated(keep='last')
df.loc[~m, 'out'] = df.groupby(group)['Column B'].transform('sum')
Or, a slightly more efficient variant that aggregates once per group instead of broadcasting the sum to every row:
group = df.loc[::-1, 'Column A'].ne(0).cumsum()[::-1]
m = group.duplicated(keep='last')
df.loc[~m, 'out'] = df.groupby(group, sort=False)['Column B'].sum().to_numpy()
Output:
    Column A  Column B   out
0          0         5   NaN
1          0         3   NaN
2          1         4  12.0
3          1         2   2.0
4          1         1   1.0
5          0         3   NaN
6         -1         7  10.0
7         -1         5   5.0
8          0        10   NaN
9          0         2   NaN
10         0         3   NaN
11         1         4  19.0
12         0         5   NaN
13         0         5   NaN
14         0         5   NaN
15         0         5   NaN
16        -1         5  25.0
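The last variant assigns a plain NumPy array by position, so it relies on the per-group sums coming back in the same top-to-bottom order as the unmasked rows. The sketch below, using the same input, compares the default `sort=True` with `sort=False` and checks the result against the `transform` variant; the variable names `sorted_sums`, `ordered_sums` and `via_transform` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Column A': [0, 0, 1, 1, 1, 0, -1, -1, 0, 0, 0, 1, 0, 0, 0, 0, -1],
    'Column B': [5, 3, 4, 2, 1, 3, 7, 5, 10, 2, 3, 4, 5, 5, 5, 5, 5],
})

group = df.loc[::-1, 'Column A'].ne(0).cumsum()[::-1]
m = group.duplicated(keep='last')

# Group ids decrease from top to bottom, so sorting by key (the default)
# returns the sums in bottom-to-top order.
sorted_sums = df.groupby(group)['Column B'].sum().to_numpy()

# sort=False keeps groups in order of first appearance, i.e. top to bottom,
# which matches the order of the rows selected by ~m.
ordered_sums = df.groupby(group, sort=False)['Column B'].sum().to_numpy()

print(sorted_sums.tolist())   # [25, 19, 5, 10, 1, 2, 12]
print(ordered_sums.tolist())  # [12, 2, 1, 10, 5, 19, 25]

# The index-aligned transform variant gives the same values on those rows.
via_transform = df.groupby(group)['Column B'].transform('sum')[~m].to_numpy()
assert (via_transform == ordered_sums).all()
```

Dropping `sort=False` from the positional variant would therefore write the sums onto the wrong rows for this input, whereas the `transform` variant is aligned by index and does not depend on group order.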