I have the following data:
{'grp': {0: 828893, 1: 828893, 2: 828893, 3: 828893, 4: 828893, 5: 828893, 6: 828893, 7: 828893, 8: 828893, 9: 828893, 10: 828893, 11: 828893, 12: 828893, 13: 828893, 14: 828893, 15: 828893, 16: 828893, 17: 828893, 18: 828893, 19: 828893, 20: 828893, 21: 828893, 22: 828893, 23: 828893, 24: 828893}, 'grp2': {0: nan, 1: nan, 2: nan, 3: nan, 4: '1', 5: '1', 6: '1', 7: '1', 8: '1', 9: '1', 10: nan, 11: nan, 12: '2', 13: '2', 14: '2', 15: '2', 16: nan, 17: nan, 18: nan, 19: '3', 20: nan, 21: '4', 22: '4', 23: '4', 24: '4'}, 'val1': {0: -50.0, 1: -50.0, 2: -50.0, 3: -50.0, 4: 7.600000000000001, 5: 54.599999999999994, 6: 38.599999999999994, 7: 50.599999999999994, 8: 91.0, 9: 100.80000000000001, 10: 19.200000000000003, 11: -50.0, 12: -50.0, 13: 69.6, 14: 42.0, 15: 90.19999999999999, 16: -50.0, 17: -50.0, 18: 47.599999999999994, 19: 98.80000000000001, 20: 27.599999999999994, 21: 11.799999999999997, 22: nan, 23: 13.0, 24: 0.0}, 'val2': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 30.1, 5: 21.5, 6: 20.7, 7: 4.2, 8: 5.0, 9: 21.6, 10: 85.1, 11: 0.0, 12: 0.0, 13: 36.4, 14: 56.6, 15: 51.2, 16: 0.0, 17: 0.0, 18: 58.5, 19: 42.2, 20: 76.1, 21: 68.7, 22: nan, 23: 90.3, 24: 95.3}}
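I want to group it by the columns grp and grp2, and then create two new columns, val1_b and val2_b, defined as the difference between the last and the first observation (within each group) of val1 and val2 respectively. The R code would look something like this: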
ex %>%
group_by(grp, grp2) %>%
mutate(val1_b = last(val1) - first(val1),
val2_b = last(val2) - first(val2)) %>%
ungroup()
But I need to do this in Python. The closest I have come is:
pd.DataFrame(ex).groupby(['grp', 'grp2'])['val1'].apply(lambda x: x.iat[-1] - x.iat[0])
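But this only handles one column, and the result is summarized (one row per group) rather than added to the data frame. So, how can I compute the difference between the last and the first observation within each group for several columns, and add the results to the data frame as new columns?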
Use ^{} together with ^{} and ^{}; for the new columns, ^{} and ^{} is one possible solution:
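The method names in the sentence above did not survive on this page, so here is only a sketch of one way to do it, not necessarily the answerer's exact code. It assumes the per-group first() and last() aggregations, add_suffix for the new column names, and join to attach the result to the original frame. The small df below is made-up sample data, not the asker's full frame.

```python
import pandas as pd

# Hypothetical sample data (a cut-down stand-in for the frame in the question)
df = pd.DataFrame({
    "grp":  [1, 1, 1, 1, 1],
    "grp2": ["1", "1", "2", "2", "2"],
    "val1": [7.0, 54.0, -50.0, 69.0, 42.0],
    "val2": [30.0, 21.0, 0.0, 36.0, 56.0],
})

cols = ["val1", "val2"]
g = df.groupby(["grp", "grp2"])[cols]

# One row per group: last value minus first value, columns renamed to val1_b, val2_b
diff = (g.last() - g.first()).add_suffix("_b")

# Attach the per-group result to every row of its group
out = df.join(diff, on=["grp", "grp2"])
print(out)
```

Two caveats: first() and last() return the first and last non-null value in each group, which is not quite the same as the positional x.iat[0] and x.iat[-1] from the question when a group starts or ends with NaN. Also, groupby drops rows whose key is NaN by default, so with the data in the question the rows where grp2 is NaN would get NaN in the new columns; passing dropna=False to groupby (pandas 1.1 or later) keeps them as a group of their own.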
As @Wen Ben mentioned in the comments, here is a possible alternative without join (thanks):
What you mean is mutate in R; the pandas equivalent here is transform.
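A sketch of what the transform route could look like, again on made-up sample data. Like mutate, transform returns a result with the same index as the original frame, so no join is needed. Using the "first" and "last" string aliases and pd.concat to append the columns is my choice for the illustration, not something shown on this page.

```python
import pandas as pd

# Hypothetical sample data (a cut-down stand-in for the frame in the question)
df = pd.DataFrame({
    "grp":  [1, 1, 1, 1, 1],
    "grp2": ["1", "1", "2", "2", "2"],
    "val1": [7.0, 54.0, -50.0, 69.0, 42.0],
    "val2": [30.0, 21.0, 0.0, 36.0, 56.0],
})

cols = ["val1", "val2"]
g = df.groupby(["grp", "grp2"])

# transform broadcasts each group's first/last value back to every row of that group
diff = g[cols].transform("last") - g[cols].transform("first")

# diff is aligned with df row by row, so the columns can be appended directly
out = pd.concat([df, diff.add_suffix("_b")], axis=1)
print(out)
```

As with first() and last(), the "first" and "last" aliases pick the first and last non-null value in each group.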