pandas: merging rows based on a timestamp
My data looks like this:
date, cola, colb, colc
1,10,,
2,11,,
3,12,,
4,13,,
1,,14,
2,,15,
3,,16,
4,,17,
1,,,17
2,,,18
3,,,19
4,13,,20
I want to merge the rows on the first column, so the output should look like this:
date, cola, colb, colc
1,10,14,17
2,11,15,18
3,12,16,19
4,13,17,20
I can't guarantee that the data is free of conflicts, so I'd like to be able to choose between taking the max or the average.
1 Answer
You can use groupby. First, start with a csv file that has duplicate data:
>>> !cat tomerge.csv
date, cola, colb, colc
1,10,,
2,11,,
1,,14,
2,,15,
1,,24,
2,,40,
1,,,17
2,,,18
Read it in:
>>> import pandas as pd
>>> df = pd.read_csv("tomerge.csv", skipinitialspace=True)  # header has a space after each comma
>>> df
   date  cola  colb  colc
0     1    10   NaN   NaN
1     2    11   NaN   NaN
2     1   NaN    14   NaN
3     2   NaN    15   NaN
4     1   NaN    24   NaN
5     2   NaN    40   NaN
6     1   NaN   NaN    17
7     2   NaN   NaN    18
And then the magic happens. The aggregations skip NaN by default, so each group collapses to its known values:
>>> df.groupby("date").mean()
      cola  colb  colc
date
1       10  19.0    17
2       11  27.5    18
>>> df.groupby("date").max()
      cola  colb  colc
date
1       10    24    17
2       11    40    18
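For reference, here is a minimal self-contained sketch of the same idea applied to the sample data from the question. It is not part of the original answer. Reading from an in-memory string via io.StringIO, passing skipinitialspace=True, and the per-column policy in merged_mixed are all assumptions made for illustration.

```python
import io

import pandas as pd

# Sample data from the question; the last row repeats cola=13 for date 4.
raw = """date, cola, colb, colc
1,10,,
2,11,,
3,12,,
4,13,,
1,,14,
2,,15,
3,,16,
4,,17,
1,,,17
2,,,18
3,,,19
4,13,,20
"""

# skipinitialspace=True drops the blank after each comma in the header,
# so the columns are named "cola" rather than " cola".
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

grouped = df.groupby("date")

# NaN cells are skipped by default, so each group collapses to its known values.
merged_max = grouped.max()
merged_mean = grouped.mean()

# Hypothetical per-column policy: max for cola and colc, mean for colb.
merged_mixed = grouped.agg({"cola": "max", "colb": "mean", "colc": "max"})

print(merged_max)
```

If different columns need different conflict rules, agg with a dict (as in merged_mixed) lets you pick max or mean per column instead of applying one rule to the whole frame.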