Python pandas - 分桶并求和

1 投票
1 回答
762 浏览
提问于 2025-04-18 12:17

这里有两个系列的数据,分别叫做 timestampstrades,它们的长度是一样的。trades 代表的是在特定的 timestamps 时刻发生的单个交易。我想要制作一个交易量图表,把这些交易按每30分钟分成一组,然后显示每组的累计交易量。我该怎么做呢?具体来说,我该如何进行分组?

1 个回答

2

如果你确定它们的长度是一样的,我假设它们都是用整数作为索引的 pd.Series。你可以把 trades 的索引改成 timestamps,然后进行 resample 操作。我用随机数据来展示这个方法:

# timestamps (random # of seconds after a certain time)
In [41]: timestamps = [pd.Timestamp('2013-7-1 9:00') + pd.offsets.Second (n) for n in np.random.randint(1,600, 30).cumsum() ]

In [42]: timestamps = pd.Series(timestamps, name='timestamps')

In [43]: timestamps 
Out[43]: 
0    2013-07-01 09:00:54
1    2013-07-01 09:05:37
2    2013-07-01 09:09:23
3    2013-07-01 09:15:29
4    2013-07-01 09:23:49
5    2013-07-01 09:29:29
6    2013-07-01 09:35:03
7    2013-07-01 09:35:31
8    2013-07-01 09:40:12
9    2013-07-01 09:49:08
10   2013-07-01 09:52:20
11   2013-07-01 09:56:43
12   2013-07-01 09:59:29
13   2013-07-01 10:04:10
14   2013-07-01 10:09:00
15   2013-07-01 10:12:20
16   2013-07-01 10:17:19
17   2013-07-01 10:25:30
18   2013-07-01 10:26:05
19   2013-07-01 10:28:55
20   2013-07-01 10:32:56
21   2013-07-01 10:38:46
22   2013-07-01 10:40:45
23   2013-07-01 10:46:35
24   2013-07-01 10:56:10
25   2013-07-01 10:57:28
26   2013-07-01 11:06:52
27   2013-07-01 11:10:52
28   2013-07-01 11:14:46
29   2013-07-01 11:20:49
Name: timestamps, dtype: datetime64[ns]

# trades: random number of trades at each timestamp
In [44]: trades = pd.Series(np.random.randint(1, 1000, 30), name='trades')

timestamps 作为 trades 这个序列的索引

In [45]: trades.index = timestamps
In [46]: trades
Out[46]: 
timestamps
2013-07-01 09:00:54    548
2013-07-01 09:05:37    864
2013-07-01 09:09:23    972
2013-07-01 09:15:29    990
2013-07-01 09:23:49    585
2013-07-01 09:29:29    811
2013-07-01 09:35:03    196
2013-07-01 09:35:31    697
2013-07-01 09:40:12    221
2013-07-01 09:49:08    535
2013-07-01 09:52:20    842
2013-07-01 09:56:43     20
2013-07-01 09:59:29    678
2013-07-01 10:04:10    740
2013-07-01 10:09:00    829
2013-07-01 10:12:20    521
2013-07-01 10:17:19    694
2013-07-01 10:25:30    312
2013-07-01 10:26:05    371
2013-07-01 10:28:55    581
2013-07-01 10:32:56    220
2013-07-01 10:38:46    745
2013-07-01 10:40:45    573
2013-07-01 10:46:35    267
2013-07-01 10:56:10    834
2013-07-01 10:57:28    688
2013-07-01 11:06:52    277
2013-07-01 11:10:52    354
2013-07-01 11:14:46    645
2013-07-01 11:20:49    726
Name: trades, dtype: int64

如果你想把数据按30分钟的时间段重新分组,可以使用 pd.Series.resample

In [47]: trades_30min = trades.resample('30min', sum)

In [48]: trades_30min
Out[48]: 
timestamps
2013-07-01 09:00:00    4770
2013-07-01 09:30:00    3189
2013-07-01 10:00:00    4048
2013-07-01 10:30:00    3327
2013-07-01 11:00:00    2002
Freq: 30T, Name: trades, dtype: int64

撰写回答