Python pandas - 分桶并求和
这里有两个系列的数据,分别叫做 timestamps
和 trades
,它们的长度是一样的。trades
代表的是在特定的 timestamps
时刻发生的单个交易。我想要制作一个交易量图表,把这些交易按每30分钟分成一组,然后显示每组的累计交易量。我该怎么做呢?具体来说,我该如何进行分组?
1 个回答
2
如果你确定它们的长度是一样的,我假设它们都是用整数作为索引的 pd.Series
。你可以把 trades
的索引改成 timestamps
,然后进行 resample
操作。我用随机数据来展示这个方法:
# timestamps (random # of seconds after a certain time)
In [41]: timestamps = [pd.Timestamp('2013-7-1 9:00') + pd.offsets.Second (n) for n in np.random.randint(1,600, 30).cumsum() ]
In [42]: timestamps = pd.Series(timestamps, name='timestamps')
In [43]: timestamps
Out[43]:
0 2013-07-01 09:00:54
1 2013-07-01 09:05:37
2 2013-07-01 09:09:23
3 2013-07-01 09:15:29
4 2013-07-01 09:23:49
5 2013-07-01 09:29:29
6 2013-07-01 09:35:03
7 2013-07-01 09:35:31
8 2013-07-01 09:40:12
9 2013-07-01 09:49:08
10 2013-07-01 09:52:20
11 2013-07-01 09:56:43
12 2013-07-01 09:59:29
13 2013-07-01 10:04:10
14 2013-07-01 10:09:00
15 2013-07-01 10:12:20
16 2013-07-01 10:17:19
17 2013-07-01 10:25:30
18 2013-07-01 10:26:05
19 2013-07-01 10:28:55
20 2013-07-01 10:32:56
21 2013-07-01 10:38:46
22 2013-07-01 10:40:45
23 2013-07-01 10:46:35
24 2013-07-01 10:56:10
25 2013-07-01 10:57:28
26 2013-07-01 11:06:52
27 2013-07-01 11:10:52
28 2013-07-01 11:14:46
29 2013-07-01 11:20:49
Name: timestamps, dtype: datetime64[ns]
# trades: random number of trades at each timestamp
In [44]: trades = pd.Series(np.random.randint(1, 1000, 30), name='trades')
把 timestamps
作为 trades
这个序列的索引
In [45]: trades.index = timestamps
In [46]: trades
Out[46]:
timestamps
2013-07-01 09:00:54 548
2013-07-01 09:05:37 864
2013-07-01 09:09:23 972
2013-07-01 09:15:29 990
2013-07-01 09:23:49 585
2013-07-01 09:29:29 811
2013-07-01 09:35:03 196
2013-07-01 09:35:31 697
2013-07-01 09:40:12 221
2013-07-01 09:49:08 535
2013-07-01 09:52:20 842
2013-07-01 09:56:43 20
2013-07-01 09:59:29 678
2013-07-01 10:04:10 740
2013-07-01 10:09:00 829
2013-07-01 10:12:20 521
2013-07-01 10:17:19 694
2013-07-01 10:25:30 312
2013-07-01 10:26:05 371
2013-07-01 10:28:55 581
2013-07-01 10:32:56 220
2013-07-01 10:38:46 745
2013-07-01 10:40:45 573
2013-07-01 10:46:35 267
2013-07-01 10:56:10 834
2013-07-01 10:57:28 688
2013-07-01 11:06:52 277
2013-07-01 11:10:52 354
2013-07-01 11:14:46 645
2013-07-01 11:20:49 726
Name: trades, dtype: int64
如果你想把数据按30分钟的时间段重新分组,可以使用 pd.Series.resample
:
In [47]: trades_30min = trades.resample('30min', sum)
In [48]: trades_30min
Out[48]:
timestamps
2013-07-01 09:00:00 4770
2013-07-01 09:30:00 3189
2013-07-01 10:00:00 4048
2013-07-01 10:30:00 3327
2013-07-01 11:00:00 2002
Freq: 30T, Name: trades, dtype: int64