基于列比较的条件累积和在pandas数据框中
我对pandas还比较陌生,虽然我知道应该有简单的解决办法,但我自己没搞明白。我有一个交易记录的数据表,看起来是这样的:
OrderId Size Price Side TimeSecO TimeUSecO TimeSecOT TimeUSecOT AmountBuy AmountSell
10 100 41.44000000 BUY 1403200077 47720 1403200100 640070
11 100 41.43000000 BUY 1403200077 47979 1403200112 43383
12 100 41.45000000 SELL 1403200077 48311 1403200090 61100
14 100 41.45000000 BUY 1403200092 253793 1403200092 374767
17 100 41.44000000 SELL 1403200103 24382 1403200125 929563
20 100 41.43000000 SELL 1403200116 208057 1403200116 226762
31 100 41.46000000 SELL 1403200214 874124 1403200259 751002
37 100 41.46000000 BUY 1403200278 494827 1403200300 729545
42 100 41.45000000 BUY 1403200335 601039 1403200361 925384
42 100 41.45000000 BUY 1403200335 601039 1403200361 925415
45 500 15.54000000 SELL 1403200365 997248 1403200741 26216
49 100 41.45000000 SELL 1403200375 419253 1403200402 959968
53 100 42.61000000 SELL 1403200377 403525 1403200377 403680
54 100 42.61000000 BUY 1403200377 501636 1403200377 501770
我想要计算每个订单编号(OrderId)的滚动累计总和,并把结果放到两个新列里,分别是CumAmountBuy和CumAmountSell,这两个列是根据Side列来区分的,前提是TimeSecO要大于TimeSecOT。
举个例子,在上面的数据表中,订单编号10、11和12的累计总和应该都是CumAmountBuy = 0和CumAmountSell = 0,因为在数据表里没有记录满足1403200077大于TimeUSecOT的条件。
而对于订单编号14,CumAmountBuy = 0,CumAmountSell = 100,因为在这个时候,订单编号12已经发生了,而且它的Side是SELL,满足了TimeSecO大于TimeSecOT的条件(1403200092大于1403200090)。
1 个回答
1
我能想到一个不太干净的办法,但当数据表变得非常大的时候,我觉得这样做效率不高。
In [42]: df['flag'] = df.TimeSecO.map(lambda sec: (sec > df.TimeSecOT).values)
In [43]: df['CumAmountBuy'] = df.flag.map(lambda f: np.dot(f,df['Size']*(df['Side']=='BUY')))
In [44]: df['CumAmountSell'] = df.flag.map(lambda f: np.dot(f,df['Size']*(df['Side']=='SELL')))
In [45]: df[['CumAmountBuy','CumAmountSell']]
Out[45]:
CumAmountBuy CumAmountSell
OrderId
10 0 0
11 0 0
12 0 0
14 0 100
17 200 100
20 300 100
31 300 300
37 300 400
42 400 400
42 400 400
45 600 400
49 600 400
53 600 400
54 600 400