Pandas分组累计数

2024-04-25 18:12:52 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在尝试从勾选列表重新采样/分组到OHLC非时间序列(范围条、音量条等)。在

第一次挑战(按累积量分组):

原始数据:

   symbol         utime                time   price  vol   cc   cv ttype  \
id                                                                          
1   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0    5  120  120     R   
2   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0   10  735  120     R   
3   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0   20  735  120     R   
4   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0   30  735    3     R   
5   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0    5  735  147     R   

我需要“groupby”行,直到vol列的总和<;=[constant]例如,当达到500时,再次开始求和。。。在

伪码:

^{pr2}$

谢谢你的帮助!在


Tags: 列表原始数据time时间序列symbolpricecv
2条回答

ol的“cumsum”把戏怎么样:

import pandas as pd
def grouper(df, threshold=500):
    df['new_bin'] = 0
    cum_vol = 0
    for i in df.index:
        if cum_vol >= threshold:
            df.loc[i, 'new_bin'] = 1
            cum_vol = 0
        cum_vol += df.loc[i, "vol"]
    df['group'] = df['new_bin'].cumsum()
    return df.drop("new_bin", axis=1)


df = pd.DataFrame({"price" : [1, 2, 3, 2, 4, 5, 6, 7, 6],
                  "vol": [100, 300, 101, 100, 402, 103, 300, 100, 30]})

df = grouper(df)
print df

   price  vol  group
0      1  100      0
1      2  300      0
2      3  101      0
3      2  100      1
4      4  402      1
5      5  103      2
6      6  300      2
7      7  100      2
8      6   30      3


adict  = {'open': 'first', 'high':'max', 'low':'min', 'close' : 'last' }
print df.groupby("group")['price'].agg(adict)

       high  close  open  low
group                        
0         3      3     1    1
1         4      4     2    2
2         7      7     5    5
3         6      6     6    6

print df.groupby("group")['vol'].sum()

group
0    501
1    502
2    503
3     30

考虑使用双正斜杠的整型除法运算符//,将vol的累积和除以vol峎amount的倍数进行拆分。然后在price聚合中使用该分组:

vol_amount = 100

data['volcum'] = data['vol'].cumsum()
data['volcumgrp'] = data['volcum'] - ((data['volcum'] // vol_amount) * vol_amount)


adict = {'open': 'first', 'high':'max', 'low':'min', 'close' : 'last'}

ohlc_vol = data.groupby(['volcumgrp'])['price'].agg(adict)
ohlc_vol['ticks_count'] = data.groupby(['volcumgrp'])['vol'].count()

使用已发布数据帧的重复堆栈演示数据:

^{pr2}$

输出

数据df(每100组)

       id        symbol       utime      time   price  vol   cc   cv ttype  volcum  volcumgrp
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3192.0    5  120  120     R       5          5
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3185.0   10  735  120     R      15         15
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3179.0   20  735  120     R      35         35
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3192.0   30  735    3     R      65         65
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3197.0    5  735  147     R      70         70
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3192.0    5  120  120     R      75         75
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3184.0   10  735  120     R      85         85
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3191.0   20  735  120     R     105          5
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3181.0   30  735    3     R     135         35
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3197.0    5  735  147     R     140         40
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3199.0    5  120  120     R     145         45
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3188.0   10  735  120     R     155         55
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3180.0   20  735  120     R     175         75
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3179.0   30  735    3     R     205          5
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3196.0    5  735  147     R     210         10
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3178.0    5  120  120     R     215         15
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3190.0   10  735  120     R     225         25
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3195.0   20  735  120     R     245         45
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3182.0   30  735    3     R     275         75
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3181.0    5  735  147     R     280         80
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3199.0    5  120  120     R     285         85
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3191.0   10  735  120     R     295         95
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3192.0   20  735  120     R     315         15
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3191.0   30  735    3     R     345         45
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3179.0    5  735  147     R     350         50

ohlc\u vol df

             open     low    high   close  ticks_count
volcumgrp                                             
5          3192.0  3179.0  3192.0  3179.0            3
10         3196.0  3196.0  3196.0  3196.0            1
15         3185.0  3178.0  3192.0  3192.0            3
25         3190.0  3190.0  3190.0  3190.0            1
35         3179.0  3179.0  3181.0  3181.0            2
40         3197.0  3197.0  3197.0  3197.0            1
45         3199.0  3191.0  3199.0  3191.0            3
50         3179.0  3179.0  3179.0  3179.0            1
55         3188.0  3188.0  3188.0  3188.0            1
65         3192.0  3192.0  3192.0  3192.0            1
70         3197.0  3197.0  3197.0  3197.0            1
75         3192.0  3180.0  3192.0  3182.0            3
80         3181.0  3181.0  3181.0  3181.0            1
85         3184.0  3184.0  3199.0  3199.0            2
95         3191.0  3191.0  3191.0  3191.0            1

相关问题 更多 >