Python pandas使用rolling apply to groupby对象以矢量化方式计算机车车辆beta

2024-06-02 19:02:33 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个大数据框,df,包含4列:

             id           period  ret_1m   mkt_ret_1m
131146       CAN00WG0     199609 -0.1538    0.047104
133530       CAN00WG0     199610 -0.0455   -0.014143
135913       CAN00WG0     199611  0.0000    0.040926
138334       CAN00WG0     199612  0.2952    0.008723
140794       CAN00WG0     199701 -0.0257    0.039916
143274       CAN00WG0     199702 -0.0038   -0.025442
145754       CAN00WG0     199703 -0.2992   -0.049279
148246       CAN00WG0     199704 -0.0919   -0.005948
150774       CAN00WG0     199705  0.0595    0.122322
153318       CAN00WG0     199706 -0.0337    0.045765

             id           period  ret_1m   mkt_ret_1m
160980       CAN00WH0     199709  0.0757    0.079293
163569       CAN00WH0     199710 -0.0741   -0.044000
166159       CAN00WH0     199711  0.1000   -0.014644
168782       CAN00WH0     199712 -0.0909   -0.007072
171399       CAN00WH0     199801 -0.0100    0.001381
174022       CAN00WH0     199802  0.1919    0.081924
176637       CAN00WH0     199803  0.0085    0.050415
179255       CAN00WH0     199804 -0.0168    0.018393
181880       CAN00WH0     199805  0.0427   -0.051279
184516       CAN00WH0     199806 -0.0656   -0.011516

             id           period  ret_1m   mkt_ret_1m
143275       CAN00WO0     199702 -0.1176   -0.025442
145755       CAN00WO0     199703 -0.0074   -0.049279
148247       CAN00WO0     199704 -0.0075   -0.005948
150775       CAN00WO0     199705  0.0451    0.122322

等等

我正在尝试使用一个函数计算一个常见的财务指标,称为beta,它包含两个列,ret祼m,每月股票回报率,ret祼m祼mkt,同期市场1个月回报率(期间id)。我想应用一个函数(calc_beta)来计算这个函数12个月的滚动结果。

为此,我正在创建一个groupby对象:

grp = df.groupby('id')

我想做的是使用类似于:

period = 12
for stock, sub_df in grp:
    arg = sub_df[['ret_1m', 'mkt_ret_1m']]
    beta = pd.rolling_apply(arg, period, calc_beta, min_periods = period)

现在,这是第一个问题。根据文档,pd.rolling_apply arg可以是序列或数据帧。但是,我提供的数据帧似乎被转换为只能包含一列数据的numpy数组,而不是我试图提供的两列数据。所以我下面的calc_beta代码将不起作用,因为我需要通过股票和市场回报:

def calc_beta(np_array)
    s = np_array[:,0] # stock returns are column zero from numpy array
    m = np_array[:,1] # market returns are column one from numpy array

    covariance = np.cov(s,m) # Calculate covariance between stock and market
    beta = covariance[0,1]/covariance[1,1]
return beta

所以我的问题如下,我认为这样列出来是有意义的:

(i)  How can I pass a data frame/multiple series/numpy array with more than one column to calc_beta using rolling_apply?
(ii) How can I return more than one value (e.g. the beta) from the calc_beta function? 
(iii) Having calculated rolling quantities, how can I recombined with the original dataframe df so that I have the rolling quantities corresponding to the correct date in the period column?
(iv) Is there a better (vectorized) way of achieving this?  I have seen some similar questions using e.g. df.apply(pd.rolling_apply,period,??) but I did not understand how these worked.

我认为rolling_apply以前无法处理数据帧,但文档表明它现在能够这样做。我的熊猫。版本是0.16.1。

谢谢你的帮助!我花了1.5天的时间才弄明白这一点,我完全被难住了。

最终,我想要的是这样的东西:

             id           period  ret_1m   mkt_ret_1m  beta  other_quantities
131146       CAN00WG0     199609 -0.1538    0.047104  0.521  xxx
133530       CAN00WG0     199610 -0.0455   -0.014143  0.627  xxxx
135913       CAN00WG0     199611  0.0000    0.040926  0.341  xxx
138334       CAN00WG0     199612  0.2952    0.008723  0.567  xx
140794       CAN00WG0     199701 -0.0257    0.039916  0.4612 xxx
143274       CAN00WG0     199702 -0.0038   -0.025442  0.215  xxx
145754       CAN00WG0     199703 -0.2992   -0.049279  0.4678  xxx
148246       CAN00WG0     199704 -0.0919   -0.005948  -0.4225  xxx
150774       CAN00WG0     199705  0.0595    0.122322  0.780  xxx
153318       CAN00WG0     199706 -0.0337    0.045765  0.623  xxx

             id           period  ret_1m   mkt_ret_1m  beta  other_quantities
160980       CAN00WH0     199709  0.0757    0.079293  -0.913  xx
163569       CAN00WH0     199710 -0.0741   -0.044000  0.894  xxx
166159       CAN00WH0     199711  0.1000   -0.014644  0.563  xxx
168782       CAN00WH0     199712 -0.0909   -0.007072  0.734  xxx
171399       CAN00WH0     199801 -0.0100    0.001381  0.894  xxxx
174022       CAN00WH0     199802  0.1919    0.081924  0.789  xx
176637       CAN00WH0     199803  0.0085    0.050415  0.1563  xxxx
179255       CAN00WH0     199804 -0.0168    0.018393  -0.64  xxxx
181880       CAN00WH0     199805  0.0427   -0.051279  -0.742  xxx
184516       CAN00WH0     199806 -0.0656   -0.011516  0.925  xxx

             id           period  ret_1m   mkt_ret_1m  beta
143275       CAN00WO0     199702 -0.1176   -0.025442  -1.52  xx
145755       CAN00WO0     199703 -0.0074   -0.049279  -0.632  xxx
148247       CAN00WO0     199704 -0.0075   -0.005948  1.521  xx
150775       CAN00WO0     199705  0.0451    0.122322  0.0321  xxx

等等


Tags: 数据iddfcalcarraybetaxxxperiod
2条回答

尝试pd.rolling_cov()和pd.rolling.var(),如下所示:

import pandas as pd
import numpy as np
from StringIO import StringIO

    df = pd.read_csv(StringIO('''              id  period  ret_1m  mkt_ret_1m
    131146  CAN00WG0  199609 -0.1538    0.047104
    133530  CAN00WG0  199610 -0.0455   -0.014143
    135913  CAN00WG0  199611  0.0000    0.040926
    138334  CAN00WG0  199612  0.2952    0.008723
    140794  CAN00WG0  199701 -0.0257    0.039916
    143274  CAN00WG0  199702 -0.0038   -0.025442
    145754  CAN00WG0  199703 -0.2992   -0.049279
    148246  CAN00WG0  199704 -0.0919   -0.005948
    150774  CAN00WG0  199705  0.0595    0.122322
    153318  CAN00WG0  199706 -0.0337    0.045765
    160980  CAN00WH0  199709  0.0757    0.079293
    163569  CAN00WH0  199710 -0.0741   -0.044000
    166159  CAN00WH0  199711  0.1000   -0.014644
    168782  CAN00WH0  199712 -0.0909   -0.007072
    171399  CAN00WH0  199801 -0.0100    0.001381
    174022  CAN00WH0  199802  0.1919    0.081924
    176637  CAN00WH0  199803  0.0085    0.050415
    179255  CAN00WH0  199804 -0.0168    0.018393
    181880  CAN00WH0  199805  0.0427   -0.051279
    184516  CAN00WH0  199806 -0.0656   -0.011516
    143275  CAN00WO0  199702 -0.1176   -0.025442
    145755  CAN00WO0  199703 -0.0074   -0.049279
    148247  CAN00WO0  199704 -0.0075   -0.005948
    150775  CAN00WO0  199705  0.0451    0.122322'''), sep='\s+')

    df['beta'] = pd.rolling_cov(df['ret_1m'], df['mkt_ret_1m'], window=6) / pd.rolling_var(df['mkt_ret_1m'], window=6)

print df

输出:

              id  period  ret_1m  mkt_ret_1m      beta
131146  CAN00WG0  199609 -0.1538    0.047104       NaN
133530  CAN00WG0  199610 -0.0455   -0.014143       NaN
135913  CAN00WG0  199611  0.0000    0.040926       NaN
138334  CAN00WG0  199612  0.2952    0.008723       NaN
140794  CAN00WG0  199701 -0.0257    0.039916       NaN
143274  CAN00WG0  199702 -0.0038   -0.025442 -1.245908
145754  CAN00WG0  199703 -0.2992   -0.049279  2.574464
148246  CAN00WG0  199704 -0.0919   -0.005948  2.657887
150774  CAN00WG0  199705  0.0595    0.122322  1.371090
153318  CAN00WG0  199706 -0.0337    0.045765  1.494095
160980  CAN00WH0  199709  0.0757    0.079293  1.616520
163569  CAN00WH0  199710 -0.0741   -0.044000  1.630411
166159  CAN00WH0  199711  0.1000   -0.014644  0.651220
168782  CAN00WH0  199712 -0.0909   -0.007072  0.652148
171399  CAN00WH0  199801 -0.0100    0.001381  0.724120
174022  CAN00WH0  199802  0.1919    0.081924  1.542782
176637  CAN00WH0  199803  0.0085    0.050415  1.605407
179255  CAN00WH0  199804 -0.0168    0.018393  1.571015
181880  CAN00WH0  199805  0.0427   -0.051279  1.139972
184516  CAN00WH0  199806 -0.0656   -0.011516  1.101890
143275  CAN00WO0  199702 -0.1176   -0.025442  1.372437
145755  CAN00WO0  199703 -0.0074   -0.049279  0.031939
148247  CAN00WO0  199704 -0.0075   -0.005948 -0.535855
150775  CAN00WO0  199705  0.0451    0.122322  0.341747

我想pd.rolling_apply在这种情况下没有帮助,因为在我看来,它实际上只需要Series(即使传递了数据帧,它也一次处理一列)。但是,您始终可以编写自己的使用数据帧的滚动应用程序。

import pandas as pd
import numpy as np
from StringIO import StringIO

df = pd.read_csv(StringIO('''              id  period  ret_1m  mkt_ret_1m
131146  CAN00WG0  199609 -0.1538    0.047104
133530  CAN00WG0  199610 -0.0455   -0.014143
135913  CAN00WG0  199611  0.0000    0.040926
138334  CAN00WG0  199612  0.2952    0.008723
140794  CAN00WG0  199701 -0.0257    0.039916
143274  CAN00WG0  199702 -0.0038   -0.025442
145754  CAN00WG0  199703 -0.2992   -0.049279
148246  CAN00WG0  199704 -0.0919   -0.005948
150774  CAN00WG0  199705  0.0595    0.122322
153318  CAN00WG0  199706 -0.0337    0.045765
160980  CAN00WH0  199709  0.0757    0.079293
163569  CAN00WH0  199710 -0.0741   -0.044000
166159  CAN00WH0  199711  0.1000   -0.014644
168782  CAN00WH0  199712 -0.0909   -0.007072
171399  CAN00WH0  199801 -0.0100    0.001381
174022  CAN00WH0  199802  0.1919    0.081924
176637  CAN00WH0  199803  0.0085    0.050415
179255  CAN00WH0  199804 -0.0168    0.018393
181880  CAN00WH0  199805  0.0427   -0.051279
184516  CAN00WH0  199806 -0.0656   -0.011516
143275  CAN00WO0  199702 -0.1176   -0.025442
145755  CAN00WO0  199703 -0.0074   -0.049279
148247  CAN00WO0  199704 -0.0075   -0.005948
150775  CAN00WO0  199705  0.0451    0.122322'''), sep='\s+')



def calc_beta(df):
    np_array = df.values
    s = np_array[:,0] # stock returns are column zero from numpy array
    m = np_array[:,1] # market returns are column one from numpy array

    covariance = np.cov(s,m) # Calculate covariance between stock and market
    beta = covariance[0,1]/covariance[1,1]
    return beta

def rolling_apply(df, period, func, min_periods=None):
    if min_periods is None:
        min_periods = period
    result = pd.Series(np.nan, index=df.index)

    for i in range(1, len(df)+1):
        sub_df = df.iloc[max(i-period, 0):i,:] #I edited here
        if len(sub_df) >= min_periods:
            idx = sub_df.index[-1]
            result[idx] = func(sub_df)
    return result

df['beta'] = np.nan
grp = df.groupby('id')
period = 6 #I'm using 6  to see some not NaN values, since sample data don't have longer than 12 groups
for stock, sub_df in grp:
    beta = rolling_apply(sub_df[['ret_1m','mkt_ret_1m']], period, calc_beta, min_periods = period)  
    beta.name = 'beta'
    df.update(beta)
print df

输出

            id  period  ret_1m  mkt_ret_1m      beta
131146  CAN00WG0  199609 -0.1538    0.047104       NaN
133530  CAN00WG0  199610 -0.0455   -0.014143       NaN
135913  CAN00WG0  199611  0.0000    0.040926       NaN
138334  CAN00WG0  199612  0.2952    0.008723       NaN
140794  CAN00WG0  199701 -0.0257    0.039916       NaN
143274  CAN00WG0  199702 -0.0038   -0.025442 -1.245908
145754  CAN00WG0  199703 -0.2992   -0.049279  2.574464
148246  CAN00WG0  199704 -0.0919   -0.005948  2.657887
150774  CAN00WG0  199705  0.0595    0.122322  1.371090
153318  CAN00WG0  199706 -0.0337    0.045765  1.494095
...          ...     ...     ...         ...       ...
171399  CAN00WH0  199801 -0.0100    0.001381       NaN
174022  CAN00WH0  199802  0.1919    0.081924  1.542782
176637  CAN00WH0  199803  0.0085    0.050415  1.605407
179255  CAN00WH0  199804 -0.0168    0.018393  1.571015
181880  CAN00WH0  199805  0.0427   -0.051279  1.139972
184516  CAN00WH0  199806 -0.0656   -0.011516  1.101890
143275  CAN00WO0  199702 -0.1176   -0.025442       NaN
145755  CAN00WO0  199703 -0.0074   -0.049279       NaN
148247  CAN00WO0  199704 -0.0075   -0.005948       NaN
150775  CAN00WO0  199705  0.0451    0.122322       NaN

相关问题 更多 >