使用pandas计算滚动相关性

3条回答

网友

1楼 · 编辑于 2024-04-23 08:30:24

在Python 3.5上运行rolling.corr()会生成一个警告：该函数已被弃用，将来可能会停止工作。建议改用Series.rolling(window=<period>).corr(other=series)。 E、 g

data['scrip1DailyReturn'].rolling(window=90).corr(other=data['scrip2DailyReturn'])

网友

2楼 · 编辑于 2024-04-23 08:30:24

使用pandas.rolling_corr，而不是DataFrame.rolling_corr。此外，groupby返回生成器。见下面的代码。

代码：

import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")

for key, value in df_gen:
    print "key: {}".format(key)
    print value.rolling_corr(value["Value1"],value["Value2"], 3)

输出：

key: Blue
1          NaN
3          NaN
6     0.931673
8     0.865066
10    0.089304
12   -0.998656
15   -0.971373
17   -0.667316
dtype: float64
key: Red
0          NaN
2          NaN
5    -0.911357
9    -0.152221
11   -0.971153
14    0.438697
18   -0.550727
dtype: float64
key: Yellow
4          NaN
7          NaN
13   -0.040330
16    0.879371
dtype: float64

您可以将循环部分更改为以下内容，以查看带有新列的原始dataframe post分组。

for key, value in df_gen:
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"],value["Value2"], 3)
    print value

输出：

   Color    Value1    Value2  ROLL_CORR
1   Blue  0.951227  0.514999        NaN
3   Blue  0.649112  0.513052        NaN
6   Blue  0.148165  0.342205   0.931673
8   Blue  0.626883  0.421530   0.865066
10  Blue  0.286738  0.583811   0.089304
12  Blue  0.966779  0.227340  -0.998656
15  Blue  0.065493  0.887640  -0.971373
17  Blue  0.757932  0.900103  -0.667316
key: Red
   Color    Value1    Value2  ROLL_CORR
0    Red  0.201435  0.981871        NaN
2    Red  0.522955  0.357239        NaN
5    Red  0.806326  0.310039  -0.911357
9    Red  0.656126  0.678047  -0.152221
11   Red  0.435898  0.908388  -0.971153
14   Red  0.116419  0.555821   0.438697
18   Red  0.793102  0.168033  -0.550727
key: Yellow
     Color    Value1    Value2  ROLL_CORR
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371

如果您想在处理后将它们全部连接在一起（顺便说一句，这可能会让其他人感到困惑），只需在处理组之后使用concat。

import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")

dfs = [] # Container for dataframes.

for key, value in df_gen:
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"],value["Value2"], 3)
    print value
    dfs.append(value)

df_final = pd.concat(dfs)
print df_final

输出：

     Color    Value1    Value2  ROLL_CORR
1     Blue  0.951227  0.514999        NaN
3     Blue  0.649112  0.513052        NaN
6     Blue  0.148165  0.342205   0.931673
8     Blue  0.626883  0.421530   0.865066
10    Blue  0.286738  0.583811   0.089304
12    Blue  0.966779  0.227340  -0.998656
15    Blue  0.065493  0.887640  -0.971373
17    Blue  0.757932  0.900103  -0.667316
0      Red  0.201435  0.981871        NaN
2      Red  0.522955  0.357239        NaN
5      Red  0.806326  0.310039  -0.911357
9      Red  0.656126  0.678047  -0.152221
11     Red  0.435898  0.908388  -0.971153
14     Red  0.116419  0.555821   0.438697
18     Red  0.793102  0.168033  -0.550727
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371

希望这有帮助。

网友

3楼 · 编辑于 2024-04-23 08:30:24

我找到了一个有效的解决办法。相当简单。

def roll_corr_groupby(x,i):
    x['Z'] = rolling_corr(x['col 1'], x['col 2'],i) 
    return x

x.groupby(['key']).apply(roll_corr_groupby)
x.head()

相关问题更多 >

编程相关推荐

热门问题

热门文章

使用pandas计算滚动相关性

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >