Pandas rolling apply with a custom function - Q&A - Python中文网


Posted 2024-05-16 00:50:46



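I have been following a similar answer here, but I'm running into problems when using sklearn with rolling apply. I'm trying to compute z-scores and run principal component analysis (PCA) with rolling apply, but I keep getting an 'only length-1 arrays can be converted to Python scalars' error.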

Following the earlier example, I created a DataFrame:

from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

sc = StandardScaler()
tmp = pd.DataFrame(np.random.randn(2000, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=2000),
                   columns=['A', 'B'])

When I use the rolling command:

tmp.rolling(window=5, center=False).apply(lambda x: sc.fit_transform(x))
TypeError: only length-1 arrays can be converted to Python scalars

I understand the error. However, I can create functions using the mean and standard deviation without any problem:

def test(df):
    return np.mean(df)
tmp.rolling(window=5,center=False).apply(lambda x: test(x))

I believe the error occurs when I try to subtract the mean from the current value to get the z-score:

def test2(df):
    return df-np.mean(df)
tmp.rolling(window=5,center=False).apply(lambda x: test2(x))
only length-1 arrays can be converted to Python scalars

How can I create a custom rolling function with sklearn that first standardizes the data and then runs PCA?

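Edit: I realize my question wasn't very clear, so I'll try again. I want to standardize my values and then run PCA to get the variance explained by each factor. Doing this without rolling is fairly simple: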

from sklearn.decomposition import PCA

testing = sc.fit_transform(tmp)  # standardize each column
pca = PCA()                      # run PCA
pca.fit(testing)
pca.explained_variance_ratio_
array([ 0.50967441,  0.49032559])

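I can't do the same thing on a rolling basis. Using the rolling zscore function from @piRSquared, I can get the z-scores. It seems that sklearn's PCA doesn't work with a custom function in rolling apply. (In fact, I think this is the case for most sklearn modules.) I'm only trying to get the explained variance, which is a one-dimensional item, but the code below returns a bunch of NaNs: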

def test3(df):
    pca.fit(df)
    return pca.explained_variance_ratio_
tmp.rolling(window=5,center=False).apply(lambda x: test3(x))

I could instead write my own explained-variance function, but that doesn't work either:

def test4(df):
    cov_mat=np.cov(df.T) #need covariance of features, not observations
    eigen_vals,eigen_vecs=np.linalg.eig(cov_mat)
    tot=sum(eigen_vals)
    var_exp=[(i/tot) for i in sorted(eigen_vals,reverse=True)]
    return var_exp
tmp.rolling(window=5,center=False).apply(lambda x: test4(x))

With this I get the error: 0-dimensional array given. Array must be at least two-dimensional.
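The message comes from np.linalg.eig. Because rolling().apply passes the callable one column's window at a time (a 1-D array or Series), np.cov(df.T) returns a 0-dimensional variance instead of a 2x2 covariance matrix. A minimal reproduction, assuming a random 5-value array stands in for one rolling window:

```python
import numpy as np

# One column of one 5-row window: what rolling().apply hands to the callable
window = np.random.default_rng(0).standard_normal(5)

cov_mat = np.cov(window.T)  # .T does nothing to a 1-D array, so this is just a variance
print(cov_mat.shape)        # ()

try:
    np.linalg.eig(cov_mat)
except np.linalg.LinAlgError as err:
    message = str(err)
    print(message)  # 0-dimensional array given. Array must be at least two-dimensional
```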

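To summarize: I want to compute a rolling z-score and then run a rolling PCA that outputs the explained variance for each window. I have the rolling z-score working, but not the explained variance.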
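Since rolling().apply hands the callable one column at a time and expects a single number back, it cannot return a vector of explained-variance ratios per window. One workaround is to loop over the windows explicitly. A minimal sketch, assuming each window should contain all columns together; the helper name rolling_explained_variance is hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def rolling_explained_variance(df, window):
    """Standardize each trailing window of df (all columns together), fit PCA on it,
    and return one row of explained-variance ratios per window end.
    Assumes window >= number of columns, so PCA yields one ratio per column."""
    rows = []
    for end in range(window, len(df) + 1):
        block = df.iloc[end - window:end].to_numpy()
        scaled = StandardScaler().fit_transform(block)  # z-score within this window only
        rows.append(PCA().fit(scaled).explained_variance_ratio_)
    columns = [f"PC{i + 1}" for i in range(df.shape[1])]
    return pd.DataFrame(rows, index=df.index[window - 1:], columns=columns)


tmp = pd.DataFrame(np.random.randn(2000, 2) / 10000,
                   index=pd.date_range("2001-01-01", periods=2000),
                   columns=["A", "B"])
evr = rolling_explained_variance(tmp, window=5)
print(evr.head())
```

A plain Python loop is slower than a vectorized rolling call, but it lets each window be a full 2-D block. For exactly two standardized columns the PC1 ratio equals (1 + |r|) / 2, where r is the Pearson correlation inside the window, so tmp['A'].rolling(5).corr(tmp['B']) offers a vectorized cross-check.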


1 answer

User

Answer #1 · Posted 2024-05-16 00:50:46

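As @BrenBarn commented, the rolling function needs to reduce a vector to a single number. The following is equivalent to what you are trying to do and helps highlight the problem: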

zscore = lambda x: (x - x.mean()) / x.std()
tmp.rolling(5).apply(zscore)

TypeError: only length-1 arrays can be converted to Python scalars

In the zscore function, x.mean() reduces to a scalar and so does x.std(), but x itself is an array, so the whole expression is an array.
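To see what the callable actually receives, one can record its input. A minimal sketch, assuming raw=True (so each window arrives as a NumPy array rather than a Series); reducer and seen are illustrative names:

```python
import numpy as np
import pandas as pd

tmp = pd.DataFrame(np.random.randn(20, 2), columns=["A", "B"])
seen = []


def reducer(window):
    seen.append((type(window).__name__, window.shape))  # record what apply() passes in
    return window.mean()  # a single number, which is what rolling().apply expects


result = tmp.rolling(5).apply(reducer, raw=True)
print(seen[0])       # ('ndarray', (5,)): one column of one window
print(result.shape)  # (20, 2): same shape as tmp, first 4 rows NaN
```

Each call sees a single column of a single window, so a function that returns anything other than one number per call cannot fill the output.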

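The way around this is to apply rolling only to the parts of the z-score calculation that need it (the mean and the standard deviation), not to the part that causes the problem: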

(tmp - tmp.rolling(5).mean()) / tmp.rolling(5).std()
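One detail worth checking: pandas Rolling.std defaults to the sample standard deviation (ddof=1), while sklearn's StandardScaler divides by the population standard deviation (ddof=0). A minimal sketch that cross-checks the last row of the vectorized z-score against a direct computation on the last window, assuming unscaled random data for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
tmp = pd.DataFrame(rng.standard_normal((200, 2)),
                   index=pd.date_range("2001-01-01", periods=200),
                   columns=["A", "B"])
window = 5
roll = tmp.rolling(window)

z = (tmp - roll.mean()) / roll.std()            # sample std (ddof=1), as in the answer
z_pop = (tmp - roll.mean()) / roll.std(ddof=0)  # population std, as StandardScaler uses

# Direct computations on the final window only
last = tmp.iloc[-window:]
manual = (last.iloc[-1] - last.mean()) / last.std()
scaled_last = StandardScaler().fit_transform(last.to_numpy())[-1]

print(np.allclose(z.iloc[-1], manual))           # True
print(np.allclose(z_pop.iloc[-1], scaled_last))  # True
```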
