How to compute volatility (standard deviation) over a rolling window in Pandas

Posted 2024-05-17 00:11:01


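I have a time series "Ser" and I want to compute its volatility (standard deviation) over a rolling window. My current code works correctly and has this form: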

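import numpy as np

w = 10
length = len(Ser) - w + 1  # number of full windows
volList = []
for timestep in range(length):
    subSer = Ser[timestep:timestep+w]
    mean_i = np.mean(subSer)
    # population standard deviation of the current window
    vol_i = (np.sum((subSer-mean_i)**2)/len(subSer))**0.5
    volList.append(vol_i)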

This seems very inefficient to me. Does pandas have built-in functionality for this kind of thing?


3 Answers

Typically, [finance-type] people quote volatility in annualized terms of percentage price changes.

Assuming you have daily prices in a DataFrame and 252 trading days in a year, you probably want something like this:

df.pct_change().rolling(window_size).std()*(252**0.5)
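
As a minimal sketch of how that one-liner might be used end to end: the DataFrame `prices`, its synthetic data, and the 21-day `window_size` below are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (illustrative data only).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=300)
prices = pd.DataFrame(
    {"close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=len(dates))))},
    index=dates,
)

window_size = 21     # hypothetical window: roughly one trading month
trading_days = 252   # trading days per year, as assumed in the answer

# Daily percentage returns -> rolling sample std -> scaled to annual terms.
annualized_vol = prices.pct_change().rolling(window_size).std() * trading_days ** 0.5

print(annualized_vol.dropna().head())
```

Note that `std()` here keeps the pandas default `ddof=1` (sample standard deviation), and the first `window_size` rows are NaN because the first return is undefined and a full window of returns is needed.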

Here's a NumPy-based approach:

# From http://stackoverflow.com/a/14314054/3293881 by @Jaime
def moving_average(a, n=3) :
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

# From http://stackoverflow.com/a/40085052/3293881
def strided_app(a, L, S=1 ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

def rolling_meansqdiff_numpy(a, w):
    A = strided_app(a, w)
    B = moving_average(a,w)
    subs = A-B[:,None]
    sums = np.einsum('ij,ij->i',subs,subs)
    return (sums/w)**0.5
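
A shorter variant of the same windowing idea, offered only as a sketch: it swaps the hand-written `as_strided` helper and the `einsum` step for `numpy.lib.stride_tricks.sliding_window_view`, which assumes NumPy 1.20 or newer. The function name `rolling_std_windows` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_std_windows(a, w):
    """Population std (ddof=0) of every length-w window of a 1-D array."""
    # Read-only view of shape (len(a) - w + 1, w); no data is copied.
    windows = sliding_window_view(np.asarray(a, dtype=float), w)
    return windows.std(axis=1)

a = np.array([1, 2, 3, 4, 5, 6])
result = rolling_std_windows(a, 3)
print(result)
```

Because the view is read-only and its shape is checked by NumPy, this avoids the out-of-bounds risk that comes with a wrong shape or stride in `as_strided`.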

Sample run:

In [202]: Ser = pd.Series(np.random.randint(0,9,(20)))

In [203]: rolling_meansqdiff_loopy(Ser, w=10)
Out[203]: 
[2.6095976701399777,
 2.3000000000000003,
 2.118962010041709,
 2.022374841615669,
 1.746424919657298,
 1.7916472867168918,
 1.3000000000000003,
 1.7776388834631178,
 1.6852299546352716,
 1.6881943016134133,
 1.7578395831246945]

In [204]: rolling_meansqdiff_numpy(Ser.values, w=10)
Out[204]: 
array([ 2.60959767,  2.3       ,  2.11896201,  2.02237484,  1.74642492,
        1.79164729,  1.3       ,  1.77763888,  1.68522995,  1.6881943 ,
        1.75783958])

Runtime test

Loopy approach:

def rolling_meansqdiff_loopy(Ser, w):
    length = Ser.shape[0]- w + 1
    volList= []
    for timestep in range(length):
        subSer=Ser[timestep:timestep+w]
        mean_i=np.mean(subSer)
        vol_i=(np.sum((subSer-mean_i)**2)/len(subSer))**0.5
        volList.append(vol_i)
    return volList

Timings:

In [223]: Ser = pd.Series(np.random.randint(0,9,(10000)))

In [224]: %timeit rolling_meansqdiff_loopy(Ser, w=10)
1 loops, best of 3: 2.63 s per loop

# @Mad Physicist's vectorized soln
In [225]: %timeit Ser.rolling(10).std(ddof=0)
1000 loops, best of 3: 380 µs per loop

In [226]: %timeit rolling_meansqdiff_numpy(Ser.values, w=10)
1000 loops, best of 3: 393 µs per loop

Both vectorized approaches are several thousand times faster than the loopy one (about 0.4 ms versus 2.63 s)!

It looks like you are looking for Series.rolling. You can apply the std calculation to the resulting object:

roller = Ser.rolling(w)
volList = roller.std(ddof=0)

If you don't plan on using the rolling window object again, you can write a one-liner:

volList = Ser.rolling(w).std(ddof=0)

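Keep in mind that ddof=0 is necessary in this case, because the standard deviation is normalized by N-ddof, where N is the number of observations in each window, and ddof defaults to 1 in pandas.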
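
To make the effect of ddof concrete, here is a small sketch on made-up data (the values in `Ser` and the window `w = 4` are illustrative assumptions). It compares both settings against the formula used in the question's loop:

```python
import numpy as np
import pandas as pd

Ser = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
w = 4

pop = Ser.rolling(w).std(ddof=0)   # divides by w (population std)
samp = Ser.rolling(w).std()        # default ddof=1, divides by w - 1

# Reference value for the first full window, computed as in the question.
first = Ser.iloc[:w]
manual = (np.sum((first - first.mean()) ** 2) / len(first)) ** 0.5

print(pop.iloc[w - 1], samp.iloc[w - 1], manual)
```

Only the ddof=0 result matches the loop. The default result is larger by a factor of sqrt(w/(w-1)), which matters most for small windows.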
