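How can I use Pandas to avoid a for loop when optimizing a probabilistic exponentially weighted moving average (PEWMA)? Is there a way to take advantage of EWM?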

Posted 2024-04-25 22:20:05

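I am trying to optimize a PEWMA function used for anomaly detection on real-time sensor data with pandas. I currently run it as a for loop over iterrows(), but I would like to use pandas functions so that larger datasets compute faster, for example apply(), vectorization, or the built-in EWM functions.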

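PEWMA differs from EWMA in that the beta parameter controls how much outliers are allowed to influence the moving average; setting beta to 0 gives the standard EWMA.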
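
To make the role of beta concrete, here is a minimal sketch. It assumes the weight kept on the previous mean is a * (1 - beta * p), as in the loop further down, and it ignores the warm-up phase over the first T points. The helper name pewma_weight and the sample values are hypothetical. With beta = 0 the weight is constant, so the mean update should match pandas' ewm with alpha = 1 - a and adjust=False.

```python
import numpy as np
import pandas as pd


def pewma_weight(p, a=0.99, beta=0.5):
    """Weight kept on the previous mean (hypothetical helper)."""
    return a * (1 - beta * p)


a = 0.99
x = pd.Series([2.0, 1.5, 3.0, 2.5, 9.0, 2.2])  # made-up sample values

# With beta = 0 the weight no longer depends on the probability p.
w = pewma_weight(p=0.3, a=a, beta=0.0)

# Mean update m = w * m + (1 - w) * x, started at the first value.
manual = [x.iloc[0]]
for v in x.iloc[1:]:
    manual.append(w * manual[-1] + (1 - w) * v)

# Same recursion expressed with pandas' built-in EWM.
ewm_mean = x.ewm(alpha=1 - a, adjust=False).mean()
same = np.allclose(manual, ewm_mean.to_numpy())
```

With beta > 0 the weight changes at every step, which is why a single fixed-alpha ewm call does not reproduce the PEWMA mean.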

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
#plt.style.use('fivethirtyeight')
#%config InlineBackend.figure_format = 'retina'
#%matplotlib inline
from itertools import islice
from math import sqrt
from scipy.stats import norm


# generate random sensor data
ts = pd.date_range(start='1-1-2019',
                   end='1-10-2019', freq='5min')  # '5T' is a deprecated alias for 5 minutes


np.random.seed(seed=1111)
data = np.random.normal(2.012547, 1.557331,size=len(ts))
df = pd.DataFrame({'timestamp': ts, 'speed': data})
df.speed = df.speed.abs()
df = df.set_index('timestamp')
time_col = 'timestamp'
value_col = 'speed'

# PEWMA parameters
T = 30      # number of points to consider in initial average
beta = 0.5  # controls how much outliers are allowed to affect the MA; set to 0 for a standard EWMA
a = 0.99    # the maximum value of the EWMA a parameter, used for outliers
z = 3       # not used in the loop below

# the PEWMA model
# as described in Carter, Kevin M., and William W. Streilein.

# create a DataFrame for the run time variables we'll need to calculate
pewm = pd.DataFrame(index=df.index, columns=['Mean', 'Var', 'Std'], dtype=float)
pewm.iloc[0] = [df.iloc[0][value_col], 0, 0]

t = 0

for _, row in islice(df.iterrows(), 1, None):
    prev = pewm.iloc[t]  # state from the previous step
    diff = row[value_col] - prev.Mean  # difference from moving average
    p = norm.pdf(diff / prev.Std) if prev.Std != 0 else 0  # prob of observing diff
    a_t = a * (1 - beta * p) if t > T else 1 - 1/(t+1)  # weight to give to this point
    incr = (1 - a_t) * diff

    # Update Mean, Var, Std in one row assignment
    # (chained pewm.iloc[t+1].Mean = ... can end up writing to a copy)
    new_var = a_t * (prev.Var + diff * incr)
    pewm.iloc[t+1] = [prev.Mean + incr, new_var, sqrt(new_var)]
    t += 1


The for loop currently takes too long to run on larger datasets.
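
One possible direction, offered as a sketch rather than a definitive answer: the weight a_t depends on the previous Std, so each step needs the result of the one before it, and the recursion stays sequential. Much of the per-row cost, though, likely comes from iterrows() and repeated .iloc access rather than from the arithmetic. The hypothetical function pewma_numpy below keeps the state in plain Python floats and writes results into NumPy arrays; it assumes at least one input value and computes the standard normal pdf by hand instead of calling scipy.

```python
import math

import numpy as np
import pandas as pd


def pewma_numpy(values, T=30, beta=0.5, a=0.99):
    """Sequential PEWMA over a 1-D sequence (hypothetical helper)."""
    xs = np.asarray(values, dtype=float).tolist()  # plain Python floats
    n = len(xs)
    mean = np.empty(n)
    var = np.empty(n)
    m, v = xs[0], 0.0          # running mean and variance
    mean[0], var[0] = m, v
    inv_sqrt_2pi = 1.0 / math.sqrt(2.0 * math.pi)

    for t in range(n - 1):
        diff = xs[t + 1] - m   # difference from moving average
        std = math.sqrt(v)
        if std != 0:
            zscore = diff / std
            # standard normal pdf, same role as norm.pdf in the original loop
            p = inv_sqrt_2pi * math.exp(-0.5 * zscore * zscore)
        else:
            p = 0.0
        # warm-up weight for the first points, PEWMA weight afterwards
        a_t = a * (1 - beta * p) if t > T else 1 - 1 / (t + 1)
        incr = (1 - a_t) * diff
        m = m + incr
        v = a_t * (v + diff * incr)
        mean[t + 1], var[t + 1] = m, v

    return pd.DataFrame({'Mean': mean, 'Var': var, 'Std': np.sqrt(var)})
```

For the data above this could be called as pewma_numpy(df[value_col].to_numpy(), T=T, beta=beta, a=a), setting the result's index to df.index afterwards. How much faster it runs will depend on the dataset and should be measured.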
