Making data stationary for an ARIMA model in Python

Published 2024-06-10 12:54:03


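I am using an ARIMA model to forecast future values of a time series. Before fitting it, I need to remove the seasonality and the trend from my data and make it stationary. I have read many articles on making data stationary and have written the code below, but I still cannot remove the seasonality or achieve stationarity. A sample of the data: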

        DATE             X
 1992-01-01 03:00:00    10.2
 1992-01-01 06:00:00    10.4
 1992-01-01 09:00:00    11.8
 1992-01-01 12:00:00    12.0
 1992-01-01 15:00:00    10.4
 1992-01-01 18:00:00    9.4
 1992-01-01 21:00:00    10.4
 1992-01-02 00:00:00    13.6
 1992-01-02 03:00:00    13.2
 1992-01-02 06:00:00    11.8
 1992-01-02 09:00:00    12.0
 1992-01-02 12:00:00    12.8
 1992-01-02 15:00:00    12.6
 1992-01-02 18:00:00    11.0
 1992-01-02 21:00:00    12.2
 1992-01-03 00:00:00    13.8
 1992-01-03 03:00:00    14.0
 1992-01-03 06:00:00    13.4
 1992-01-03 09:00:00    14.2
 1992-01-03 12:00:00    16.2
 1992-01-03 15:00:00    13.2
 1992-01-03 18:00:00    13.4
 1992-01-03 21:00:00    13.8
 1992-01-04 00:00:00    14.8
 1992-01-04 03:00:00    13.8
 1992-01-04 06:00:00    7.6
 1992-01-04 09:00:00    5.8
 1992-01-04 12:00:00    4.4
 1992-01-04 15:00:00    5.6
 1992-01-04 18:00:00    6.0
 1992-01-04 21:00:00    7.0
 1992-01-05 00:00:00    6.8
 1992-01-05 03:00:00    3.4
 1992-01-05 06:00:00    5.8
 1992-01-05 09:00:00    10.6
 1992-01-05 12:00:00    9.2
 1992-01-05 15:00:00    10.6
 1992-01-05 18:00:00    9.8
 1992-01-05 21:00:00    11.2
 1992-01-06 00:00:00    12.0
 1992-01-06 03:00:00    10.2
 1992-01-06 06:00:00    9.0
 1992-01-06 09:00:00    9.0
 1992-01-06 12:00:00    8.6
 1992-01-06 15:00:00    8.4
 1992-01-06 18:00:00    8.2
 1992-01-06 21:00:00    8.8
 1992-01-07 00:00:00    10.0
 1992-01-07 03:00:00    9.6
 1992-01-07 06:00:00    8.0
 1992-01-07 09:00:00    9.6
 1992-01-07 12:00:00    10.8
 1992-01-07 15:00:00    10.2
 1992-01-07 18:00:00    9.8
 1992-01-07 21:00:00    10.2
 1992-01-08 00:00:00    9.4
 1992-01-08 03:00:00    11.4
 1992-01-08 06:00:00    12.6
 1992-01-08 09:00:00    12.8
 1992-01-08 12:00:00    10.4
 1992-01-08 15:00:00    11.2
 1992-01-08 18:00:00    9.0
 1992-01-08 21:00:00    10.2
 1992-01-09 00:00:00    8.2

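The dataset above is a DataFrame (total size = 70K) holding 20 years of "X" values sampled on average every 3 hours (original data, fig. 1). Because the dataset is large and complex, I prepared it by taking monthly means of the whole series (monthly mean data, fig. 2), using the following code: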

import numpy as np
import pandas as pd

df_monthly = dataset.resample('M', on='DATE').mean()  # dataset contains the DATE and X columns
indexedDataset = df_monthly.copy()

# test_stationarity is my own function; it wraps adfuller and a rolling-mean analysis
test_stationarity(indexedDataset)

## Estimating the trend
indexedDataset_logScale = np.log(indexedDataset)    # log-transform the monthly means

# Rolling mean and standard deviation of the log series
movingAverage = indexedDataset_logScale.rolling(window=12).mean()    # 12-month window for monthly data
movingSTD = indexedDataset_logScale.rolling(window=12).std()

# Detrend: subtract the moving average from the log series
datasetLogScaleMinussMovingAverage = indexedDataset_logScale - movingAverage
# Drop the NaN values produced by the first 11 incomplete windows
datasetLogScaleMinussMovingAverage.dropna(inplace=True)
datasetLogScaleMinussMovingAverage.head(12)

test_stationarity(datasetLogScaleMinussMovingAverage)

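After all of this, running test_stationarity gives this result (fig. 3), which shows that the rolling mean and standard deviation are not constant, so the data is still not stationary. I therefore wrote the following code to make the data stationary: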

# Exponentially weighted mean of the log series; a numeric halflife is counted in observations (months here)
exponentialDecayWeightAverage = indexedDataset_logScale.ewm(halflife=365, min_periods=0, adjust=True).mean()
datasetLogScaleMinussMovingExponentialDecayAverage = indexedDataset_logScale - exponentialDecayWeightAverage
datasetLogScaleMinussMovingExponentialDecayAverage.dropna(inplace=True)

# First-order difference of the log series (this corresponds to d=1 in ARIMA)

datasetLogDiffShifting = indexedDataset_logScale - indexedDataset_logScale.shift()

datasetLogDiffShifting.dropna(inplace=True)

test_stationarity(datasetLogDiffShifting)         
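
Subtracting a one-step shift is a plain first-order difference, which does not target a repeating yearly pattern. A minimal sketch of how a lag-12 seasonal difference behaves on monthly data is shown below. The series is synthetic (a linear trend plus a 12-month sine cycle), and every name in it is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (illustrative only): linear trend plus a 12-month cycle.
idx = pd.date_range("1992-01-01", periods=240, freq="MS")
t = np.arange(240)
monthly = pd.Series(10.0 + 0.02 * t + 2.0 * np.sin(2 * np.pi * t / 12), index=idx)

# First-order difference: the same operation as monthly - monthly.shift()
first_diff = monthly.diff(1).dropna()

# Seasonal difference at lag 12: cancels a pattern that repeats every 12 months
seasonal_diff = monthly.diff(12).dropna()

# Seasonal difference followed by a first difference: also removes the linear trend
seasonal_then_first = monthly.diff(12).diff(1).dropna()

print("std after first difference:   ", first_diff.std())
print("std after seasonal difference:", seasonal_diff.std())
print("std after both differences:   ", seasonal_then_first.std())
```

On this idealized series the seasonal difference leaves only the constant trend increment, while the first difference alone still carries the yearly cycle. Real data will be noisier, so each result would still need a stationarity check.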

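This produced fig. 4, which again shows that the rolling mean and standard deviation are not constant, so the series is still not stationary. Can anyone help me with the following: 1) Is it appropriate or not to use monthly means in place of the full data? 2) How can I make my data stationary?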
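
On the first question, one thing that may help is to look at what each level of aggregation keeps and discards. The sketch below uses one year of synthetic 3-hourly values containing only a daily cycle, which is an assumption for illustration and not the asker's data. It compares daily and monthly means; 'MS' labels each month by its first day, whereas the question uses month-end labels.

```python
import numpy as np
import pandas as pd

# Synthetic 3-hourly data (illustrative only): one leap year with a pure daily cycle.
idx = pd.date_range("1992-01-01 00:00:00", "1992-12-31 21:00:00", freq=pd.Timedelta(hours=3))
hours = np.arange(len(idx)) * 3
dataset = pd.DataFrame({"DATE": idx, "X": 10.0 + 2.0 * np.sin(2 * np.pi * hours / 24)})

# Aggregate the same data at two resolutions
daily = dataset.resample("D", on="DATE")["X"].mean()
monthly = dataset.resample("MS", on="DATE")["X"].mean()

print("3-hourly observations:", len(dataset))
print("daily observations:   ", len(daily))
print("monthly observations: ", len(monthly))
print("std of raw / daily / monthly:", dataset["X"].std(), daily.std(), monthly.std())
```

Averaging removes any cycle shorter than the averaging window and shrinks the sample: 20 years of monthly means is only about 240 points with a seasonal period of 12, while daily means keep far more points but have a yearly period of roughly 365. Which trade-off is appropriate depends on the horizon and resolution the forecast needs.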

