Pandas: Deseasonalizing time series data

Asked 2025-04-18 11:29

I have a DataFrame df that records trading volume every 10 seconds over 22 non-consecutive days.

[Out]:

                     VOL
2011-04-01 09:30:00  11297
2011-04-01 09:30:10  6526
2011-04-01 09:30:20  14021
2011-04-01 09:30:30  19472
2011-04-01 09:30:40  7602
...
2011-04-29 15:59:30  79855
2011-04-29 15:59:40  83050
2011-04-29 15:59:50  602014

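df holds volume at 10-second intervals, and I want to remove the intraday seasonality. Specifically, I want to divide each observation by the average volume of its 5-minute interval. To do that, I need the average volume for each 5-minute interval across the 22 days, which gives a series of 5-minute averages: 9:30:00 - 9:35:00; 9:35:00 - 9:40:00; 9:40:00 - 9:45:00 ... up to 16:00:00. For example, the average for 9:30:00 - 9:35:00 is the mean of the volume in that interval over all 22 days (that is, the total volume between 9:30:00 and 9:35:00 divided by the 22 days). Is that right? I would then divide every observation in df that falls within 9:30:00 - 9:35:00 by that interval's average.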

Is there a tool in Python/Pandas that can do this?
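One way to write down the normalization described above, as a sketch under assumed notation (the symbols $v_{d,t}$, $B_k$, $D$ and $n_k$ are introduced here for illustration and do not appear in the question):

```latex
\bar{v}_k = \frac{1}{D\, n_k} \sum_{d=1}^{D} \sum_{t \in B_k} v_{d,t},
\qquad
\tilde{v}_{d,t} = \frac{v_{d,t}}{\bar{v}_k} \quad \text{for } t \in B_k
```

Here $v_{d,t}$ is the volume observed at time $t$ on day $d$, $B_k$ is the $k$-th 5-minute interval, $D = 22$ is the number of days, and $n_k$ is the number of 10-second observations per interval (30 if none are missing). Dividing the interval total by $D$ alone gives the average 5-minute total per day, which is $n_k$ times larger. Both choices remove the same intraday pattern, but only the per-observation mean makes the normalized values average to 1 within each interval.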

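Tags: time series analysis, data preprocessing, data grouping, mean calculation, time windows, data normalization, deseasonalization, trading volume analysis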

1 Answer

Edited answer:

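import datetime
import numpy as np
import pandas as pd

# Synthetic 10-second volume data (continuous, unlike the real 22-day sample)
date_times = pd.date_range(datetime.datetime(2011, 4, 1, 9, 30),
                           datetime.datetime(2011, 4, 16, 0, 0),
                           freq='10s')
VOL = np.random.sample(date_times.size) * 10000.0

df = pd.DataFrame(data={'VOL': VOL, 'time': date_times}, index=date_times)
# Label each row with the hour and starting minute of its 5-minute bin
# (the raw minute would only match bins at minutes 0, 5, 10, ...)
df['h'] = df.index.hour
df['m'] = (df.index.minute // 5) * 5
# Mean volume per 5-minute bin on each day; resample(how=...) no longer exists
df1 = df[['VOL']].resample('5min').mean()
# Average the per-day bin means across days, keyed by time of day
times = df1.index
df2 = df1.groupby([times.hour, times.minute]).VOL.mean().reset_index()
df2.columns = ['h', 'm', 'VOL']
# Attach the bin average to every observation (VOL_x: raw, VOL_y: bin average)
df_norm = df.merge(df2, on=['h', 'm'])
df_norm['norm'] = df_norm['VOL_x'] / df_norm['VOL_y']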
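The same normalization can probably be done without the merge. The following is a minimal sketch, not the method above: `deseasonalize` is a hypothetical helper name, and it pools all observations in a time-of-day bin rather than averaging per-day bin means (the two agree when every bin has the same number of observations on every day).

```python
import numpy as np
import pandas as pd

def deseasonalize(df, col="VOL", bin_freq="5min"):
    """Divide each observation by the cross-day mean of its time-of-day bin.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Start of the bin each timestamp falls into
    bin_start = df.index.floor(bin_freq)
    # Minutes since midnight of the bin start: identical across days
    key = np.asarray(bin_start.hour * 60 + bin_start.minute)
    # Mean of all observations sharing a time-of-day bin, aligned to df
    bin_mean = df[col].groupby(key).transform("mean")
    out = df.copy()
    out["bin_mean"] = bin_mean
    out["norm"] = out[col] / bin_mean
    return out

# Two non-consecutive days, 10-second bars from 09:30:00 to 09:39:50
day1 = pd.date_range("2011-04-01 09:30", periods=60, freq="10s")
day2 = pd.date_range("2011-04-04 09:30", periods=60, freq="10s")
df = pd.DataFrame({"VOL": np.arange(1.0, 121.0)}, index=day1.append(day2))

result = deseasonalize(df)
print(result.head())
```

Because `transform` returns a Series aligned to the original index, the timestamps are kept and no `h`/`m` helper columns are needed.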

** Old answer (kept for now)

Use the resample function:

df.resample('5min').mean()

For example:

import datetime
import numpy as np
import pandas as pd

date_times = pd.date_range(datetime.datetime(2011, 4, 1, 9, 30),
                           datetime.datetime(2011, 4, 16, 0, 0),
                           freq='10s')
VOL = np.random.sample(date_times.size) * 10000.0

df = pd.DataFrame(data={'VOL': VOL}, index=date_times)
# Mean volume in each 5-minute bin
df.resample('5min').mean()
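Note that resampling alone gives one mean per 5-minute bin per day, and when the days are not contiguous it also creates empty bins for the gaps. A small sketch of this on made-up data (two hypothetical trading days with a weekend between them), followed by one way to collapse the result into a time-of-day profile:

```python
import numpy as np
import pandas as pd

# Two non-consecutive days, 10-second bars from 09:30:00 to 09:39:50
day1 = pd.date_range("2011-04-01 09:30", periods=60, freq="10s")
day2 = pd.date_range("2011-04-04 09:30", periods=60, freq="10s")
df = pd.DataFrame({"VOL": np.arange(1.0, 121.0)}, index=day1.append(day2))

# One row per 5-minute bin over the whole span, including the empty gap
per_bin_all = df["VOL"].resample("5min").mean()
# Keep only the bins that actually contain observations
per_bin = per_bin_all.dropna()

# Average bins with the same clock time across days
profile = per_bin.groupby(per_bin.index.time).mean()
print(len(per_bin_all), len(per_bin))
print(profile)
```

The `dropna()` step assumes the volume column itself has no missing values, so that a NaN mean can only come from an empty bin.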

Answered 2025-04-18 by Python大师
