Pandas按月重采样2M和3M

1 投票
1 回答
43 浏览
提问于 2025-04-14 18:32

我有一段代码用来计算收益:

import yfinance as yf
import numpy as np
import pandas as pd

df = yf.download('SPY', '2023-01-01')
df = df[['Close']]
df['d_returns'] = np.log(df.div(df.shift(1)))
df.dropna(inplace = True)

df_1M = pd.DataFrame()
df_2M = pd.DataFrame()
df_3M = pd.DataFrame()

df_1M['1M cummreturns'] = df.d_returns.cumsum().apply(np.exp)
df_2M['2M cummreturns']= df.d_returns.cumsum().apply(np.exp)
df_3M['3M cummreturns'] = df.d_returns.cumsum().apply(np.exp)

df1 = df_1M[['1M cummreturns']].resample('1M').max()
df2 = df_2M[['2M cummreturns']].resample('2M').max()
df3 = df_3M[['3M cummreturns']].resample('3M').max()

df1 = pd.concat([df1, df2, df3], axis=1)
df1

这段代码的输出结果是:

            1M cummreturns  2M cummreturns  3M cummreturns
Date            
2023-01-31  1.067381        1.067381        1.067381
2023-02-28  1.094428        NaN             NaN
2023-03-31  1.075022        1.094428        NaN
2023-04-30  1.092196        NaN             1.094428
2023-05-31  1.103356        1.103356        NaN
2023-06-30  1.164014        NaN             NaN
2023-07-31  1.202116        1.202116        1.202116
2023-08-31  1.198677        NaN             NaN
2023-09-30  1.184785        1.198677        NaN
2023-10-31  1.145738        NaN             1.198677
2023-11-30  1.198466        1.198466        NaN
2023-12-31  1.251746        NaN             NaN
2024-01-31  1.290032        1.290032        1.290032
2024-02-29  1.334174        NaN             NaN
2024-03-31  1.346699        1.346699        NaN
2024-04-30  NaN             NaN             1.346699

我该如何为每一行的 2M cummreturns3M cummreturns 列获取有效的值呢?

举个例子,2023-02-28 这一行代表的是 2023年2月。那么 2M cummreturns3M cummreturns 列需要显示从 2023年2月 开始的接下来2个月和3个月内的最大收益,和 1M cummreturns 列显示的接下来1个月的最大收益是一样的。

1 个回答

2

如果我理解得没错,你只想对第一列进行重新采样,以便得到月份。然后计算一个 rolling.max

# compute once the max per month
tmp = df.d_returns.cumsum().apply(np.exp).resample('1M').max()

# then roll over the above with different windows
N = 3 # number of columns to generate
out = pd.DataFrame({f'{x+1}M cummreturns': tmp.rolling(x+1, min_periods=1).max()
                    for x in range(N)})

输出结果:

            1M cummreturns  2M cummreturns  3M cummreturns
Date                                                      
2023-01-31        1.067381        1.067381        1.067381
2023-02-28        1.094428        1.094428        1.094428
2023-03-31        1.075022        1.094428        1.094428
2023-04-30        1.092196        1.092196        1.094428
2023-05-31        1.103356        1.103356        1.103356
2023-06-30        1.164014        1.164014        1.164014
2023-07-31        1.202116        1.202116        1.202116
2023-08-31        1.198677        1.202116        1.202116
2023-09-30        1.184785        1.198677        1.202116
2023-10-31        1.145738        1.184785        1.198677
2023-11-30        1.198466        1.198466        1.198466
2023-12-31        1.251746        1.251746        1.251746
2024-01-31        1.290032        1.290032        1.290032
2024-02-29        1.334174        1.334174        1.334174
2024-03-31        1.346699        1.346699        1.346699

撰写回答