Python: computing the percentage change of every column in a MultiIndex DataFrame, combined with groupby

import numpy as np
import pandas as pd

# Create a pd.MultiIndex and include some NaNs
rng = pd.date_range(start='2018-12-20', periods=20, name='date')
date = np.concatenate([rng, rng])
perm = np.array([*np.repeat(1, 20), *np.repeat(2, 20)])
# The series are cast to float so that NaN can be assigned below
d = {'perm': perm,
     'date': date,
     'ser_1': np.random.randint(low=1, high=10, size=[40]).astype(float),
     'ser_2': np.random.randint(low=1, high=10, size=[40]).astype(float)}
df = pd.DataFrame(data=d)
df.iloc[5:8, 2:] = np.nan
df.iloc[11:13, 2] = np.nan
df.iloc[25:28, 2:] = np.nan
df.iloc[33:37, 3] = np.nan
df.set_index(['perm', 'date'], drop=True, inplace=True)

# Apply pd.pct_change to every column individually in order to take care of the
# NaNs at different positions. Also, use groupby for every 'perm'. This one is
# where I am struggling.

# This is working properly, but it doesn't take into account 'perm'. The first
# two rows of perm=2 (i.e. rows 20 and 21) must be NaN.
chg = df.apply(lambda x, periods: x.dropna().pct_change(periods=2)
               .reindex(df.index, method='ffill'), axis=0, periods=2)

# This one is causing an error:
# TypeError: <lambda>() got an unexpected keyword argument 'axis'
chg = df.groupby('perm').apply(lambda x, periods: x.dropna().pct_change(periods=2)
                               .reindex(df.index, method='ffill'), axis=0, periods=2)

1 Answer

User

#1 · Posted 2024-04-18 15:55:21

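The "unexpected keyword argument 'axis'" error arises because pandas.DataFrame.apply and pandas.core.groupby.GroupBy.apply are two different methods with similar but not identical parameters. They share a name because they are meant to perform very similar tasks, but they belong to two different classes.
If you check the documentation, you will see that the first takes an axis parameter and the second does not. GroupBy.apply forwards any extra keyword arguments to the function it is given, so axis=0 ends up being passed to your lambda, which does not accept it.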
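A minimal sketch of that difference, using a small made-up frame (the names `g`, `v`, `col_sums`, and `error_message` are illustrative and do not come from the question):

```python
import pandas as pd

df = pd.DataFrame({"g": [1, 1, 2, 2], "v": [1.0, 2.0, 3.0, 4.0]}).set_index("g")

# DataFrame.apply has its own `axis` parameter, so this call is fine.
col_sums = df.apply(lambda col: col.sum(), axis=0)

# GroupBy.apply has no `axis` parameter: the keyword is forwarded to the
# function, and this lambda does not accept it, so a TypeError is raised.
try:
    df.groupby("g").apply(lambda grp: grp.sum(), axis=0)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message should name `axis` as the unexpected keyword argument, which is the same failure reported in the question.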

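So, to make your code work with groupby, simply remove the axis argument from GroupBy.apply. Because of the dropna, you want to work column by column, so you need to use DataFrame.apply inside GroupBy.apply: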

chg = df.groupby('perm').apply(lambda x:
                           x.apply(lambda y : y.dropna().pct_change(periods=2)
                           .reindex(x.index, method='ffill'),
                           axis=0))

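This yields the result you want (the first two rows of perm 2 are NaN, and the other numbers equal those you get using apply without groupby).
Note that I also edited the first argument of reindex: it is x.index rather than df.index, otherwise you would get a doubled perm index in the final result.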
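For reference, here is a self-contained sketch of the whole approach that can be run to check the claim about perm 2. The fixed random seed, the helper name `pct_change_per_group`, the float cast, and the `group_keys=False` argument are assumptions added for this sketch and are not part of the answer's code:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed, only to make the run repeatable

# Rebuild the question's frame; values are cast to float so NaN can be assigned.
rng = pd.date_range(start="2018-12-20", periods=20, name="date")
df = pd.DataFrame({
    "perm": np.repeat([1, 2], 20),
    "date": np.concatenate([rng, rng]),
    "ser_1": np.random.randint(low=1, high=10, size=40).astype(float),
    "ser_2": np.random.randint(low=1, high=10, size=40).astype(float),
})
df.iloc[5:8, 2:] = np.nan
df.iloc[11:13, 2] = np.nan
df.iloc[25:28, 2:] = np.nan
df.iloc[33:37, 3] = np.nan
df = df.set_index(["perm", "date"])


def pct_change_per_group(group, periods=2):
    """Column-wise pct_change that skips NaNs, then realigns to the group's index."""
    return group.apply(
        lambda col: col.dropna()
        .pct_change(periods=periods)
        .reindex(group.index, method="ffill")
    )


# group_keys=False (an addition in this sketch) keeps the (perm, date) index as is.
chg = df.groupby("perm", group_keys=False).apply(pct_change_per_group)

# The first two rows of perm 2 should be NaN in every column.
first_two_perm_2 = chg.xs(2, level="perm").iloc[:2]
print(first_two_perm_2)
```

In recent pandas versions GroupBy.apply prepends the group key to the result index by default, even when the function returns an object with the same index as the group. Passing `group_keys=False` is one way to keep a single perm level there; on older versions the answer's code may produce the same index without it.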

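One last note: if you hard-code periods in the pct_change call inside the lambda function, there is no need to pass a periods argument to that function. It is redundant.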
