Creating a new rolling mean column using GroupBy on multiple columns

import pandas as pd

df = pd.DataFrame({
    'date': ['2016-04-01', '2016-05-01', '2016-07-01', '2016-08-01', '2016-09-01',
             '2019-04-01', '2019-05-01', '2019-06-01', '2019-08-01', '2019-09-01'],
    'Country': ['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
    'Region': ['Eastern', 'Eastern', 'Eastern', 'Eastern', 'Eastern',
               'Eastern', 'Eastern', 'Eastern', 'Eastern', 'Eastern'],
    'State': ['New York', 'New York', 'New York', 'New York', 'New York',
              'New York', 'New York', 'New York', 'New York', 'New York'],
    'Supplier': ['ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'ABC', 'ABC'],
    'Location': ['Bin-1', 'Bin-1', 'Bin-1', 'Bin-1', 'Bin-1',
                 'Bin-1', 'Bin-1', 'Bin-1', 'Bin-1', 'Bin-1'],
    'Year': [2016, 2016, 2016, 2016, 2016, 2019, 2019, 2019, 2019, 2019],
    'Month': [4, 5, 7, 8, 9, 4, 5, 6, 8, 9],
    'periodcode': [4, 5, 7, 8, 9, 4, 5, 6, 8, 9],
    'Product': ['bike', 'bike', 'bike', 'bike', 'bike', 'bike', 'bike', 'bike', 'bike', 'bike'],
    'total': [0, 2000, 1000, 4000, 0, 2000, 2000, 1000, 4000, 600]})

df.set_index('date', inplace=True)

df['mean'] = df.groupby(['Country','Region','State','Supplier','Location','Product'], as_index=False)['total'].rolling(3).mean().reset_index(level=0,drop=True)

df.head(10)

1 answer

User

Answer #1 · posted 2024-05-18 07:12:41

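Based on our discussion in the comments below, you want to calculate the rolling mean for each group across the years, so the following should give you the desired result: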

df['mean'] = df.groupby(['Country','Region','State','Supplier','Location','Product'])['total'].rolling(3).mean().reset_index().set_index("date")['total']

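The key is to keep the date index (it lets you match each computed rolling mean to a row in the original DataFrame) and to extract the total column, as a Series, from the object that the rolling mean calculation returns.

A more detailed explanation:

Your problem is that the groupby without Year produces a DataFrame that is incompatible with df, so it cannot be assigned to df["mean"].

The first variant gives a Series with a matching index: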
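As a rough sketch of the same idea on a recent pandas version: there, groupby(...).rolling(...) is assumed to return a Series whose index holds the group keys plus date, so dropping the key levels leaves a date-indexed Series that aligns with df. The names keys and rolled are only illustrative, and the behaviour on older pandas releases may differ.

```python
import pandas as pd

# Illustrative helper name; these are the grouping columns from the question.
keys = ['Country', 'Region', 'State', 'Supplier', 'Location', 'Product']

df = pd.DataFrame({
    'date': ['2016-04-01', '2016-05-01', '2016-07-01', '2016-08-01', '2016-09-01',
             '2019-04-01', '2019-05-01', '2019-06-01', '2019-08-01', '2019-09-01'],
    'Country': ['USA'] * 10,
    'Region': ['Eastern'] * 10,
    'State': ['New York'] * 10,
    'Supplier': ['ABC'] * 10,
    'Location': ['Bin-1'] * 10,
    'Product': ['bike'] * 10,
    'total': [0, 2000, 1000, 4000, 0, 2000, 2000, 1000, 4000, 600],
}).set_index('date')

# Rolling mean per group; on recent pandas the result is indexed by (keys..., date).
rolled = df.groupby(keys)['total'].rolling(3).mean()

# Drop the group-key levels so that only the date level remains.
rolled = rolled.droplevel(keys)

# Assignment aligns on the index, so every date may appear only once.
assert rolled.index.is_unique
df['mean'] = rolled

print(df[['total', 'mean']])
```

Because Year is not part of the grouping here, the window runs across the 2016/2019 boundary, so only the first two rows are NaN.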

df.groupby(['Country','Region','State','Supplier','Location','Product','Year'], as_index=False)['total'].rolling(3).mean().reset_index(level=0,drop=True)

date
2016-04-01            NaN
2016-05-01            NaN
2016-07-01    1000.000000
2016-08-01    2333.333333
2016-09-01    1666.666667
2019-04-01            NaN
2019-05-01            NaN
2019-06-01    1666.666667
2019-08-01    2333.333333
2019-09-01    1866.666667
Name: total, dtype: float64

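However, the second variant (without Year) results in a DataFrame in which every entry of the date column becomes its own column. That is why you cannot assign it to df["mean"].

The right fix really depends on what you are trying to achieve. Conceptually, though, if date is the index, then the Series you assign to df["mean"] can hold only one value per date.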
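To illustrate the one-value-per-date constraint, here is a small sketch with made-up data (the car rows are not from the question) in which two products share the same dates. A date index would not be unique in that case, so one option is to leave date as an ordinary column and use transform, which returns a result aligned with the original rows rather than with date.

```python
import pandas as pd

# Hypothetical data: two products observed on the same three dates.
df = pd.DataFrame({
    'date': ['2019-04-01', '2019-05-01', '2019-06-01'] * 2,
    'Product': ['bike'] * 3 + ['car'] * 3,
    'total': [2000, 2000, 1000, 300, 600, 900],
})

# With date as the index, each date would occur twice.
date_is_unique = df.set_index('date').index.is_unique

# transform keeps the original row index, so no alignment on date is needed.
df['mean'] = (
    df.groupby('Product')['total']
      .transform(lambda s: s.rolling(2).mean())
)

print(date_is_unique)
print(df)
```

Whether this fits depends on the data: if each date really occurs once per frame, keeping date as the index is simpler.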
