How can I set the rolling window size based on the size of each group?

Expected output:

   Value
     min  max  sum  mean  var
ID
1    3.0  4.0  7.0   3.5  0.5   # the last 4/2 rows for the group with ID = 1
2    7.0  7.0  7.0   7.0  0.5   # the last 3/2 rows for the group with ID = 2
3    2.0  2.0  2.0   2.0  NaN   # the last 1 row for the group with ID = 3

What I tried:

df_group = (
    df.groupby('ID')
      .apply(lambda x: x
             .sort_values(by=['ID'])
             # note: x.size is rows * columns (4 * 2 = 8 for ID 1), not the row count
             .rolling(window=int(x.size / 2), min_periods=1)
             .agg({'Value': ['min', 'max', 'sum', 'mean', 'var']})
             .tail(1))
)

Output:

     Value
       min  max   sum  mean       var
ID
1  3   1.0  4.0  10.0   2.5  1.666667
2  6   6.0  8.0  21.0   7.0  1.000000
3  7   2.0  2.0   2.0   2.0       NaN
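Why the attempt covers each whole group: for a DataFrame, `x.size` is the number of cells (rows × columns), so with the two columns `ID` and `Value` the window equals the full group length, whereas `len(x)` is the row count. A small check, assuming sample data with the same values as in the answer; `window_sizes` is a hypothetical helper name used only for illustration:

```python
import pandas as pd

# Sample data (same values as in the answer)
df = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 2, 3],
                   "Value": [1, 2, 3, 4, 6, 7, 8, 2]})


def window_sizes(frame, key="ID"):
    """Hypothetical helper: per group, compare the window derived from
    frame.size (cells) with one derived from len(frame) (rows)."""
    out = {}
    for gid, g in frame.groupby(key):
        from_size = int(g.size / 2)      # g.size = rows * columns
        from_len = max(1, len(g) // 2)   # len(g) = rows only, at least 1
        out[int(gid)] = (from_size, from_len)
    return out


print(window_sizes(df))  # {1: (4, 2), 2: (3, 1), 3: (1, 1)}
```

With `max(1, len(x) // 2)` as the window, the row kept by `rolling(...).tail(1)` aggregates exactly the last n rows of the group, so taking `tail(n)` first and then aggregating gives the same numbers more directly.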

1 answer

Anonymous user

Reply #1 · posted 2024-05-19 19:29:16

One possible solution:

import pandas as pd
df = pd.DataFrame(dict(ID=[1,1,1,1,2,2,2,3],
                      Value=[1,2,3,4,6,7,8,2]))

print(df)
# Output:
   ID  Value
0   1      1
1   1      2
2   1      3
3   1      4
4   2      6
5   2      7
6   2      8
7   3      2

Loop over the groups as follows:

# Collect one statistics Series per group
stats = []

# Iterate over the groups
for ID, Values in df.groupby('ID'):
    # tail: keep the last n rows, where n = max(1, group length // 2)
    # describe: compute summary statistics of those rows
    _stat = Values.tail(max(1, len(Values) // 2))['Value'].describe()
    # Add the group ID to the result
    _stat.loc['ID'] = ID
    # Store the result
    stats.append(_stat)

# Build the new DataFrame, indexed by group ID
pd.DataFrame(stats).set_index('ID')

Result

     count  mean       std  min   25%  50%   75%  max
ID                                                   
1.0    2.0   3.5  0.707107  3.0  3.25  3.5  3.75  4.0
2.0    1.0   8.0       NaN  8.0  8.00  8.0  8.00  8.0
3.0    1.0   2.0       NaN  2.0  2.00  2.0  2.00  2.0
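`describe()` reports `std` rather than `var` and no `sum`, while the question asks for min, max, sum, mean and var. A loop-free sketch, assuming the same sample data; `last_half_stats` is a hypothetical name. It numbers the rows from the end of each group with `cumcount(ascending=False)`, keeps those numbered below n = max(1, group size // 2), then aggregates what is left:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 2, 3],
                   "Value": [1, 2, 3, 4, 6, 7, 8, 2]})


def last_half_stats(frame, key="ID", col="Value"):
    """Hypothetical helper: min/max/sum/mean/var over the last
    max(1, size // 2) rows of each group."""
    g = frame.groupby(key)
    # Rows to keep per group, broadcast back to every row: half the size, at least 1
    n = (g[col].transform("size") // 2).clip(lower=1)
    # 0 for the last row of each group, 1 for the row before it, ...
    pos_from_end = g.cumcount(ascending=False)
    tail = frame[pos_from_end < n]
    return tail.groupby(key)[col].agg(["min", "max", "sum", "mean", "var"])


print(last_half_stats(df))
```

Inside the answer's loop, replacing `.describe()` with `.agg(['min', 'max', 'sum', 'mean', 'var'])` should yield the same statistics per group; the vectorized form above just avoids the Python-level loop.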
