import pandas as pd
import numpy as np
np.random.seed(2)
d = {'match_id': np.arange(10),
     'team_id': ['City','City','City','Utd','Utd','Utd','Albion','Albion','Albion','Albion'],
     'goals_scored': np.random.randint(0,5,10),
     'time_played': [0,1,2,0,1,2,0,1,2,3]}
df = pd.DataFrame(data=d)
#previous n matches
n=2
#some weekly 3pm kickoffs (freq='W' anchors on Sundays, so these fall on Sundays; use 'W-SAT' for Saturdays)
rng = pd.date_range('2017-12-02 15:00:00','2017-12-25 15:00:00',freq='W')
# change the time_played integers to the datetimes
df['time_played'] = df['time_played'].map(lambda x: rng[x])
#be sure the sort order is correct
df = df.sort_values(['team_id','time_played'])
# a rolling sum() and then shift(1) to align value with row as per question
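# transform keeps the original row index, so the result can be assigned straight back
# (in recent pandas, apply here returns a group-keyed MultiIndex that no longer aligns with df)
df['total_goals'] = df.groupby(['team_id'])['goals_scored'].transform(lambda x: x.rolling(n).sum())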
df['total_goals'] = df.groupby(['team_id'])['total_goals'].shift(1)
This produces:
   goals_scored  match_id  team_id          time_played  total_goals->(in previous n)
6             2         6   Albion  2017-12-03 15:00:00          NaN
7             1         7   Albion  2017-12-10 15:00:00          NaN
8             3         8   Albion  2017-12-17 15:00:00          3.0
9             2         9   Albion  2017-12-24 15:00:00          4.0
0             0         0     City  2017-12-03 15:00:00          NaN
1             0         1     City  2017-12-10 15:00:00          NaN
2             3         2     City  2017-12-17 15:00:00          0.0
3             2         3      Utd  2017-12-03 15:00:00          NaN
4             3         4      Utd  2017-12-10 15:00:00          NaN
5             0         5      Utd  2017-12-17 15:00:00          5.0
I made up some mock data because I like football, but as jacobh suggested, it is always best to include a sample DataFrame with your question.
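There may be a more efficient way to do this with an aggregate function, but one workable solution is to filter the whole DataFrame for each entry, isolating the team and the date range, and then sum the goals.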
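The code for that filtering approach is not shown here, so the following is only a rough sketch of what it might look like, not the original answerer's code. The helper name `goals_in_window` and the two-week `window` are assumptions made for illustration, and the goal values are copied from the output printed above so the example is reproducible.

```python
import pandas as pd

# Mock data matching the table above (goal values copied from the printed output).
df = pd.DataFrame({
    'match_id': range(10),
    'team_id': ['City'] * 3 + ['Utd'] * 3 + ['Albion'] * 4,
    'goals_scored': [0, 0, 3, 2, 3, 0, 2, 1, 3, 2],
    'time_played': [0, 1, 2, 0, 1, 2, 0, 1, 2, 3],
})

# Weekly 3pm kickoffs starting 2017-12-03, the same dates as in the output above.
kickoffs = pd.date_range('2017-12-03 15:00:00', periods=4, freq='7D')
df['time_played'] = df['time_played'].map(lambda i: kickoffs[i])

# Assumed look-back period: two weeks before each match.
window = pd.Timedelta(weeks=2)

def goals_in_window(row):
    """Sum the goals this row's team scored in the `window` before this match."""
    same_team = df['team_id'] == row['team_id']
    earlier = df['time_played'] < row['time_played']
    recent = df['time_played'] >= row['time_played'] - window
    return df.loc[same_team & earlier & recent, 'goals_scored'].sum()

# One full-frame filter per row: simple to read, but quadratic in the number of rows.
df['total_goals'] = df.apply(goals_in_window, axis=1)
print(df.sort_values(['team_id', 'time_played']))
```

Unlike the rolling version, this sketch gives 0 rather than NaN when a team has no earlier matches, and a partial sum when it has only one. It also selects by date range rather than by match count, so the two approaches agree only when matches are evenly spaced.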