Pandas: applying an aggregate function to a column over the last n values of a datetime column in the same DataFrame

Posted 2024-06-16 14:40:37


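I have a DataFrame of sports betting data with the columns match_id, team_id, goals_scored, and a datetime column holding each match's start time. I want to add a column that shows, for each row, the total number of goals that team scored in its previous n matches.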


2 answers

I made up some mock data, since I like football, but as jacobh suggested, it is always best to include a sample DataFrame with the question.

import pandas as pd
import numpy as np
np.random.seed(2)

d = {'match_id': np.arange(10)
        ,'team_id': ['City','City','City','Utd','Utd','Utd','Albion','Albion','Albion','Albion']
        ,'goals_scored': np.random.randint(0,5,10)
        ,'time_played': [0,1,2,0,1,2,0,1,2,3]}
df = pd.DataFrame(data=d)

#previous n matches
n=2

# weekly 3pm kickoffs; note that freq='W' anchors on Sundays (2017-12-03, 2017-12-10, ...)
rng = pd.date_range('2017-12-02 15:00:00','2017-12-25 15:00:00',freq='W')

# change the time_played integers to the datetimes
df['time_played'] = df['time_played'].map(lambda x: rng[x])

#be sure the sort order is correct
df = df.sort_values(['team_id','time_played'])

# per team: rolling sum over n matches, then shift(1) so each row only sees the previous n matches
# transform (rather than apply) keeps the original index, so the result aligns with df on assignment
df['total_goals'] = df.groupby('team_id')['goals_scored'].transform(lambda x: x.rolling(n).sum().shift(1))

This produces:

   goals_scored  match_id team_id         time_played  total_goals->(in previous n)
6             2         6  Albion 2017-12-03 15:00:00          NaN
7             1         7  Albion 2017-12-10 15:00:00          NaN
8             3         8  Albion 2017-12-17 15:00:00          3.0
9             2         9  Albion 2017-12-24 15:00:00          4.0
0             0         0    City 2017-12-03 15:00:00          NaN
1             0         1    City 2017-12-10 15:00:00          NaN
2             3         2    City 2017-12-17 15:00:00          0.0
3             2         3     Utd 2017-12-03 15:00:00          NaN
4             3         4     Utd 2017-12-10 15:00:00          NaN
5             0         5     Utd 2017-12-17 15:00:00          5.0
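The rolling approach above can also be wrapped in a small helper. The following is only a sketch under stated assumptions: the function name previous_n_goals, the min_periods option, and the tiny sample frame are hypothetical and not part of the original answer. It shifts first and then rolls, which gives the same values as rolling and then shifting, and min_periods=1 optionally returns a partial sum when a team has fewer than n earlier matches.

```python
import pandas as pd

def previous_n_goals(df, n, min_periods=None):
    """Sum of goals_scored over each team's previous n matches (current match excluded).

    Hypothetical helper; assumes columns team_id, goals_scored and time_played.
    """
    # Sort so that "previous" means earlier kickoff within each team.
    ordered = df.sort_values(["team_id", "time_played"])
    # shift(1) drops the current match, rolling(n).sum() adds up the n before it.
    result = ordered.groupby("team_id")["goals_scored"].transform(
        lambda s: s.shift(1).rolling(n, min_periods=min_periods).sum()
    )
    # Return the values in the caller's original row order.
    return result.reindex(df.index)

# Made-up sample data, for illustration only.
matches = pd.DataFrame({
    "team_id": ["City", "Albion", "City", "Albion", "City", "Albion"],
    "goals_scored": [1, 2, 0, 1, 3, 3],
    "time_played": pd.to_datetime([
        "2017-12-03 15:00", "2017-12-03 15:00",
        "2017-12-10 15:00", "2017-12-10 15:00",
        "2017-12-17 15:00", "2017-12-17 15:00",
    ]),
})

# Strict version: NaN until a team has n earlier matches.
matches["total_goals"] = previous_n_goals(matches, n=2)
# Lenient version: partial sums as soon as one earlier match exists.
matches["total_goals_partial"] = previous_n_goals(matches, n=2, min_periods=1)
print(matches)
```

Because the helper reindexes back to the input's index, the input frame does not have to be pre-sorted, which the inline version above relies on.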

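There may be a more efficient way to do this with aggregate functions, but here is one solution: for each row, filter the whole DataFrame down to the same team and to matches that started before that row's match, then sum the goals. Note that this totals all of the team's earlier matches, not only the last n, and it assumes the datetime column is named datetime.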

# for each row: select the same team's earlier matches and sum their goals
df['goals_to_date'] = df.apply(
    lambda row: df.loc[(df['team_id'] == row['team_id'])
                       & (df['datetime'] < row['datetime']), 'goals_scored'].sum(),
    axis=1)
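The row-by-row filter rescans the whole frame for every row, so its cost grows quadratically with the number of matches. One possible vectorized alternative, shown here only as a sketch on made-up data, is a per-team cumulative sum minus the current match's goals. It assumes a team never has two matches with the same datetime; with ties the strict `<` comparison above and this version would disagree.

```python
import pandas as pd

# Made-up sample data, for illustration only; column names follow the answer above.
matches = pd.DataFrame({
    "team_id": ["City", "Albion", "City", "Albion", "City"],
    "goals_scored": [1, 2, 0, 1, 3],
    "datetime": pd.to_datetime([
        "2017-12-03 15:00", "2017-12-03 15:00",
        "2017-12-10 15:00", "2017-12-10 15:00",
        "2017-12-17 15:00",
    ]),
})

# Order each team's matches chronologically.
ordered = matches.sort_values(["team_id", "datetime"])
# Running total of goals per team, including the current match.
running = ordered.groupby("team_id")["goals_scored"].cumsum()
# Subtract the current match to keep only strictly earlier matches.
# Assignment aligns on the index, so the original row order is preserved.
matches["goals_to_date"] = running - ordered["goals_scored"]
print(matches)
```

As with the row-wise version, a team's first match gets 0 rather than NaN, since there are no earlier goals to add.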
