I am using nba_py to fetch scoreboard data for some NBA games.
Here is a sample of the data structure:
   SEASON  GAME_DATE_EST        GAME_SEQUENCE  GAME_ID   HOME_TEAM_ID  VISITOR_TEAM_ID  WINNER
0  2013    2013-10-05T00:00:00  1              11300001  12321         1610612760       V
1  2013    2013-10-05T00:00:00  2              11300002  1610612754    1610612741       V
2  2013    2013-10-05T00:00:00  3              11300003  1610612745    1610612740       V
3  2013    2013-10-05T00:00:00  4              11300004  1610612747    1610612744       H
4  2013    2013-10-06T00:00:00  1              11300005  12324         1610612755       V
You can find part of the data here: NBA Games Data.
My goal is to create the following columns and add them to the original data:
For the home team:
1. Total wins/losses for hometeam if hometeam plays at home ("HOMETEAM_HOME_WINS"/"HOMETEAM_HOME_LOSSES")
2. Total wins/losses for hometeam if hometeam is visiting ("HOMETEAM_VISITOR_WINS"/"HOMETEAM_VISITOR_LOSSES")
For the visiting team:
3. Total wins/losses for visitor_team if visitor_team plays at home ("VISITOR_TEAM_HOME_WINS"/"VISITOR_TEAM_HOME_LOSSES")
4. Total wins/losses for visitor_team if visitor_team is visiting ("VISITOR_TEAM_VISITOR_WINS"/"VISITOR_TEAM_VISITOR_LOSSES")
My first naive approach was the following:
def get_home_team_home_wins(x):
    hometeam = x.HOME_TEAM_ID
    season = x.SEASON
    gid = x.name
    # All home games of this team in this season
    season_hometeam_games = grouped_seasons_hometeams.get_group((season, hometeam))
    # Keep only the games played before the current one
    home_games = season_hometeam_games[season_hometeam_games.index < gid]
    if not home_games.empty:
        try:
            # The result column in the sample data is WINNER ("H" or "V")
            home_wins = home_games.WINNER.value_counts()["H"]
        except KeyError:
            home_wins = 0
    else:
        home_wins = 0
    return home_wins

grouped_seasons_hometeams = df.groupby(["SEASON", "HOME_TEAM_ID"])
df["HOMETEAM_HOME_WINS"] = df.apply(get_home_team_home_wins, axis=1)
Another approach is iterating over the rows:
grouped_seasons = df.groupby("SEASON")
df["HOMETEAM_HOME_WINS"] = 0
current_season = 0
for index, row in df.iterrows():
    season = row.SEASON
    # Fetch the season's games only when the season changes
    if season != current_season:
        current_season = season
        season_games = grouped_seasons.get_group(current_season)
    hometeam = row.HOME_TEAM_ID
    gid = row.name
    # Earlier games of the season, then only this team's home games
    games = season_games[season_games.index < gid]
    home_games = games[games.HOME_TEAM_ID == hometeam]
    if not home_games.empty:
        try:
            home_wins = home_games.WINNER.value_counts()["H"]
        except KeyError:
            home_wins = 0
    else:
        home_wins = 0
    # Write into the column initialised above (.ix has been removed from pandas)
    df.loc[index, "HOMETEAM_HOME_WINS"] = home_wins
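Update 1:
Below are some functions that get the home team's wins and losses, both when it plays at home and when it is visiting. Similar functions would be needed for the visiting team.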
def get_home_team_home_wins_losses(x):
    hometeam = x.HOME_TEAM_ID
    season = x.SEASON
    gid = x.name
    # Games of the same season played before the current one
    games = df[(df.SEASON == season) & (df.index < gid)]
    home_team_home_games = games[games.HOME_TEAM_ID == hometeam]
    # HOMETEAM plays at home
    if not home_team_home_games.empty:
        home_team_home_games_value_counts = home_team_home_games.WINNER.value_counts()
        try:
            home_team_home_wins = home_team_home_games_value_counts["H"]
        except KeyError:
            home_team_home_wins = 0
        try:
            home_team_home_losses = home_team_home_games_value_counts["V"]
        except KeyError:
            home_team_home_losses = 0
    else:
        home_team_home_wins = 0
        home_team_home_losses = 0
    return [home_team_home_wins, home_team_home_losses]

def get_home_team_visitor_wins_losses(x):
    hometeam = x.HOME_TEAM_ID
    season = x.SEASON
    gid = x.name
    # Games of the same season played before the current one
    games = df[(df.SEASON == season) & (df.index < gid)]
    home_team_visitor_games = games[games.VISITOR_TEAM_ID == hometeam]
    # HOMETEAM visits
    if not home_team_visitor_games.empty:
        home_team_visitor_games_value_counts = home_team_visitor_games.WINNER.value_counts()
        try:
            home_team_visitor_wins = home_team_visitor_games_value_counts["V"]
        except KeyError:
            home_team_visitor_wins = 0
        try:
            home_team_visitor_losses = home_team_visitor_games_value_counts["H"]
        except KeyError:
            home_team_visitor_losses = 0
    else:
        home_team_visitor_wins = 0
        home_team_visitor_losses = 0
    return [home_team_visitor_wins, home_team_visitor_losses]

df["HOME_TEAM_HOME_WINS"], df["HOME_TEAM_HOME_LOSSES"] = zip(*df.apply(lambda x: get_home_team_home_wins_losses(x), axis=1))
df["HOME_TEAM_VISITOR_WINS"], df["HOME_TEAM_VISITOR_LOSSES"] = zip(*df.apply(lambda x: get_home_team_visitor_wins_losses(x), axis=1))
df["HOME_TEAM_WINS"] = df["HOME_TEAM_HOME_WINS"] + df["HOME_TEAM_VISITOR_WINS"]
df["HOME_TEAM_LOSSES"] = df["HOME_TEAM_HOME_LOSSES"] + df["HOME_TEAM_VISITOR_LOSSES"]
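The approaches above are not efficient, so I am thinking of using groupby, but it is not clear to me how.
I will add updates whenever I find something more efficient.
Any ideas? Thanks.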
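Consider using transform(), but first conditionally create the integer columns HOMEWINNER and VISITWINNER. The equivalent conditional written with numpy.where() is easier to read, but numpy may or may not be available to you as a package. Note that transform() keeps all rows but aggregates by ID, so every record for a particular HOME_TEAM_ID repeats the same value in these aggregated columns.

Now, for the instances where the home team later visits and vice versa, consider merging the IDs with a subset of the dataframe (change the column numbers if needed). This captures the home teams that are also visiting teams. From there, run the above aggregation on mergedf, this time using WINNER_x to compute the conditional HOMEWINNER and WINNER_y to compute VISITWINNER.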
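
The transform() step described above can be sketched roughly as follows. This is only an illustration under assumptions: the five-game frame and the team IDs 1 to 3 are made up, the result column is taken to be WINNER as in the sample data, and the output column names follow the question. It yields season totals repeated on every row of a team, not running counts, and it does not cover the merge step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: team IDs 1-3 are invented for illustration
df = pd.DataFrame({
    "SEASON": [2013, 2013, 2013, 2013, 2013],
    "HOME_TEAM_ID": [1, 1, 2, 3, 1],
    "VISITOR_TEAM_ID": [2, 3, 1, 1, 2],
    "WINNER": ["H", "V", "V", "H", "H"],
})

# 1/0 indicator columns built conditionally from WINNER
df["HOMEWINNER"] = np.where(df["WINNER"] == "H", 1, 0)
df["VISITWINNER"] = np.where(df["WINNER"] == "V", 1, 0)

# Season totals per home team; transform() keeps every row
home = df.groupby(["SEASON", "HOME_TEAM_ID"])
df["HOMETEAM_HOME_WINS"] = home["HOMEWINNER"].transform("sum")
df["HOMETEAM_HOME_LOSSES"] = home["VISITWINNER"].transform("sum")

# Season totals per visiting team
visit = df.groupby(["SEASON", "VISITOR_TEAM_ID"])
df["VISITOR_TEAM_VISITOR_WINS"] = visit["VISITWINNER"].transform("sum")
df["VISITOR_TEAM_VISITOR_LOSSES"] = visit["HOMEWINNER"].transform("sum")

print(df)
```

Because the totals are broadcast back to each row, all three home games of team 1 carry the same HOMETEAM_HOME_WINS value.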
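
If the counts should only include games played before each row, as the loops in the question do with `index < gid`, a per-group cumulative sum is one possible alternative to transform("sum"). This is a different technique from the one the answer describes, sketched here under assumptions: the toy frame and team IDs 1 to 3 are invented, and the rows are assumed to be sorted chronologically. It covers only the home team at home and the visiting team when visiting.

```python
import pandas as pd

# Hypothetical toy data, assumed to be sorted in chronological order
df = pd.DataFrame({
    "SEASON": [2013, 2013, 2013, 2013, 2013],
    "HOME_TEAM_ID": [1, 1, 2, 3, 1],
    "VISITOR_TEAM_ID": [2, 3, 1, 1, 2],
    "WINNER": ["H", "V", "V", "H", "H"],
})

# 1/0 indicator columns
df["HOMEWINNER"] = (df["WINNER"] == "H").astype(int)
df["VISITWINNER"] = (df["WINNER"] == "V").astype(int)

# cumsum() includes the current game, so subtract it to count earlier games only
home = df.groupby(["SEASON", "HOME_TEAM_ID"])
df["HOMETEAM_HOME_WINS"] = home["HOMEWINNER"].cumsum() - df["HOMEWINNER"]
df["HOMETEAM_HOME_LOSSES"] = home["VISITWINNER"].cumsum() - df["VISITWINNER"]

visit = df.groupby(["SEASON", "VISITOR_TEAM_ID"])
df["VISITOR_TEAM_VISITOR_WINS"] = visit["VISITWINNER"].cumsum() - df["VISITWINNER"]
df["VISITOR_TEAM_VISITOR_LOSSES"] = visit["HOMEWINNER"].cumsum() - df["HOMEWINNER"]

print(df)
```

Subtracting the current row's indicator keeps the result of the game itself out of its own features, which matters if the columns are later used to predict WINNER.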