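I can solve this problem myself, but the way I'm doing it feels very inefficient. I'm hoping someone can suggest an alternative, because this is not an ideal approach.
I have data for every NFL game since the 2009 season. The dataset contains a game date column but no season column, so I want to create one. The NFL sometimes plays games in January, so I can't simply derive the season from the calendar year.
Here is the very inefficient solution I came up with: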
import pandas as pd

# `data` is the play-by-play DataFrame, already loaded, with a 'game_date' column

# Create list of season years
season_years = [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]
# Initialize dictionary of seasons
seasons = {}
# Iterate over season years to add start and end dates to seasons dictionary
# Used Mar 1 and Feb 28 as start and end dates due to Super Bowl being played in early Feb every year
for year in season_years:
    seasons[year] = {'start': str(year) + '-03-01', 'end': str(year + 1) + '-02-28'}
# Turn seasons dictionary into dataframe
seasons_df = pd.DataFrame(seasons).transpose()
# Convert start and end dates in dataframe to datetime objects
seasons_df['start'] = pd.to_datetime(seasons_df['start'])
seasons_df['end'] = pd.to_datetime(seasons_df['end'])
# Initialize new column 'season' with None values
data['season'] = None
# Parse the game dates once instead of on every loop iteration
game_dates = pd.to_datetime(data['game_date'])
# Iterate over season years, add year to season column if game date is between start and end for that season
for year in season_years:
    in_season = game_dates.between(seasons_df.loc[year, 'start'], seasons_df.loc[year, 'end'])
    data.loc[in_season, 'season'] = year
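So this works, but creating the new column requires iterating over a Python list, which is a bit crude. There must be a better way.
Edit: the data can be downloaded from Kaggle: https://www.kaggle.com/maxhorowitz/nflplaybyplay2009to2016/version/6?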
You can generate the season boundaries (the bins) and then assign each game date to a season by binning it against those boundaries:
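A minimal sketch of this idea, assuming `pd.date_range` builds the boundaries and `pd.cut` does the assignment (both function choices are assumptions, as is the small sample frame standing in for `data`). Each season is taken to run from March 1 up to, but not including, the following March 1, which matches the cutoff used in the question.

```python
import pandas as pd

# Hypothetical sample standing in for the question's `data` DataFrame
data = pd.DataFrame({
    "game_date": [
        "2009-09-10", "2010-01-03", "2010-02-07",
        "2010-09-09", "2017-02-05", "2018-12-30",
    ]
})
game_dates = pd.to_datetime(data["game_date"])

# Season boundaries: March 1 of every year from 2009 through 2019 (11 edges)
bins = pd.date_range("2009-03-01", periods=11, freq=pd.DateOffset(years=1))

# One label per interval between consecutive edges (10 seasons)
labels = list(range(2009, 2019))

# right=False makes each interval [March 1, next March 1), so a game played
# in January or February falls into the season that started the previous year
data["season"] = pd.cut(game_dates, bins=bins, labels=labels, right=False).astype(int)

print(data)
```

`pd.cut` returns a categorical column; the `astype(int)` call assumes every date falls inside the bins, since dates outside the range would come back as missing values and could not be cast to integers.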
where bins looks like this:
Results for a random set of game dates: