Pandas DataFrame vectorization/filtering: ValueError: Can only compare identically-labeled Series objects

>>> df_all
        season      gameId playerTeam opposingTeam  gameDate  xGoalsFor  xGoalsAgainst
1         2008  2008020001        NYR          T.B  20081004      2.287          2.689
6         2008  2008020003        NYR          T.B  20081005      1.793          0.916
11        2008  2008020010        NYR          CHI  20081010      1.938          2.762
16        2008  2008020019        NYR          PHI  20081011      3.030          3.020
21        2008  2008020034        NYR          N.J  20081013      1.562          3.454
...        ...         ...        ...          ...       ...        ...            ...
142576    2015  2015030185        L.A          S.J  20160422      2.927          2.042
142581    2017  2017030171        L.A          VGK  20180411      1.275          2.279
142586    2017  2017030172        L.A          VGK  20180413      1.907          4.642
142591    2017  2017030173        L.A          VGK  20180415      2.452          3.159
142596    2017  2017030174        L.A          VGK  20180417      2.427          1.818

>>> df_sum_all
     season team  xg5  xg10  xg15  xg20
0      2008  NYR    0     0     0     0
1      2009  NYR    0     0     0     0
2      2010  NYR    0     0     0     0
3      2011  NYR    0     0     0     0
4      2012  NYR    0     0     0     0
..      ...  ...  ...   ...   ...   ...
327    2014  L.A    0     0     0     0
328    2015  L.A    0     0     0     0
329    2016  L.A    0     0     0     0
330    2017  L.A    0     0     0     0
331    2018  L.A    0     0     0     0

def calcRatio(statfor, statagainst, games, season, team, statsdf):
    tempFor = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)].nsmallest(games, 'gameDate').eval(statfor).sum())
    tempAgainst = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)].nsmallest(games, 'gameDate').eval(statagainst).sum())
    tempRatio = tempFor / tempAgainst
    return tempRatio

>>> statsdf = df_all
>>> team = 'TOR'
>>> season = 2015
>>> games = 3
>>> tempFor = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)].nsmallest(games, 'gameDate').eval(statfor).sum())
>>> print(tempFor)
8.618

>>> df_sum_all['xg5'] = calcRatio('xGoalsFor','xGoalsAgainst',5,df_sum_all['season'], df_sum_all['team'], df_all)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in calcRatio
  File "/home/sebastian/.local/lib/python3.6/site-packages/pandas/core/ops/__init__.py", line 1142, in wrapper
    raise ValueError("Can only compare identically-labeled " "Series objects")
ValueError: Can only compare identically-labeled Series objects

>>> emptyseries = []
>>> for index, row in df_sum_all.iterrows():
...     emptyseries.append(calcRatio('xGoalsFor','xGoalsAgainst',5,row['season'],row['team'], df_all))
...
>>> df_sum_all['xg5'] = emptyseries
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
>>> df_sum_all
     season team       xg5  xg10  xg15  xg20
0      2008  NYR  0.826260     0     0     0
1      2009  NYR  1.288390     0     0     0
2      2010  NYR  0.915942     0     0     0
3      2011  NYR  0.730498     0     0     0
4      2012  NYR  0.980744     0     0     0
..      ...  ...       ...   ...   ...   ...
327    2014  L.A  0.823998     0     0     0
328    2015  L.A  1.147412     0     0     0
329    2016  L.A  1.054947     0     0     0
330    2017  L.A  1.369005     0     0     0
331    2018  L.A  0.721411     0     0     0
[332 rows x 6 columns]

1 Answer

User

#1 · Posted 2024-06-08 18:06:59

The error "ValueError: Can only compare identically-labeled Series objects" comes from these two lines in calcRatio:

tempFor = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)].nsmallest(games, 'gameDate').eval(statfor).sum())
tempAgainst = float(statsdf[(statsdf.playerTeam == team) & (statsdf.season == season)].nsmallest(games, 'gameDate').eval(statagainst).sum())

In the failing call, the arguments are:

team: df_sum_all['team']
season: df_sum_all['season']
statsdf: df_all

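So in the code, (statsdf.playerTeam == team) compares a Series from df_all with a Series from df_sum_all. pandas only compares two Series element-wise when their index labels are identical. Here they are not: df_all is labeled 1, 6, 11, ..., 142596, while df_sum_all is labeled 0 to 331, so the comparison raises the error above. The loop over iterrows() works because each row passes a single scalar team and season, and comparing a Series with a scalar is always allowed.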
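As a rough sketch of both points, the snippet below first reproduces the error on two tiny made-up frames that reuse the question's column names, then computes the ratio for every (season, team) pair at once with groupby and merge instead of a Python loop. The sample values, the helper name first_n_ratio, and the choice of 2 games are assumptions for illustration only; this is one possible vectorized approach, not necessarily the only or best one for the real data.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df_all = pd.DataFrame(
    {
        "season": [2008, 2008, 2008, 2009, 2009],
        "playerTeam": ["NYR", "NYR", "L.A", "NYR", "NYR"],
        "gameDate": [20081004, 20081005, 20081004, 20091002, 20091003],
        "xGoalsFor": [2.0, 1.0, 3.0, 4.0, 2.0],
        "xGoalsAgainst": [1.0, 2.0, 1.5, 2.0, 1.0],
    },
    index=[1, 6, 11, 16, 21],  # non-default labels, similar to df_all
)
df_sum_all = pd.DataFrame(
    {"season": [2008, 2009, 2008], "team": ["NYR", "NYR", "L.A"]}
)

# 1) Series == Series with different index labels raises ValueError.
try:
    df_all.playerTeam == df_sum_all["team"]
    raised = False
except ValueError:
    raised = True


# 2) Hypothetical helper: ratio of summed stats over the first n games
#    (by gameDate) of every (season, playerTeam) group.
def first_n_ratio(stats, n, stat_for="xGoalsFor", stat_against="xGoalsAgainst"):
    keys = ["season", "playerTeam"]
    # Keep the n earliest games of each group.
    first_n = stats.sort_values("gameDate", kind="stable").groupby(keys).head(n)
    sums = first_n.groupby(keys)[[stat_for, stat_against]].sum()
    ratio = sums[stat_for] / sums[stat_against]
    return ratio.rename(f"xg{n}").reset_index()


ratios = first_n_ratio(df_all, 2)

# Attach the ratios to the summary frame by matching on season and team.
result = df_sum_all.merge(
    ratios,
    how="left",
    left_on=["season", "team"],
    right_on=["season", "playerTeam"],
).drop(columns="playerTeam")

print(result)
```

Because merge returns a new DataFrame rather than assigning into a slice, this route should also sidestep the SettingWithCopyWarning shown in the question.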
