<p>First, I'd suggest thinking about which questions you will want to answer, i.e.:</p>
<ul>
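<li>Do you want to report a list of each player's predicted and actual values?</li>
<li>Do you want to rank the players?</li>
<li>Do you want to do more statistical work? (e.g. player x is better at predicting games that team y plays in)</li>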
</ul>
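<p>I guess you want to do at least the first two!</p>
<p>I tried to keep the code readable and simple. In many ways it is more complex than the other answers, but it also gives you a full toolbox for working through large amounts of data very quickly. So treat it as another option :)</p>
<p>Basically, this also lets you do more statistical work in the future if you want to. But really, these questions do affect the answer to your question (or rather: which answer fits best here).</p>
<p>I assume you have a database (relational/mongodb/whatever); I'm faking it here with lists. Although I'm using pandas here, most of what is described can also be done quite simply in a relational database. But pandas rocks ;) so this will work fine as well. If you are working with Excel or CSV files shared with friends, you can also import them directly with pandas' read_csv or read_excel.</p>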
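<p>If your bets live in a CSV file rather than in a list, loading them could look roughly like the sketch below. The column layout mirrors the lists used in this answer; the CSV text and the name bets_from_csv are assumptions made up for illustration, and in practice you would pass a file path instead of a StringIO object.</p>

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """playerid,game,home_team,away_team,home_goals,away_goals
1,1,Bayern,VfL,3,5
2,1,Bayern,VfL,2,1
"""

# read_csv accepts a file path or any file-like object
bets_from_csv = pd.read_csv(io.StringIO(csv_text))
print(bets_from_csv)
```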
<pre><code>import pandas as pd
# game is a unique id (like a combination of date, home_team and away_team)
bet_list = [
{'playerid': 1, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
{'playerid': 2, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
{'playerid': 3, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
{'playerid': 4, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
{'playerid': 1, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
{'playerid': 2, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
{'playerid': 3, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
{'playerid': 4, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
{'playerid': 1, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
{'playerid': 2, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
{'playerid': 3, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
{'playerid': 4, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0}
]
result_list = [
{'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 4},
{'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 2},
{'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
]
def calculate_result(input_df):
    input_df['result'] = 0
    # home wins (result 1)
    mask = input_df['home_goals'] > input_df['away_goals']
    input_df.loc[mask, 'result'] = 1
    # away wins (result 2)
    mask = input_df['home_goals'] < input_df['away_goals']
    input_df.loc[mask, 'result'] = 2
    # draws (result 3)
    mask = input_df['home_goals'] == input_df['away_goals']
    input_df.loc[mask, 'result'] = 3
    # goal difference
    input_df['goal_difference'] = input_df['home_goals'] - input_df['away_goals']
    return input_df

# so what were the expectations?
bet_df = pd.DataFrame(bet_list)
bet_df = calculate_result(bet_df)
# if you want to look at the results
bet_df
# what were the actuals
result_df = pd.DataFrame(result_list)
result_df = calculate_result(result_df)
# if you want to look at the results
result_df
# now let's compare them!
# I take a subset of the result df and link results on the game
combi_df = pd.merge(left=bet_df, right=result_df[['game', 'home_goals', 'away_goals', 'result', 'goal_difference']], left_on='game', right_on='game', how='inner', suffixes=['_bet', '_actual'])
# look at the data
combi_df
def calculate_bet_score(input_df):
    '''
    Notice that I'm keeping in extra columns, because those are nice for comparative analytics in the future. Think: "you had this right, just like x% of all the people"
    '''
    input_df['bet_score'] = 0
    # now look at where people have correctly predicted the result
    input_df['result_estimation'] = 0
    mask = input_df['result_bet'] == input_df['result_actual']
    input_df.loc[mask, 'result_estimation'] = 1  # correct result
    input_df.loc[mask, 'bet_score'] = 1  # bet score for a correct result
    # now look at where people have correctly predicted the difference in goals when they already predicted the result correctly
    input_df['goal_difference_estimation'] = 0
    bet_mask = input_df['bet_score'] == 1
    score_mask = input_df['goal_difference_bet'] == input_df['goal_difference_actual']
    input_df.loc[bet_mask & score_mask, 'goal_difference_estimation'] = 1  # correct goal difference
    input_df.loc[bet_mask & score_mask, 'bet_score'] = 2  # bet score for a correct goal difference
    # now look at where people have correctly predicted the exact goals
    input_df['goal_exact_estimation'] = 0
    bet_mask = input_df['bet_score'] == 2
    home_mask = input_df['home_goals_bet'] == input_df['home_goals_actual']
    away_mask = input_df['away_goals_bet'] == input_df['away_goals_actual']
    input_df.loc[bet_mask & home_mask & away_mask, 'goal_exact_estimation'] = 1  # exact score
    input_df.loc[bet_mask & home_mask & away_mask, 'bet_score'] = 3  # bet score for the exact score
    return input_df

combi_df = calculate_bet_score(combi_df)
# now look at the results
combi_df
# and you can do nifty stuff like making a top player list like this:
combi_df.groupby('playerid')['bet_score'].sum().sort_values(ascending=False)
# player 4 is way ahead!
# which game was the best estimated game?
combi_df.groupby('game')['bet_score'].mean().sort_values(ascending=False)
# game 3! though abysmal predictions in general ;)
</code></pre>
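<p>For the third kind of question from the list at the top (is player x better at games involving team y?), one possible approach is to reshape the scored bets so that each game counts once for each of its two teams. This is only a sketch: the frame scored, the extra team BVB and the scores in it are made-up illustration data shaped like combi_df, not output of the code above.</p>

```python
import pandas as pd

# Hypothetical scored bets, shaped like combi_df after calculate_bet_score
scored = pd.DataFrame({
    'playerid':  [1, 1, 2, 2],
    'game':      [1, 2, 1, 2],
    'home_team': ['Bayern', 'VfL', 'Bayern', 'VfL'],
    'away_team': ['VfL', 'BVB', 'VfL', 'BVB'],
    'bet_score': [3, 0, 1, 2],
})

# One row per (bet, team involved), so a game counts for both of its teams
long_df = scored.melt(
    id_vars=['playerid', 'game', 'bet_score'],
    value_vars=['home_team', 'away_team'],
    value_name='team',
)

# Average bet score per player (rows) and team (columns)
per_team = long_df.groupby(['playerid', 'team'])['bet_score'].mean().unstack()
print(per_team)
```

<p>With real data you would probably also want the number of games behind each average, since a mean over one or two games says very little.</p>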
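<p>As I said, this is mainly meant to offer a different perspective on what is possible for data manipulation in Python. Once you get serious about large amounts of data, this (vector/numpy/pandas-based) approach will be the fastest, but you have to ask yourself which logic you want to run inside the database and which outside it, and so on.</p>
<p>Hope this helps!</p>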
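<p>As one more illustration of the vectorized style, the result coding (1 = home win, 2 = away win, 3 = draw) can also be written with numpy's select instead of three masked assignments. The function name add_result_columns and the small games frame are assumptions for this sketch; unlike calculate_result it returns a copy and leaves its input untouched.</p>

```python
import numpy as np
import pandas as pd


def add_result_columns(df):
    """Return a copy of df with 'result' and 'goal_difference' added."""
    out = df.copy()
    conditions = [
        out['home_goals'] > out['away_goals'],  # home win
        out['home_goals'] < out['away_goals'],  # away win
    ]
    # 1 = home win, 2 = away win, everything else (a draw) = 3
    out['result'] = np.select(conditions, [1, 2], default=3)
    out['goal_difference'] = out['home_goals'] - out['away_goals']
    return out


# Same scores as the three actual results used in this answer
games = pd.DataFrame({'home_goals': [3, 2, 0], 'away_goals': [4, 2, 0]})
coded = add_result_columns(games)
print(coded)
```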