What is the most elegant way to evaluate sports match estimations in Python?

Posted 2024-04-29 02:35:56


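I want to evaluate estimations of sports matches, in my case football (i.e. soccer) matches. I want to do this in Python.

Basically, there is always a team_home result, a team_away result, estimate_home and estimate_away. For example, a game ended 1:0 and the estimate was 0:0: this would return wrong.

There are only four possible cases and outcomes:

  1. wrong, as described above
  2. tendency: the winner was estimated correctly, but not the goal difference (e.g. 3:0)
  3. goal difference: the goal difference is correct, e.g. 2:1
  4. right: the exact result was estimated

What would be the most elegant way to handle estimations and results in Python?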


3 answers

Here is a more compact and more symmetric function. Is this what you mean by "elegant"?

def evaluate(team_home, team_away, estimate_home, estimate_away):
    if (team_home == estimate_home) and (team_away == estimate_away):
        return 'right'
    if (team_home - team_away) == (estimate_home - estimate_away):
        return 'goal difference'
    if ((team_home > team_away) == (estimate_home > estimate_away)) and \
       (team_home != team_away) and (estimate_home != estimate_away):
        return 'tendency'
    return 'wrong'
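
To see the four outcomes side by side, here is a minimal sketch that classifies the question's example estimates against a 1:0 result. It is a variant, not the function above: the names `sign` and `classify` are made up for illustration, and the tendency check compares the signs of the two goal differences, which should give the same answers because matching draws are already caught by the goal-difference check.

```python
def sign(n):
    """Return 1 for a home win, -1 for an away win, 0 for a draw."""
    return (n > 0) - (n < 0)


def classify(team_home, team_away, estimate_home, estimate_away):
    actual_diff = team_home - team_away
    estimate_diff = estimate_home - estimate_away
    if (team_home, team_away) == (estimate_home, estimate_away):
        return 'right'
    if actual_diff == estimate_diff:
        return 'goal difference'
    # Same winner predicted; equal draws were handled by the check above.
    if sign(actual_diff) == sign(estimate_diff):
        return 'tendency'
    return 'wrong'


# The match ended 1:0; these are the example estimates from the question.
estimates = [(0, 0), (3, 0), (2, 1), (1, 0)]
outcomes = {est: classify(1, 0, *est) for est in estimates}
for est, outcome in outcomes.items():
    print(est, outcome)
```

Pulling the goal differences into named variables keeps each branch to a single comparison, at the cost of one small helper.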

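First, I would suggest thinking about what kinds of questions you are going to have, i.e.

  • Do you want to report a list of estimates versus actuals per player?
  • Do you want to rank players?
  • Do you want to do more statistical work? (player x is better at estimating matches that involve team y)

I assume you want to do at least the first two!

I tried to keep the code readable/simple, but in many ways it is a lot more complex than the other answers. In return, it gives you a whole toolbox that lets you process large amounts of data very quickly. So just see it as another option :)

Basically, if you want to, you can also do more statistical work in the future. But really, these questions do influence the answer to your question (or rather: the answer that fits best here).

I assume you have a database (relational/mongodb/whatever), which I am faking here with lists. Although I am using pandas here, most of the things described can also be done quite simply in a relational database. But pandas rocks ;) so this will work fine too. If you do things with friends using Excel or CSV files, you can also import those directly with pandas read_csv or read_excel.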

import pandas as pd

# game is a unique id (like a combination of date, home_team and away_team)
bet_list = [
    {'playerid': 1, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
    {'playerid': 1, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},   
    {'playerid': 1, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0}  
]

result_list = [
    {'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 4},
    {'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 2},
    {'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
]

def calculate_result(input_df):
    # note: adds the columns to input_df in place and returns it
    input_df['result'] = 0
    # home wins (result 1)
    mask = input_df['home_goals'] > input_df['away_goals']
    input_df.loc[mask, 'result'] = 1
    # away wins (result 2)
    mask = input_df['home_goals'] < input_df['away_goals']
    input_df.loc[mask, 'result'] = 2
    # draws (result 3)
    mask = input_df['home_goals'] == input_df['away_goals']
    input_df.loc[mask, 'result'] = 3
    # goal difference
    input_df['goal_difference'] = input_df['home_goals'] - input_df['away_goals']
    return input_df

# so what were the expectations?
bet_df = pd.DataFrame(bet_list)
bet_df = calculate_result(bet_df)
# if you want to look at the results
bet_df

# what were the actuals
result_df = pd.DataFrame(result_list)
result_df = calculate_result(result_df)
# if you want to look at the results
result_df

# now let's compare them!
# I take a subset of the result df and link results on the game
combi_df = pd.merge(left=bet_df, right=result_df[['game', 'home_goals', 'away_goals', 'result', 'goal_difference']], left_on='game', right_on='game', how='inner', suffixes=['_bet', '_actual'])
# look at the data
combi_df

def calculate_bet_score(input_df):
    '''
    Notice that I'm keeping extra columns, because those are nice for
    comparative analytics in the future. Think: "you had this right, just
    like x% of all the people"
    '''
    input_df['bet_score'] = 0
    # now look at where people have correctly predicted the result
    input_df['result_estimation'] = 0
    mask = input_df['result_bet'] == input_df['result_actual']
    input_df.loc[mask, 'result_estimation'] = 1  # correct result
    input_df.loc[mask, 'bet_score'] = 1  # bet score for a correct result
    # now look at where people have correctly predicted the difference in goals when they already predicted the result correctly
    input_df['goal_difference_estimation'] = 0
    bet_mask = input_df['bet_score'] == 1
    score_mask = input_df['goal_difference_bet'] == input_df['goal_difference_actual']
    input_df.loc[bet_mask & score_mask, 'goal_difference_estimation'] = 1  # correct goal difference
    input_df.loc[bet_mask & score_mask, 'bet_score'] = 2  # bet score for a correct goal difference
    # now look at where people have correctly predicted the exact goals
    input_df['goal_exact_estimation'] = 0
    bet_mask = input_df['bet_score'] == 2
    home_mask = input_df['home_goals_bet'] == input_df['home_goals_actual']
    away_mask = input_df['away_goals_bet'] == input_df['away_goals_actual']
    input_df.loc[bet_mask & home_mask & away_mask, 'goal_exact_estimation'] = 1  # exact score
    input_df.loc[bet_mask & home_mask & away_mask, 'bet_score'] = 3  # bet score for an exact score
    return input_df

combi_df = calculate_bet_score(combi_df)

# now look at the results
combi_df

# and you can do nifty stuff like making a top player list like this:
combi_df.groupby('playerid')['bet_score'].sum().sort_values(ascending=False)
# player 4 is way ahead!
# which game was the best estimated game?
combi_df.groupby('game')['bet_score'].mean().sort_values(ascending=False)
# game 3! though abysmal predictions in general ;)

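As I said, this is mainly meant to give a different view/idea of the possibilities for data manipulation in Python. Once you get serious with lots of data, this (vector/numpy/pandas-based) approach will be the fastest, but you have to ask yourself what logic you want to perform inside the database and what outside of it, etc.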
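
To make the "inside the database" option concrete, here is a rough sketch that does the same 3/2/1/0 scoring in SQL, using the standard-library sqlite3 module instead of pandas. The table layout and the names `SCORE_QUERY`, `player_bets` and `leaderboard` are assumptions chosen to mirror the example lists above, not something the original setup prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (game INTEGER PRIMARY KEY, home_goals INTEGER, away_goals INTEGER);
    CREATE TABLE bets (playerid INTEGER, game INTEGER, home_goals INTEGER, away_goals INTEGER);
""")

# Same data as the example lists: three games, four players who each
# repeat the same bet for every game.
conn.executemany("INSERT INTO results VALUES (?, ?, ?)",
                 [(1, 3, 4), (2, 2, 2), (3, 0, 0)])
player_bets = {1: (3, 5), 2: (2, 1), 3: (1, 0), 4: (0, 0)}
conn.executemany("INSERT INTO bets VALUES (?, ?, ?, ?)",
                 [(player, game, home, away)
                  for game in (1, 2, 3)
                  for player, (home, away) in player_bets.items()])

# 3 = exact score, 2 = correct goal difference, 1 = correct winner, 0 = wrong.
SCORE_QUERY = """
    SELECT b.playerid,
           SUM(CASE
                 WHEN b.home_goals = r.home_goals
                  AND b.away_goals = r.away_goals THEN 3
                 WHEN (b.home_goals - b.away_goals)
                    = (r.home_goals - r.away_goals) THEN 2
                 WHEN (b.home_goals > b.away_goals) = (r.home_goals > r.away_goals)
                  AND (b.home_goals < b.away_goals) = (r.home_goals < r.away_goals) THEN 1
                 ELSE 0
               END) AS total
    FROM bets AS b
    JOIN results AS r ON r.game = b.game
    GROUP BY b.playerid
    ORDER BY total DESC, b.playerid
"""
leaderboard = conn.execute(SCORE_QUERY).fetchall()
print(leaderboard)
```

With the scoring kept in one query, Python only has to display the ranking; whether that split suits a project depends on where the rest of the logic lives.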

Hope this helps!

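Another answer, reflecting my view of elegance (a rather subjective parameter, I agree). I would like my objects to be defined by classes, built with OOP in mind, and to use an ORM that manages the relations between objects. This brings many advantages and cleaner code.

I am using pony ORM here, but there are many other excellent options (and eventually more permissively licensed ones), such as SQLAlchemy or {a5}.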

Here is a complete example. First we define the model:

from pony.orm import *

db = Database()  # the entities below are attached to this Database object

class Player(db.Entity):
    """A player is somebody who places a bet, identified by name."""
    name = Required(str)  # use unicode instead of str on Python 2
    score = Required(int, default=0)
    bets = Set('Bet', reverse='player')
    # any other player's info can be stored here


class Match(db.Entity):
    """A Match is a game, played or not yet played."""

    ended = Required(bool, default=False)
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)

    bets = Set('Bet', reverse='match')


class Bet(db.Entity):
    """A class that stores a bet for a specific game"""

    match = Required(Match, reverse="bets")
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)
    player = Required(Player, reverse="bets")

@db_session
def calculate_wins(match):
    bets = select(b for b in Bet if b.match == match)[:]
    for bet in bets:
        if (match.home_score == bet.home_score) and (match.visitors_score == bet.visitors_score):
            bet.player.score += 3  # exact
        elif (match.home_score - match.visitors_score) == (bet.home_score - bet.visitors_score):
            bet.player.score += 2  # goal differences
        elif ((match.home_score > match.visitors_score) == (bet.home_score > bet.visitors_score)) and \
           (match.home_score != match.visitors_score) and (bet.home_score != bet.visitors_score):
            bet.player.score += 1  # tendency
        else:
            bet.player.score += 0  # wrong

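With these classes, you can create and update your database of matches, players and bets. If you need statistics and data aggregation/sorting, you can query the database as needed.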

^{pr2}$
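
For trying out the scoring logic without installing Pony or setting up a database, a rough stand-in using plain dataclasses might look like the following. This is not Pony's API: the classes only mimic the entity fields above, the helper `points` and the sample players are hypothetical, and nothing is persisted.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    name: str
    score: int = 0


@dataclass
class Match:
    home_score: int = 0
    visitors_score: int = 0
    ended: bool = False
    # Filled in by Bet.__post_init__; excluded from repr/eq to avoid cycles.
    bets: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class Bet:
    match: Match
    player: Player
    home_score: int = 0
    visitors_score: int = 0

    def __post_init__(self):
        # Register the bet on its match, like the ORM's reverse relation.
        self.match.bets.append(self)


def points(match, bet):
    """3 = exact score, 2 = goal difference, 1 = tendency, 0 = wrong."""
    actual_diff = match.home_score - match.visitors_score
    bet_diff = bet.home_score - bet.visitors_score
    if (match.home_score, match.visitors_score) == (bet.home_score, bet.visitors_score):
        return 3
    if actual_diff == bet_diff:
        return 2
    if actual_diff * bet_diff > 0:  # same non-draw winner
        return 1
    return 0


def calculate_wins(match):
    for bet in match.bets:
        bet.player.score += points(match, bet)


match = Match(home_score=1, visitors_score=0, ended=True)
players = [Player("alice"), Player("bob"), Player("carol"), Player("dave")]
for player, (home, away) in zip(players, [(1, 0), (2, 1), (3, 0), (0, 0)]):
    Bet(match=match, player=player, home_score=home, visitors_score=away)

calculate_wins(match)
scores = {p.name: p.score for p in players}
print(scores)
```

The four sample bets cover one case each: exact score, goal difference, tendency and wrong.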

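If you want, you could eventually even integrate very sophisticated time-series data analysis using numpy, as Carst suggested, but I believe these additions, although very interesting, are a bit off-topic for your original question.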
