python: Vectorized definition only works for the first condition; subsequent loops have no effect

1 vote
1 answer
39 views
Asked 2025-04-14 15:22

I have a vectorized function:

def selection_update_weights(df):
    # Define the selections for 'Win'
    selections_win = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", "W & O 2.5", 
                      "W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)", "W & O 1.5", 
                      "W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)", "W & U 4.5", 
                      "W (untested)", "W"]

    # Create a boolean mask for the condition for 'Win'
    mask_win = (df['selection_match'] == "no match") & \
               (df['selection'].isin(selections_win)) & \
               (df['result_match'] == "no match") & \
               (df['result'] != 'draw')

    # Apply the condition and update the 'Win' column
    df.loc[mask_win, 'Win'] = df.loc[mask_win, 'predicted_score_difference'] + 0.02

    # Define the selections for 'DNB'
    selections_DNB = ["DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5", "DNB or O 2.5 (untested)",
                      "DNB or O 2.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5", 
                      "DNB or O 1.5 (untested)", "DNB or O 1.5", "DNB (untested)", "DNB"]

    # Create a boolean mask for the condition for 'DNB'
    mask_DNB = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_DNB)) & \
                (df['result_match'] == 'no match') & \
                (df['result'] != 'draw'))

    # Apply the condition and update the 'DNB' column
    df.loc[mask_DNB, 'DNB'] = df.loc[mask_DNB, 'predicted_score_difference'] + 0.02

    # Define the selections for 'O 1.5'
    selections_O_1_5 = ["W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)",
                        "W & O 1.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5", 
                        "DNB or O 1.5 (untested)", "DNB or O 1.5", "O 1.5 (untested)", "O 1.5"]

    # Create a boolean mask for the condition for 'O 1.5'
    mask_O_1_5 = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_O_1_5)) & \
                (df['total_score'] < 2))

    # Apply the condition and update the 'O 1.5' column
    df.loc[mask_O_1_5, 'O_1_5'] = df.loc[mask_O_1_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'O 2.5'
    selections_O_2_5 = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", 
                        "W & O 2.5", "DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5",
                        "DNB or O 2.5 (untested)", "DNB or O 2.5", "O 2.5 (untested)", "O 2.5"]

    # Create a boolean mask for the condition for 'O 2.5'
    mask_O_2_5 = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_O_2_5)) & \
                (df['total_score'] < 3))

    # Apply the condition and update the 'O 2.5' column
    df.loc[mask_O_2_5, 'O_2_5'] = df.loc[mask_O_2_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'U 4.5'
    selections_U_4_5 = ["W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)",
                        "W & U 4.5", "U 4.5 (untested)", "U 4.5"]

    # Create a boolean mask for the condition for 'U 4.5'
    mask_U_4_5 = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_U_4_5)) & \
                (df['total_score'] > 4))

    # Apply the condition and update the 'U 4.5' column
    df.loc[mask_U_4_5, 'U_4_5'] = df.loc[mask_U_4_5, 'predicted_total_score'] - 0.02

    return df

I apply it like this:

df = selection_update_weights(df)

The first run is successful:

home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match  Win  DNB  O_1_5  O_2_5  U_4_5                  selection selection_match
3            2           0            2                 2              12.370528                   12.090888   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
9            2           0            2                 2              11.439416                   10.291339   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
10           2           0            2                 2              11.226599                   10.228954   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
11           1           5            6                 4              12.069979                   10.194557   away             home     no match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
20           2           0            2                 2               9.808659                    9.049657   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match

When I run the function, it produces:

home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match       Win  DNB     O_1_5     O_2_5  U_4_5                  selection selection_match
44           3           3            6                 0               8.748172                    8.135116   draw             home     no match  8.155116  0.7  2.000000  3.000000    4.0  W & O 2.5 (both untested)        no match
50           1           0            1                 1               8.605350                    7.932909   home             home        match  1.100000  0.7  8.625350  8.625350    4.0  W & O 1.5 (both untested)        no match
57           1           1            2                 0               7.510030                    7.750101   draw             home     no match  7.770101  0.7  2.000000  7.530030    4.0  W & O 1.5 (both untested)        no match
62           0           1            1                 1               8.895045                    7.710740   away             away        match  1.100000  0.7  8.915045  8.915045    4.0  W & O 1.5 (both untested)        no match
85           1           0            1                 1               8.099853                    7.444815   home             home        match  1.100000  0.7  8.119853  8.119853    4.0  W & O 1.5 (both untested)        no match

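However, subsequent loops change nothing.

Although my dataframe is very large, this code is where the weights fail to update. The weights are only partially updated, and I am not sure why.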

df.head():
home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match  Win  DNB  O_1_5     O_2_5  U_4_5                  selection selection_match
44           3           3            6                 0               8.748172                    8.135116   draw             home     no match  1.1  0.7    2.0  3.000000    4.0  W & O 2.5 (both untested)        no match
50           1           0            1                 1               8.605350                    7.932909   home             home        match  1.1  0.7    2.0  8.625350    4.0  W & O 1.5 (both untested)        no match
57           1           1            2                 0               7.510030                    7.750101   draw             home     no match  1.1  0.7    2.0  7.530030    4.0  W & O 1.5 (both untested)        no match
62           0           1            1                 1               8.895045                    7.710740   away             away        match  1.1  0.7    2.0  8.915045    4.0  W & O 1.5 (both untested)        no match
85           1           0            1                 1               8.099853                    7.444815   home             home        match  1.1  0.7    2.0  8.119853    4.0  W & O 1.5 (both untested)        no match

So when I apply it:

df = selection_update_weights(df)

Ideally, I should get:

home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match       Win  DNB    O_1_5    O_2_5    U_4_5                       selection  selection_match
          3           3            6                 0               8.748172                    8.135116   draw             home     no match  8.155116  0.7       2.0         3      4.0      W & O 2.5 (both untested)        no match
          1           0            1                 1               8.605350                    7.932909   home             home        match  1.100000  0.7  8.625350  8.625350      4.0      W & O 1.5 (both untested)        no match
          1           1            2                 0               7.510030                    7.750101   draw             home     no match  7.770101  0.7       2.0  7.530030      4.0      W & O 1.5 (both untested)        no match
          0           1            1                 1               8.895045                    7.710740   away             away        match  1.100000  0.7  8.915045  8.915045      4.0      W & O 1.5 (both untested)        no match
          1           0            1                 1               8.099853                    7.444815   home             home        match  1.100000  0.7  8.119853  8.119853      4.0      W & O 1.5 (both untested)        no match

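However, this does not happen, and the original dataframe is not affected either.

Would it help if I split out each if-else? The dataframe is so large, though, that processing it row by row takes 20 minutes.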

1 Answer

0

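In the second loop, only the Win and O_2_5 columns are updated. Win is set from the predicted score difference (predicted_score_difference), and O_2_5 is set from the predicted total score (predicted_total_score).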

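The predicted score difference and the predicted total score are never changed inside selection_update_weights, so we can treat them as constants. Because you call the function repeatedly and every call sets the columns from those same constants, the columns can never end up with any other value.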
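
A minimal sketch of this point, assuming a stripped-down stand-in for selection_update_weights (the name update_win and the three sample rows are made up for illustration, and only one of the masks is kept). Since the right-hand side reads only predicted_score_difference, a second call should write exactly the same values as the first:

```python
import pandas as pd

def update_win(df):
    # Hypothetical, simplified stand-in for selection_update_weights:
    # one mask and one target column instead of five.
    mask = (df["selection_match"] == "no match") & (df["result"] != "draw")
    # The assigned value depends only on predicted_score_difference,
    # which this function never modifies.
    df.loc[mask, "Win"] = df.loc[mask, "predicted_score_difference"] + 0.02
    return df

# Made-up sample rows, not taken from the question's data.
df = pd.DataFrame({
    "selection_match": ["no match", "no match", "match"],
    "result": ["home", "draw", "away"],
    "predicted_score_difference": [8.0, 7.5, 6.0],
    "Win": [1.1, 1.1, 1.1],
})

first = update_win(df.copy())      # first pass changes the matching row
second = update_win(first.copy())  # second pass recomputes the same values

print(first["Win"].tolist())
print(first.equals(second))
```

Under these assumptions, first and second compare equal, which would match the observation that later loops appear to do nothing.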

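I don't understand why you call selection_update_weights multiple times. Perhaps you should update Win, O_2_5 (and the other columns) based on their own current values, or update the predicted score difference and the predicted total score inside selection_update_weights.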
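
As a rough sketch of the first suggestion, here is what updating a column from its own current value could look like. The name increment_win, the STEP constant, and the sample rows are assumptions for illustration; whether an accumulating weight is what you actually want depends on your model:

```python
import pandas as pd

STEP = 0.02  # same adjustment size as in the question's code

def increment_win(df):
    # Hypothetical variant: the new value is derived from the current
    # value of Win, so every call moves the weight a little further.
    mask = (df["selection_match"] == "no match") & (df["result"] != "draw")
    df.loc[mask, "Win"] = df.loc[mask, "Win"] + STEP
    return df

# Made-up sample rows, not taken from the question's data.
df = pd.DataFrame({
    "selection_match": ["no match", "no match", "match"],
    "result": ["home", "draw", "away"],
    "Win": [1.1, 1.1, 1.1],
})

once = increment_win(df.copy())
twice = increment_win(once.copy())

print(once["Win"].tolist())
print(twice["Win"].tolist())
```

Here the second call does change the matching row again, because its input (the Win column itself) changed between calls. Rows excluded by the mask stay at their original weight.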
