Python: vectorized function only works for the first condition; subsequent loops have no effect

Question

I have a vectorized function:

def selection_update_weights(df):
    # Define the selections for 'Win'
    selections_win = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", "W & O 2.5", 
                      "W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)", "W & O 1.5", 
                      "W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)", "W & U 4.5", 
                      "W (untested)", "W"]

    # Create a boolean mask for the condition for 'Win'
    mask_win = (df['selection_match'] == "no match") & \
               (df['selection'].isin(selections_win)) & \
               (df['result_match'] == "no match") & \
               (df['result'] != 'draw')

    # Apply the condition and update the 'Win' column
    df.loc[mask_win, 'Win'] = df.loc[mask_win, 'predicted_score_difference'] + 0.02

    # Define the selections for 'DNB'
    selections_DNB = ["DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5", "DNB or O 2.5 (untested)",
                      "DNB or O 2.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5", 
                      "DNB or O 1.5 (untested)", "DNB or O 1.5", "DNB (untested)", "DNB"]

    # Create a boolean mask for the condition for 'DNB'
    mask_DNB = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_DNB)) & \
                (df['result_match'] == 'no match') & \
                (df['result'] != 'draw'))

    # Apply the condition and update the 'DNB' column
    df.loc[mask_DNB, 'DNB'] = df.loc[mask_DNB, 'predicted_score_difference'] + 0.02

    # Define the selections for 'O 1.5'
    selections_O_1_5 = ["W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)",
                        "W & O 1.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5", 
                        "DNB or O 1.5 (untested)", "DNB or O 1.5", "O 1.5 (untested)", "O 1.5"]

    # Create a boolean mask for the condition for 'O 1.5'
    mask_O_1_5 = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_O_1_5)) & \
                (df['total_score'] < 2))

    # Apply the condition and update the 'O 1.5' column
    df.loc[mask_O_1_5, 'O_1_5'] = df.loc[mask_O_1_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'O 2.5'
    selections_O_2_5 = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", 
                        "W & O 2.5", "DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5",
                        "DNB or O 2.5 (untested)", "DNB or O 2.5", "O 2.5 (untested)", "O 2.5"]

    # Create a boolean mask for the condition for 'O 2.5'
    mask_O_2_5 = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_O_2_5)) & \
                (df['total_score'] < 3))

    # Apply the condition and update the 'O 2.5' column
    df.loc[mask_O_2_5, 'O_2_5'] = df.loc[mask_O_2_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'U 4.5'
    selections_U_4_5 = ["W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)",
                        "W & U 4.5", "U 4.5 (untested)", "U 4.5"]

    # Create a boolean mask for the condition for 'U 4.5'
    mask_U_4_5 = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_U_4_5)) & \
                (df['total_score'] > 4))

    # Apply the condition and update the 'U 4.5' column (note: subtracts 0.02)
    df.loc[mask_U_4_5, 'U_4_5'] = df.loc[mask_U_4_5, 'predicted_total_score'] - 0.02

    return df
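All five blocks above follow the same pattern. Each one builds a boolean mask and then writes `source + offset` into a single target column. The sketch below expresses the same logic as a rule table, assuming the column names used in the question. `build_rules`, `apply_weight_rules` and `outcome_missed` are hypothetical names. The masks only read `selection_match`, `selection`, `result_match`, `result` and `total_score`, and no rule writes to any of those columns. Applying the rules in a loop should therefore match the original sequence of blocks.

```python
import pandas as pd


def outcome_missed(d):
    # Clause shared by the 'Win' and 'DNB' blocks: result mismatch that is not a draw.
    return (d["result_match"] == "no match") & (d["result"] != "draw")


def build_rules(selections_win, selections_DNB, selections_O_1_5,
                selections_O_2_5, selections_U_4_5):
    # Each rule: (target column, selections, extra condition, source column, offset).
    return [
        ("Win", selections_win, outcome_missed, "predicted_score_difference", 0.02),
        ("DNB", selections_DNB, outcome_missed, "predicted_score_difference", 0.02),
        ("O_1_5", selections_O_1_5, lambda d: d["total_score"] < 2, "predicted_total_score", 0.02),
        ("O_2_5", selections_O_2_5, lambda d: d["total_score"] < 3, "predicted_total_score", 0.02),
        ("U_4_5", selections_U_4_5, lambda d: d["total_score"] > 4, "predicted_total_score", -0.02),
    ]


def apply_weight_rules(df: pd.DataFrame, rules) -> pd.DataFrame:
    # Every block in the question requires selection_match == "no match"; compute it once.
    no_match = df["selection_match"] == "no match"
    for target, selections, extra_condition, source, offset in rules:
        mask = no_match & df["selection"].isin(selections) & extra_condition(df)
        # Same vectorized, in-place .loc update as the original function.
        df.loc[mask, target] = df.loc[mask, source] + offset
    return df
```

To reproduce the original function, call `apply_weight_rules(df, build_rules(selections_win, selections_DNB, selections_O_1_5, selections_O_2_5, selections_U_4_5))` with the five lists defined in it. Testing one rule on its own then only means passing a shorter list. Each rule still operates on whole columns, not row by row.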

I apply it like this:

df = selection_update_weights(df)

The first run works:

home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match  Win  DNB  O_1_5  O_2_5  U_4_5                  selection selection_match
3            2           0            2                 2              12.370528                   12.090888   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
9            2           0            2                 2              11.439416                   10.291339   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
10           2           0            2                 2              11.226599                   10.228954   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
11           1           5            6                 4              12.069979                   10.194557   away             home     no match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
20           2           0            2                 2               9.808659                    9.049657   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match

When I run this function, it returns:

home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match       Win  DNB     O_1_5     O_2_5  U_4_5                  selection selection_match
44           3           3            6                 0               8.748172                    8.135116   draw             home     no match  8.155116  0.7  2.000000  3.000000    4.0  W & O 2.5 (both untested)        no match
50           1           0            1                 1               8.605350                    7.932909   home             home        match  1.100000  0.7  8.625350  8.625350    4.0  W & O 1.5 (both untested)        no match
57           1           1            2                 0               7.510030                    7.750101   draw             home     no match  7.770101  0.7  2.000000  7.530030    4.0  W & O 1.5 (both untested)        no match
62           0           1            1                 1               8.895045                    7.710740   away             away        match  1.100000  0.7  8.915045  8.915045    4.0  W & O 1.5 (both untested)        no match
85           1           0            1                 1               8.099853                    7.444815   home             home        match  1.100000  0.7  8.119853  8.119853    4.0  W & O 1.5 (both untested)        no match

However, subsequent iterations of the loop make no changes.

My dataframe is very large, and this code is where the weights fail to update: they are only partially updated. I'm not sure why.
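One property of the function may explain why later loops change nothing. Every update assigns `predicted_* ± 0.02` and never reads the current value of `Win`, `DNB`, `O_1_5`, `O_2_5` or `U_4_5`. Calling it again on the same rows therefore writes the same numbers again instead of accumulating. The reduced sketch below keeps only the 'Win' rule and omits the selection clauses for brevity. `update_win` is a hypothetical name.

```python
import pandas as pd


def update_win(df: pd.DataFrame) -> pd.DataFrame:
    # Reduced 'Win' block from the question (selection clauses omitted).
    # The new value depends only on predicted_score_difference,
    # never on what is currently stored in 'Win'.
    mask = (df["result_match"] == "no match") & (df["result"] != "draw")
    df.loc[mask, "Win"] = df.loc[mask, "predicted_score_difference"] + 0.02
    return df


df = pd.DataFrame({
    "result": ["home", "draw", "away"],
    "result_match": ["no match", "no match", "match"],
    "predicted_score_difference": [1.0, 2.0, 3.0],
    "Win": [1.1, 1.1, 1.1],
})

after_first = update_win(df).copy()
after_second = update_win(df)
print(after_first.equals(after_second))  # True: the second call rewrites identical values
```

Suppose the weights are meant to move on every pass. Then the assignment would have to read the current column, for example `df.loc[mask, "Win"] += 0.02`. Whether that is the intended behavior depends on the model.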

df.head():
home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match  Win  DNB  O_1_5     O_2_5  U_4_5                  selection selection_match
44           3           3            6                 0               8.748172                    8.135116   draw             home     no match  1.1  0.7    2.0  3.000000    4.0  W & O 2.5 (both untested)        no match
50           1           0            1                 1               8.605350                    7.932909   home             home        match  1.1  0.7    2.0  8.625350    4.0  W & O 1.5 (both untested)        no match
57           1           1            2                 0               7.510030                    7.750101   draw             home     no match  1.1  0.7    2.0  7.530030    4.0  W & O 1.5 (both untested)        no match
62           0           1            1                 1               8.895045                    7.710740   away             away        match  1.1  0.7    2.0  8.915045    4.0  W & O 1.5 (both untested)        no match
85           1           0            1                 1               8.099853                    7.444815   home             home        match  1.1  0.7    2.0  8.119853    4.0  W & O 1.5 (both untested)        no match
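To see why a column is not updated, count how many rows pass each clause of its mask separately. The sketch below does this for the 'Win' mask on the five rows from `df.head()` above. Only the columns that mask reads are copied. `clause_report` is a hypothetical helper.

```python
import pandas as pd

# Rows 44, 50, 57, 62 and 85 from the df.head() output, restricted to the
# columns that the 'Win' mask reads.
sample = pd.DataFrame(
    {
        "selection_match": ["no match"] * 5,
        "selection": ["W & O 2.5 (both untested)"] + ["W & O 1.5 (both untested)"] * 4,
        "result_match": ["no match", "match", "no match", "match", "match"],
        "result": ["draw", "home", "draw", "away", "home"],
    },
    index=[44, 50, 57, 62, 85],
)

selections_win = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", "W & O 2.5",
                  "W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)", "W & O 1.5",
                  "W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)", "W & U 4.5",
                  "W (untested)", "W"]


def clause_report(df, clauses):
    """Count rows passing each clause on its own, plus rows passing all of them."""
    report = {name: int(mask.sum()) for name, mask in clauses.items()}
    combined = pd.Series(True, index=df.index)
    for mask in clauses.values():
        combined &= mask
    report["all clauses"] = int(combined.sum())
    return report


win_clauses = {
    "selection_match == 'no match'": sample["selection_match"] == "no match",
    "selection in selections_win": sample["selection"].isin(selections_win),
    "result_match == 'no match'": sample["result_match"] == "no match",
    "result != 'draw'": sample["result"] != "draw",
}
report = clause_report(sample, win_clauses)
for name, count in report.items():
    print(f"{name}: {count}")
```

On these five rows, the combined 'Win' mask selects nothing. The only rows with `result_match == 'no match'` are 44 and 57, and both have `result == 'draw'`, which the `result != 'draw'` clause excludes. The same check can be repeated for the other four masks.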

So when I apply it:

df = selection_update_weights(df)

Ideally, I should get:

home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match       Win  DNB    O_1_5    O_2_5    U_4_5                       selection  selection_match
          3           3            6                 0               8.748172                    8.135116   draw             home     no match  8.155116  0.7       2.0         3      4.0      W & O 2.5 (both untested)        no match
          1           0            1                 1               8.605350                    7.932909   home             home        match  1.100000  0.7  8.625350  8.625350      4.0      W & O 1.5 (both untested)        no match
          1           1            2                 0               7.510030                    7.750101   draw             home     no match  7.770101  0.7       2.0  7.530030      4.0      W & O 1.5 (both untested)        no match
          0           1            1                 1               8.895045                    7.710740   away             away        match  1.100000  0.7  8.915045  8.915045      4.0      W & O 1.5 (both untested)        no match
          1           0            1                 1               8.099853                    7.444815   home             home        match  1.100000  0.7  8.119853  8.119853      4.0      W & O 1.5 (both untested)        no match

However, that doesn't happen, and the original dataframe is not affected either.
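The question does not show how `df` is built, so the following is only an assumption worth ruling out. The `.loc` assignments modify whatever DataFrame object is passed in. Boolean indexing, however, returns a new frame. If `df` was produced by filtering a larger frame, the larger frame is not touched unless the results are written back. Below is a minimal sketch with hypothetical names `big`, `subset` and `bump`.

```python
import pandas as pd


def bump(frame: pd.DataFrame) -> pd.DataFrame:
    # Same in-place .loc pattern as the question: mutates `frame` itself.
    mask = frame["total_score"] < 2
    frame.loc[mask, "O_1_5"] = frame.loc[mask, "predicted_total_score"] + 0.02
    return frame


big = pd.DataFrame({
    "total_score": [1, 3, 1],
    "predicted_total_score": [8.0, 9.0, 7.0],
    "O_1_5": [2.0, 2.0, 2.0],
})

# Boolean indexing produces a separate DataFrame; .copy() makes that explicit
# and avoids SettingWithCopyWarning when it is modified.
subset = big[big["predicted_total_score"] > 7.5].copy()
subset = bump(subset)
print(big["O_1_5"].tolist())  # [2.0, 2.0, 2.0]: the parent frame is unchanged

# Write the updated column back into the parent frame, aligned by index label.
big.loc[subset.index, "O_1_5"] = subset["O_1_5"]
print(big["O_1_5"].tolist())
```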

Would it help to split each if-else into a separate step? The dataframe is very large, though, and computing it row by row takes 20 minutes.

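Tags: data processing, conditional statements, computational efficiency, loops, functions, optimization, weight updates, vectorization, partial updates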