python: Vectorized definition only works for the first condition. Subsequent loops are unaffected
I have a vectorized function:
def selection_update_weights(df):
    # Define the selections for 'Win'
    selections_win = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", "W & O 2.5",
                      "W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)", "W & O 1.5",
                      "W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)", "W & U 4.5",
                      "W (untested)", "W"]
    # Create a boolean mask for the condition for 'Win'
    mask_win = ((df['selection_match'] == "no match") &
                (df['selection'].isin(selections_win)) &
                (df['result_match'] == "no match") &
                (df['result'] != 'draw'))
    # Apply the condition and update the 'Win' column
    df.loc[mask_win, 'Win'] = df.loc[mask_win, 'predicted_score_difference'] + 0.02

    # Define the selections for 'DNB'
    selections_DNB = ["DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5", "DNB or O 2.5 (untested)",
                      "DNB or O 2.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5",
                      "DNB or O 1.5 (untested)", "DNB or O 1.5", "DNB (untested)", "DNB"]
    # Create a boolean mask for the condition for 'DNB'
    mask_DNB = ((df['selection_match'] == 'no match') &
                (df['selection'].isin(selections_DNB)) &
                (df['result_match'] == 'no match') &
                (df['result'] != 'draw'))
    # Apply the condition and update the 'DNB' column
    df.loc[mask_DNB, 'DNB'] = df.loc[mask_DNB, 'predicted_score_difference'] + 0.02

    # Define the selections for 'O 1.5'
    selections_O_1_5 = ["W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)",
                        "W & O 1.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5",
                        "DNB or O 1.5 (untested)", "DNB or O 1.5", "O 1.5 (untested)", "O 1.5"]
    # Create a boolean mask for the condition for 'O 1.5'
    mask_O_1_5 = ((df['selection_match'] == 'no match') &
                  (df['selection'].isin(selections_O_1_5)) &
                  (df['total_score'] < 2))
    # Apply the condition and update the 'O_1_5' column
    df.loc[mask_O_1_5, 'O_1_5'] = df.loc[mask_O_1_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'O 2.5'
    selections_O_2_5 = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)",
                        "W & O 2.5", "DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5",
                        "DNB or O 2.5 (untested)", "DNB or O 2.5", "O 2.5 (untested)", "O 2.5"]
    # Create a boolean mask for the condition for 'O 2.5'
    mask_O_2_5 = ((df['selection_match'] == 'no match') &
                  (df['selection'].isin(selections_O_2_5)) &
                  (df['total_score'] < 3))
    # Apply the condition and update the 'O_2_5' column
    df.loc[mask_O_2_5, 'O_2_5'] = df.loc[mask_O_2_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'U 4.5'
    selections_U_4_5 = ["W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)",
                        "W & U 4.5", "U 4.5 (untested)", "U 4.5"]
    # Create a boolean mask for the condition for 'U 4.5'
    mask_U_4_5 = ((df['selection_match'] == 'no match') &
                  (df['selection'].isin(selections_U_4_5)) &
                  (df['total_score'] > 4))
    # Apply the condition and update the 'U_4_5' column
    df.loc[mask_U_4_5, 'U_4_5'] = df.loc[mask_U_4_5, 'predicted_total_score'] - 0.02

    return df
I apply it like this:
df = selection_update_weights(df)
The first run is successful:
    home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference  result  predicted_result  result_match  Win  DNB  O_1_5  O_2_5  U_4_5                  selection  selection_match
 3           2           0            2                 2              12.370528                   12.090888    home              home         match  1.1  0.7      2      3      4  W & O 2.5 (both untested)         no match
 9           2           0            2                 2              11.439416                   10.291339    home              home         match  1.1  0.7      2      3      4  W & O 2.5 (both untested)         no match
10           2           0            2                 2              11.226599                   10.228954    home              home         match  1.1  0.7      2      3      4  W & O 2.5 (both untested)         no match
11           1           5            6                 4              12.069979                   10.194557    away              home      no match  1.1  0.7      2      3      4  W & O 2.5 (both untested)         no match
20           2           0            2                 2               9.808659                    9.049657    home              home         match  1.1  0.7      2      3      4  W & O 2.5 (both untested)         no match
When I run this function, it gives:
    home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference  result  predicted_result  result_match       Win  DNB     O_1_5     O_2_5  U_4_5                  selection  selection_match
44           3           3            6                 0               8.748172                    8.135116    draw              home      no match  8.155116  0.7  2.000000  3.000000    4.0  W & O 2.5 (both untested)         no match
50           1           0            1                 1               8.605350                    7.932909    home              home         match  1.100000  0.7  8.625350  8.625350    4.0  W & O 1.5 (both untested)         no match
57           1           1            2                 0               7.510030                    7.750101    draw              home      no match  7.770101  0.7  2.000000  7.530030    4.0  W & O 1.5 (both untested)         no match
62           0           1            1                 1               8.895045                    7.710740    away              away         match  1.100000  0.7  8.915045  8.915045    4.0  W & O 1.5 (both untested)         no match
85           1           0            1                 1               8.099853                    7.444815    home              home         match  1.100000  0.7  8.119853  8.119853    4.0  W & O 1.5 (both untested)         no match
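However, subsequent loops change nothing.
My dataframe is very large, and this is the part of the code where the weights fail to update: they are only partially updated. I'm not sure why.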
df.head():
    home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference  result  predicted_result  result_match  Win  DNB  O_1_5     O_2_5  U_4_5                  selection  selection_match
44           3           3            6                 0               8.748172                    8.135116    draw              home      no match  1.1  0.7    2.0  3.000000    4.0  W & O 2.5 (both untested)         no match
50           1           0            1                 1               8.605350                    7.932909    home              home         match  1.1  0.7    2.0  8.625350    4.0  W & O 1.5 (both untested)         no match
57           1           1            2                 0               7.510030                    7.750101    draw              home      no match  1.1  0.7    2.0  7.530030    4.0  W & O 1.5 (both untested)         no match
62           0           1            1                 1               8.895045                    7.710740    away              away         match  1.1  0.7    2.0  8.915045    4.0  W & O 1.5 (both untested)         no match
85           1           0            1                 1               8.099853                    7.444815    home              home         match  1.1  0.7    2.0  8.119853    4.0  W & O 1.5 (both untested)         no match
So when I apply it:
df = selection_update_weights(df)
Ideally I should get:
home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference  result  predicted_result  result_match       Win  DNB     O_1_5     O_2_5  U_4_5                  selection  selection_match
         3           3            6                 0               8.748172                    8.135116    draw              home      no match  8.155116  0.7       2.0         3    4.0  W & O 2.5 (both untested)         no match
         1           0            1                 1               8.605350                    7.932909    home              home         match  1.100000  0.7  8.625350  8.625350    4.0  W & O 1.5 (both untested)         no match
         1           1            2                 0               7.510030                    7.750101    draw              home      no match  7.770101  0.7       2.0  7.530030    4.0  W & O 1.5 (both untested)         no match
         0           1            1                 1               8.895045                    7.710740    away              away         match  1.100000  0.7  8.915045  8.915045    4.0  W & O 1.5 (both untested)         no match
         1           0            1                 1               8.099853                    7.444815    home              home         match  1.100000  0.7  8.119853  8.119853    4.0  W & O 1.5 (both untested)         no match
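However, this does not happen, and the original dataframe is left unaffected as well.
Would it help if I split this into separate if-else branches? The dataframe is very large, though, and computing it row by row takes 20 minutes.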
1 Answer
In the second loop, only the Win and O_2_5 columns are updated. Win is updated from predicted_score_difference, and O_2_5 is updated from predicted_total_score.
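One way to check which columns a given call actually changes is to snapshot the frame beforehand and compare it column by column afterwards. The following is only a minimal sketch on made-up data: the helper `changed_columns` and the sample values are hypothetical and not part of the original code.

```python
import pandas as pd


def changed_columns(before, after):
    """Return the names of columns whose values differ between two frames."""
    # Series.equals treats NaNs in the same position as equal.
    return [col for col in before.columns if not before[col].equals(after[col])]


# Hypothetical snapshot taken before a call ...
before = pd.DataFrame({"Win": [1.1, 1.1], "DNB": [0.7, 0.7], "O_2_5": [3.0, 3.0]})
# ... and the frame afterwards, where only Win and O_2_5 moved.
after = before.copy()
after.loc[0, "Win"] = 8.155116
after.loc[1, "O_2_5"] = 7.53003

print(changed_columns(before, after))  # ['Win', 'O_2_5']
```

With the real data, taking `before = df.copy()` immediately before `df = selection_update_weights(df)` should give the same kind of comparison.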
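In fact, predicted_score_difference and predicted_total_score are never changed inside selection_update_weights, so we can treat them as constants. Because you call the method several times and every call recomputes the values from these same constants, the columns can never take any other value.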
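The effect can be reproduced on a toy frame. This sketch assumes a single simplified rule; `update_win` and the sample data are hypothetical stand-ins, not the original function. Because the right-hand side only reads a column the function never writes, a second call produces exactly the same frame as the first.

```python
import pandas as pd


def update_win(df):
    """Simplified stand-in for one rule of selection_update_weights."""
    mask = (df["selection_match"] == "no match") & (df["result"] != "draw")
    # 'Win' is derived from a column that this function never modifies.
    df.loc[mask, "Win"] = df.loc[mask, "predicted_score_difference"] + 0.02
    return df


df = pd.DataFrame({
    "selection_match": ["no match", "no match", "match"],
    "result": ["home", "draw", "away"],
    "predicted_score_difference": [8.0, 7.5, 6.0],
    "Win": [1.1, 1.1, 1.1],
})

first = update_win(df.copy())      # first call changes the qualifying row
second = update_win(first.copy())  # second call recomputes the same values

print(first.equals(second))  # True
```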
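I don't quite understand why you call selection_update_weights several times. Perhaps you should update Win, O_2_5 (and the other columns) based on their own current values, or update predicted_score_difference and predicted_total_score inside selection_update_weights.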
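A sketch of the first suggestion, assuming the intent is for every call to nudge the weight by a fixed step. The names `bump_win` and `step` and the sample frame are hypothetical; the real rule set and step size would come from the question's own function.

```python
import pandas as pd


def bump_win(df, step=0.02):
    """Add `step` to the current 'Win' value of every qualifying row."""
    mask = (df["selection_match"] == "no match") & (df["result"] != "draw")
    # The right-hand side reads 'Win' itself, so repeated calls accumulate.
    df.loc[mask, "Win"] = df.loc[mask, "Win"] + step
    return df


df = pd.DataFrame({
    "selection_match": ["no match", "match"],
    "result": ["home", "home"],
    "Win": [1.1, 1.1],
})

df = bump_win(df)
df = bump_win(df)
print(df["Win"].round(2).tolist())  # [1.14, 1.1]
```

Whether accumulating like this is the desired behaviour depends on what the weights are meant to represent, which the question does not state.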