Suppose I have a very large dataframe with the following columns:
Date | Person 1 | Person 2 | Value 1 | Value 2
+----------------------------------------------+
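Assume the dataframe is sorted from oldest to newest.
Now I want to iterate over this dataframe.
For each row, I first look at Person 1. Call this Person_1_id.
For Person 1, I want to take the most recent previous row, get Value 1 from it, and perform a complex calculation.
My current way of getting the most recent Value 1 (v1) is: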
value1s = df.loc[(df.ID1 == Person_1_id) & (df.Date < date)]
v1 = value1s.iloc[-1]
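As far as I understand, loc will scan the dataframe and return all previous rows that satisfy the condition.
Wouldn't it be faster to simply walk up the dataframe and pick the first row that satisfies the condition?
If so, how do I iterate backwards over a dataframe?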
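For reference, walking the rows from newest to oldest could be sketched as below. This is only an illustration under assumptions: the column names DATE and 'Person 1' are taken from the example table, the helper name latest_previous_row is made up, and DATE is assumed to be parsed as a datetime column. Whether it beats the boolean mask depends on the data: a Python-level loop is usually slower per row than a vectorized mask, so it tends to pay off only when the match sits close to the end of the frame.

```python
import pandas as pd

# Sample data copied from the example table in the question.
df = pd.DataFrame({
    "DATE": pd.to_datetime(
        ["13/08/2019", "16/08/2019", "19/08/2019", "22/08/2019", "25/08/2019"],
        format="%d/%m/%Y",
    ),
    "Person 1": [71, 19, 30, 42, 19],
    "Person 2": [19, 68, 98, 32, 78],
    "value 1": [1000, 1000, 1000, 1000, 1000],
    "value 2": [1000, 1000, 1000, 1000, 1000],
})


def latest_previous_row(df, person_id, date):
    """Hypothetical helper: scan from the last row upward and return the
    first row dated before `date` whose 'Person 1' equals `person_id`.
    Returns None when no such row exists."""
    dates = df["DATE"].tolist()        # list of pandas Timestamps
    persons = df["Person 1"].tolist()  # list of plain Python ints
    # Iterate over positions in reverse: newest row first.
    for pos in range(len(df) - 1, -1, -1):
        if persons[pos] == person_id and dates[pos] < date:
            return df.iloc[pos]
    return None


row = latest_previous_row(df, 19, pd.Timestamp("2019-08-25"))
print(row)
```

The loop stops at the first match, so it never materializes the full set of earlier rows the way the loc filter does.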
Edit: example.
My initial table:
DATE Person 1 Person 2 value 1 value 2
13/08/2019 71 19 1000 1000
16/08/2019 19 68 1000 1000
19/08/2019 30 98 1000 1000
22/08/2019 42 32 1000 1000
25/08/2019 19 78 1000 1000
Algorithm:
Perform the following calculation:
flag = 0
if person in 'Person 1' then flag = 1
new_value = most_recent_prev_row['value 1'] + flag * 0.5 * most_recent_prev_row['value 2']
current_row['value 1'] = new_value
For example, updating the second row for person 19 in the table above:
DATE Person 1 Person 2 value 1 value 2
13/08/2019 19 71 1000 1000
16/08/2019 19 68 1000+0.5*1000=1500 1000
If the first row were instead:
DATE Person 1 Person 2 value 1 value 2
13/08/2019 71 19 1000 1000
16/08/2019 19 68 1000-0.5*1000=500 1000
Finally, my calculation code is below. It is applied row by row and is very slow:
# helper function to calculate new value
def calculate(value1, value2, flag):
    return value1 + flag * 0.5 * value2

# function to update value (reads the global dataframe df)
def updateValue(playerId, date):
    # default value if player has no wins or losses
    score = 1000
    # get wins and losses for the player. Players in 'Person 1' won, players in 'Person 2' lost.
    wins = df.loc[(df['Person 1'] == playerId) & (df.DATE < date)]
    losses = df.loc[(df['Person 2'] == playerId) & (df.DATE < date)]
    # player only has wins
    if not wins.empty and losses.empty:
        result_row = wins.iloc[-1]
        score = calculate(result_row['value 1'], result_row['value 2'], 1)
    # player only has losses
    if wins.empty and not losses.empty:
        result_row = losses.iloc[-1]
        score = calculate(result_row['value 1'], result_row['value 2'], 0)
    # player has wins and losses: use whichever row is more recent
    if not wins.empty and not losses.empty:
        p1_win_row = wins.iloc[-1]
        p1_lost_row = losses.iloc[-1]
        if p1_win_row.DATE < p1_lost_row.DATE:
            result_row = p1_lost_row
            score = calculate(result_row['value 1'], result_row['value 2'], 0)
        else:
            result_row = p1_win_row
            score = calculate(result_row['value 1'], result_row['value 2'], 1)
    return score
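One possible way to avoid re-filtering the whole dataframe for every row is a single forward pass that remembers, per player, the last row they appeared in. The sketch below is not the code above. It is a swapped-in technique under stated assumptions: rows are sorted oldest to newest with strictly increasing dates, the column names match the example table, the stored values are not written back during the pass, and the name scores_single_pass is hypothetical.

```python
import pandas as pd


def scores_single_pass(df, default=1000):
    """Hypothetical sketch: for each row, compute the score of the
    'Person 1' player from the most recent earlier row in which that
    player appeared (flag 1 if they were 'Person 1' there, else 0)."""
    # player id -> (value 1, value 2, flag) of the latest row seen so far
    last_seen = {}
    scores = []
    rows = zip(
        df["Person 1"].tolist(),
        df["Person 2"].tolist(),
        df["value 1"].tolist(),
        df["value 2"].tolist(),
    )
    for p1, p2, v1, v2 in rows:
        if p1 in last_seen:
            prev_v1, prev_v2, flag = last_seen[p1]
            scores.append(prev_v1 + flag * 0.5 * prev_v2)
        else:
            # Assumption: same default as in the question's code.
            scores.append(default)
        # Record this row for both players before moving on.
        last_seen[p1] = (v1, v2, 1)  # appeared in 'Person 1' (won)
        last_seen[p2] = (v1, v2, 0)  # appeared in 'Person 2' (lost)
    return scores


# Sample data copied from the example table in the question.
df = pd.DataFrame({
    "DATE": pd.to_datetime(
        ["13/08/2019", "16/08/2019", "19/08/2019", "22/08/2019", "25/08/2019"],
        format="%d/%m/%Y",
    ),
    "Person 1": [71, 19, 30, 42, 19],
    "Person 2": [19, 68, 98, 32, 78],
    "value 1": [1000] * 5,
    "value 2": [1000] * 5,
})
result = scores_single_pass(df)
print(result)
```

Each row costs one dictionary lookup and two dictionary writes, so the pass is linear in the number of rows. If the computed score has to be written back into 'value 1' as the pass proceeds, the dictionary would need to store the updated value instead of the original one.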