I have a DataFrame of 600,000 x/y points with date and time information, another field "status", and additional descriptive information.

My goal is, for each record: find the other records that fall within a specific buffer of it, namely within 8 hours of its time t and < 100 metres of its position, and summarise their "status".
Currently I have the data in a pandas DataFrame.

I can loop through the rows, subset the dates of interest for each record, then compute the distances and restrict the selection further. With this many records, though, that is still very slow.

I can see that I could build a 3-D k-d tree on x, y, and the date as epoch time. However, I am not sure how to bound the distance properly when combining dates and geographic distance.
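One way to combine the two constraints in a k-d tree, sketched here on made-up sample data (the coordinate ranges and column-free arrays are assumptions, not the real data): scale epoch time so that the 8-hour window counts the same as the 100 m radius, build a `scipy.spatial.cKDTree` on (x, y, scaled t), query a slightly enlarged radius, and filter exactly afterwards, because a ball query gives a sphere while the problem asks for a cylinder (both limits satisfied independently).

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical sample data: planar coordinates in metres, epoch-style seconds.
rng = np.random.default_rng(111)
n = 500
xy = rng.uniform(0, 1000, size=(n, 2))   # metres
t = rng.uniform(0, 86400, size=n)        # seconds over one day

# Scale time so the 8 h window (28800 s) maps onto the 100 m radius.
scale = 100.0 / 28800.0
pts = np.column_stack([xy, t * scale])

tree = cKDTree(pts)
# Any point within 100 m AND 8 h lies within sqrt(100^2 + 100^2) in 3-D,
# so query that radius, then filter down to the exact cylinder.
neighbours = tree.query_ball_point(pts, r=100.0 * np.sqrt(2))

counts = []
for i, idx in enumerate(neighbours):
    idx = np.array([j for j in idx if j != i], dtype=int)
    if idx.size == 0:
        counts.append(0)
        continue
    close = np.linalg.norm(xy[idx] - xy[i], axis=1) < 100    # spatial limit
    recent = np.abs(t[idx] - t[i]) <= 28800                  # temporal limit
    counts.append(int((close & recent).sum()))
```

The enlarged-radius query never misses a qualifying neighbour, so the final filter only discards false positives; the tree prunes most of the 600,000 rows before any exact distance is computed.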
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

np.random.seed(111)

# `before` and `after` are used below but were not defined in the snippet;
# assumed here to be the 8 h window from the problem statement
before = timedelta(hours=8)
after = timedelta(hours=8)
def work(df):
    output = []
    # loop over the row positions
    for i in range(len(df)):
        l = []
        # first filter by date so there are fewer distances to compute:
        # mask all dates within the window around date i
        date_mask = (df['date'] >= df['date'].iloc[i] - before) & (df['date'] <= df['date'].iloc[i] + after)
        # mask out user i (no self-matches)
        user_mask = df['user'] != df['user'].iloc[i]
        # apply both masks
        dists_to_check = df[date_mask & user_mask]
        # coordinate of point i to measure distances from
        a = np.array((df['long'].iloc[i], df['lat'].iloc[i]))
        # coordinates of the date-masked candidates
        b = np.array((dists_to_check['long'].values, dists_to_check['lat'].values))
        # for each candidate j (starting at 0 -- the original started at 1 and skipped the first row)
        for j in range(len(dists_to_check)):
            # Euclidean distance between point a and candidate j
            x = np.linalg.norm(a - np.array((b[0][j], b[1][j])))
            # if the distance is within range, keep the positional index
            if x <= 100:
                l.append(j)
        try:
            # use the collected positional indices 'l' to take the final subset
            data = dists_to_check.iloc[l]
            # summarise the column of interest, then append to the output list
            output.append(data['status'].sum())
        except IndexError:
            output.append(0)
            # print("There were no data to add")
    return pd.DataFrame(output)
start = datetime.now()
out = work(data)  # `data` is the 600,000-row DataFrame described above
print(datetime.now() - start)
Is there a way to do this query in a vectorised fashion? Or should I pursue a different technique?
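The inner distance loop can be vectorised directly, and sorting by date lets `np.searchsorted` replace the full boolean date mask with an O(log n) window lookup. A sketch under the same column-name assumptions (`date`, `long`, `lat`, `user`, `status`) as the code above; note the output comes back in date-sorted order:

```python
import numpy as np
import pandas as pd

def work_vectorized(df, hours=8, radius=100.0):
    """Same result as work(), but with the distance loop vectorised and
    the date mask replaced by a searchsorted window on sorted dates."""
    df = df.sort_values('date').reset_index(drop=True)
    dates = df['date'].values                      # datetime64[ns], sorted
    xy = df[['long', 'lat']].to_numpy(dtype=float)
    users = df['user'].to_numpy()
    status = df['status'].to_numpy()
    window = np.timedelta64(hours, 'h')

    # index bounds of each row's +/- window in the sorted date array
    lo = np.searchsorted(dates, dates - window, side='left')
    hi = np.searchsorted(dates, dates + window, side='right')

    out = np.empty(len(df))
    for i in range(len(df)):
        s = slice(lo[i], hi[i])
        # all candidate distances in one vectorised call
        d = np.linalg.norm(xy[s] - xy[i], axis=1)
        mask = (d <= radius) & (users[s] != users[i])
        out[i] = status[s][mask].sum()
    return pd.DataFrame(out)
```

The outer loop remains, but each iteration touches only the rows inside the time window and does no Python-level distance arithmetic, which is where most of the original runtime went.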
<3
This solved my problem, at least in part. Since the loop can operate on different parts of the data independently, parallelising it makes sense here.

Using IPython's parallel machinery, the final time was 1:17:54.910206, roughly 1/4 of the original.

I would still be very interested in any small speed improvements anyone can suggest inside the function body.
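The IPython setup is not shown above; an equivalent split using only the standard library's `multiprocessing` could look like the sketch below. The chunking scheme and helper names are assumptions (not the setup actually used), and the column names follow the question's code:

```python
import numpy as np
import pandas as pd
from functools import partial
from multiprocessing import Pool

def work_chunk(idx, df, hours=8, radius=100.0):
    """Run the per-row logic of work() for one slice of row positions."""
    out = []
    for i in idx:
        near_t = (df['date'] - df['date'].iloc[i]).abs() <= pd.Timedelta(hours=hours)
        other = df['user'] != df['user'].iloc[i]
        sub = df[near_t & other]
        d = np.hypot(sub['long'] - df['long'].iloc[i],
                     sub['lat'] - df['lat'].iloc[i])
        out.append(sub.loc[d <= radius, 'status'].sum())
    return out

def work_parallel(df, processes=4):
    # split the row positions into one chunk per worker; each worker gets
    # the whole frame (pickled once per chunk) plus its slice of indices
    chunks = np.array_split(np.arange(len(df)), processes)
    with Pool(processes) as pool:
        parts = pool.map(partial(work_chunk, df=df), chunks)
    return pd.DataFrame([v for part in parts for v in part])
```

Because every row's result depends only on read-only data, the chunks need no coordination, which is why the roughly 4x speed-up from four workers is attainable.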