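I am working with a dataset of 4.2 million points. My code already takes a while to run, but the snippet below takes hours (it was provided in another public question; basically, for each point it picks the nearest linestring, finds the nearest point on that linestring, and computes the distance).
The code actually works well, but it takes far too long for its purpose. How can I optimize it, or do the same thing in much less time?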
import geopandas as gpd
import numpy as np
from shapely.geometry import Point, LineString
from shapely.ops import nearest_points
from sklearn.neighbors import DistanceMetric
EARTH_RADIUS_IN_MILES = 3440.1 #NAUTICAL MILES
panama = gpd.read_file("/Users/Danilo/Documents/Python/panama_coastline/panama_coastline.shp")
def closest_line(point, linestrings):
    # Index of the linestring closest to `point` (planar distance, in degrees)
    return np.argmin([point.distance(linestring) for linestring in linestrings])

dist = DistanceMetric.get_metric('haversine')
dtc = []  # distance to coast, one value per point

# `data` (with 'longitude'/'latitude' columns) and `b` (number of rows to
# process) are not defined in this snippet.
for c in range(b):
    #p = Point(-77.65325423107359,9.222038196656131)
    p = Point(data['longitude'][c], data['latitude'][c])
    closest_linestring = panama.geometry[closest_line(p, panama.geometry)]
    closest_point = nearest_points(p, closest_linestring)
    # haversine expects (latitude, longitude) in radians, so y comes first
    points_as_floats = [np.array([pt.y, pt.x]) for pt in closest_point]
    haversine_distances = dist.pairwise(np.radians(points_as_floats), np.radians(points_as_floats))
    haversine_distances *= EARTH_RADIUS_IN_MILES
    dtc1 = haversine_distances[0][1]
    dtc.append(dtc1)
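Edit: using a BallTree to reduce this to a single computation
Imports
Read the Panama coastline
Get all the coastline points in (lon, lat) format:
Create the BallTree
Create 1M random points:
Compute the nearest coastline point for each (on my machine, < 30 seconds)
Put the result in a dataframe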
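The steps above can be sketched roughly as follows. This is a minimal illustration, not the answer's original code: the helper names `build_coast_tree` and `distance_to_coast`, the four-vertex stand-in coastline, the bounding box, and the output column names are all assumptions made for the example. It relies on scikit-learn's `BallTree` with the `haversine` metric, which expects (latitude, longitude) in radians.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_NM = 3440.1  # nautical miles, same constant as in the question


def build_coast_tree(coast_lonlat):
    """Build a haversine BallTree from an (n, 2) array of (lon, lat) in degrees."""
    # The haversine metric expects (lat, lon) in radians, so swap the columns.
    coast_latlon_rad = np.radians(np.asarray(coast_lonlat)[:, ::-1])
    return BallTree(coast_latlon_rad, metric="haversine")


def distance_to_coast(tree, points_lonlat):
    """Return (distance in nautical miles, index of the nearest coastline vertex)."""
    points_latlon_rad = np.radians(np.asarray(points_lonlat)[:, ::-1])
    dist_rad, idx = tree.query(points_latlon_rad, k=1)  # one batched query
    return dist_rad[:, 0] * EARTH_RADIUS_NM, idx[:, 0]


# Hypothetical stand-in for the coastline vertices. With the real shapefile,
# and assuming every geometry is a plain LineString, something like
#   np.vstack([np.asarray(line.coords)[:, :2] for line in panama.geometry])
# should give the same kind of (lon, lat) array.
coast_lonlat = np.array([[-80.0, 9.0], [-79.5, 9.3], [-79.0, 9.5], [-78.5, 9.4]])
tree = build_coast_tree(coast_lonlat)

# Random query points in a made-up bounding box (use the real points instead).
rng = np.random.default_rng(0)
n_points = 100_000
points_lonlat = np.column_stack([
    rng.uniform(-83.0, -77.0, n_points),  # longitude
    rng.uniform(7.0, 10.0, n_points),     # latitude
])

dist_nm, nearest_idx = distance_to_coast(tree, points_lonlat)

# Collect the inputs, the distance, and the matched coastline vertex.
result = pd.DataFrame({
    "longitude": points_lonlat[:, 0],
    "latitude": points_lonlat[:, 1],
    "dtc": dist_nm,
    "coast_longitude": coast_lonlat[nearest_idx, 0],
    "coast_latitude": coast_lonlat[nearest_idx, 1],
})
```

Note that a tree built this way returns the nearest coastline vertex, not the nearest point along a segment between two vertices, so the result can differ slightly from the `nearest_points` approach where the coastline vertices are sparse.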