How can I optimize Shapely and sklearn code?

Posted 2024-05-29 07:48:24


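I'm working with a dataset of 4.2 million points. My code already takes a while to run, but the code below takes several hours. It was provided in another public question; for each point, it picks the nearest linestring, finds the nearest point on that linestring, and computes the distance between the two.

The code actually works well, but it takes far too long for its purpose. How can I optimize it, or get the same result in less time?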

import geopandas as gpd
import numpy as np

from shapely.geometry import Point, LineString
from shapely.ops import nearest_points
from sklearn.metrics import DistanceMetric  # lived in sklearn.neighbors before scikit-learn 1.0

EARTH_RADIUS_IN_MILES = 3440.1 #NAUTICAL MILES

panama = gpd.read_file("/Users/Danilo/Documents/Python/panama_coastline/panama_coastline.shp")

# `data` (with 'longitude' and 'latitude' columns) and `b` (the number of rows)
# are assumed to be defined earlier in the script.
dtc = []  # distance to coast, one entry per point

def closest_line(point, linestrings):
    # Index of the linestring nearest to the point
    return np.argmin([point.distance(linestring) for linestring in linestrings])

dist = DistanceMetric.get_metric('haversine')

for c in range(b):
    #p = Point(-77.65325423107359,9.222038196656131)
    p = Point(data['longitude'][c], data['latitude'][c])

    closest_linestring = panama.geometry[closest_line(p, panama.geometry)]
    # nearest_points returns (p itself, nearest point on the linestring)
    closest_point = nearest_points(p, closest_linestring)

    # The haversine metric expects [latitude, longitude] in radians
    points_as_floats = [np.array([q.y, q.x]) for q in closest_point]

    haversine_distances = dist.pairwise(np.radians(points_as_floats), np.radians(points_as_floats))
    haversine_distances *= EARTH_RADIUS_IN_MILES

    dtc1 = haversine_distances[0][1]
    dtc.append(dtc1)

1 answer
User
Answer 1 · posted 2024-05-29 07:48:24

Edit: simplified to a single computation using a BallTree

Imports

import pandas as pd
import geopandas as gpd
import numpy as np

from shapely.geometry import Point, LineString
from shapely.ops import nearest_points

Read the Panama coastline

panama = gpd.read_file("panama_coastline/panama_coastline.shp")

Get all coastline points, in (long, lat) order:

def get_points_as_numpy(geom):
    work_list = []
    for g in geom:
        work_list.append( np.array(g.coords) )
        
    return np.concatenate(work_list)
        
all_coastline_points = get_points_as_numpy(panama.geometry)

Create the BallTree (the haversine metric expects [lat, long] in radians, hence the flip)

from sklearn.neighbors import BallTree
import numpy as np

panama_radians =  np.radians(np.flip(all_coastline_points,axis=1))

tree = BallTree(panama_radians, leaf_size=12, metric='haversine')
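
As a quick sanity check of the axis order and units, here is a minimal sketch on two made-up reference points (`ref_deg` and `demo_tree` are illustrative names, not part of the answer's code). It assumes only that the haversine metric takes [lat, long] in radians and returns distances in radians:

```python
import numpy as np
from sklearn.neighbors import BallTree

# Two reference points on the equator, as [lat, long] in degrees
ref_deg = np.array([[0.0, 0.0], [0.0, 90.0]])
demo_tree = BallTree(np.radians(ref_deg), metric="haversine")

# Query a point at lat 0, long 10: the first reference point is 10 degrees away
dist_rad, idx = demo_tree.query(np.radians([[0.0, 10.0]]), k=1)

# Distances come back in radians; scale by the Earth radius (nautical miles here)
dist_nm = dist_rad * 3440.1
```

Both returned arrays have shape (n_queries, k), so with k=1 the values of interest are in column 0.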

Create 1M random points:

mean = [8.5,-80]
cov = [[1,0],[0,5]] # diagonal covariance: lat and long are sampled independently


random_gps = np.random.multivariate_normal(mean,cov,(10**6))
random_points = pd.DataFrame( {'lat' : random_gps[:,0], 'long' : random_gps[:,1]})
random_points.head()
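
The random points stand in for real data. To run the same query on a DataFrame like the question's `data`, one possible sketch is below. It assumes the columns are named `longitude` and `latitude` as in the question; the two rows are made-up sample values, and `query_radians` is an illustrative name:

```python
import numpy as np
import pandas as pd

# Stand-in for the question's DataFrame (sample values only)
data = pd.DataFrame({
    "longitude": [-77.65325423107359, -79.5],
    "latitude": [9.222038196656131, 8.9],
})

# Select the columns in [lat, long] order, then convert degrees to radians
query_radians = np.radians(data[["latitude", "longitude"]].to_numpy())
```

`query_radians` can then be passed to `tree.query` in place of `np.radians(random_gps)`.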

Compute the nearest coastline point (< 30 seconds on my machine)

distances, index = tree.query( np.radians(random_gps), k=1)
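
To spot-check the tree's output, a brute-force haversine search over a small sample can serve as a reference. This is a sketch in plain NumPy; `haversine_rad` and `brute_force_nearest` are hypothetical helper names, and the sample coordinates are made up:

```python
import numpy as np

def haversine_rad(lat1, lon1, lat2, lon2):
    """Great-circle distance in radians; inputs in radians, broadcastable."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a))

def brute_force_nearest(query_rad, ref_rad):
    """For each query row [lat, long], return (distance, index) of the nearest ref row."""
    # Full (n_query, n_ref) distance matrix, so only use this on small samples
    d = haversine_rad(query_rad[:, None, 0], query_rad[:, None, 1],
                      ref_rad[None, :, 0], ref_rad[None, :, 1])
    idx = d.argmin(axis=1)
    return d[np.arange(len(idx)), idx], idx

ref = np.radians([[0.0, 0.0], [0.0, 90.0], [45.0, 45.0]])
qry = np.radians([[0.0, 10.0], [40.0, 50.0]])
bf_dist, bf_idx = brute_force_nearest(qry, ref)
```

On a slice of the real inputs, `bf_dist` should agree with `distances[:, 0]` from `tree.query` up to floating-point error. The distance matrix grows as queries times reference points, so this is only practical for a few thousand rows at a time.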

Put the results in the DataFrame

EARTH_RADIUS_IN_MILES = 3440.1  # nautical miles

# tree.query returns arrays of shape (n, 1) for k=1; take column 0
nearest = index[:, 0]
random_points['distance_to_coast'] = distances[:, 0] * EARTH_RADIUS_IN_MILES
random_points['closest_lat'] = all_coastline_points[nearest, 1]
random_points['closest_long'] = all_coastline_points[nearest, 0]
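
One caveat: the tree is built from the coastline's vertices only, so the result is the nearest vertex, not the nearest point along a segment as `nearest_points` returns in the question. Where vertices are far apart, the distance can be overestimated. A possible mitigation, sketched below, is to insert intermediate points before building the tree. `densify_coords` and `max_step` are hypothetical names, and the interpolation is linear in degrees, which is only an approximation of the true path along the segment:

```python
import numpy as np

def densify_coords(coords, max_step):
    """Insert points along each segment so neighbours are at most max_step apart.

    coords: (n, 2) array of (long, lat) vertices; max_step is in the same units
    (degrees). Original vertices are kept.
    """
    coords = np.asarray(coords, dtype=float)
    out = [coords[:1]]
    for a, b in zip(coords[:-1], coords[1:]):
        # Number of pieces needed for this segment (at least one)
        n = max(int(np.ceil(np.linalg.norm(b - a) / max_step)), 1)
        # Fractions along the segment, excluding the start point (already emitted)
        t = np.linspace(0.0, 1.0, n + 1)[1:, None]
        out.append(a + t * (b - a))
    return np.concatenate(out)

segment = np.array([[-80.0, 8.0], [-79.0, 8.0]])
dense = densify_coords(segment, max_step=0.25)
```

In `get_points_as_numpy`, this could be applied to `np.array(g.coords)[:, :2]` for each geometry. A smaller `max_step` tightens the error bound at the cost of a larger tree.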
