Python位置,显示距离最近的其他位置的距离

2024-06-16 09:29:28 发布

您现在位置:Python中文网/ 问答频道 /正文

我是数据框中的一个位置,位于lat lon列名下面。我想在一个单独的数据帧中显示它离最近火车站的纬度有多远

例如,我有一个纬度(37.814563 144.970267),我有一个其他地理空间点的列表,如下所示。我想找到最近的点,然后找到这些点之间的距离,作为郊区数据帧中的一个额外列

这是train数据集的示例

<bound method NDFrame.to_clipboard of   STOP_ID                                          STOP_NAME   LATITUDE  \
0   19970             Royal Park Railway Station (Parkville) -37.781193   
1   19971  Flemington Bridge Railway Station (North Melbo... -37.788140   
2   19972         Macaulay Railway Station (North Melbourne) -37.794267   
3   19973   North Melbourne Railway Station (West Melbourne) -37.807419   
4   19974        Clifton Hill Railway Station (Clifton Hill) -37.788657   

    LONGITUDE TICKETZONE                                          ROUTEUSSP  \
0  144.952301          1                                            Upfield   
1  144.939323          1                                            Upfield   
2  144.936166          1                                            Upfield   
3  144.942570          1  Flemington,Sunbury,Upfield,Werribee,Williamsto...   
4  144.995417          1                                 Mernda,Hurstbridge   

                      geometry  
0  POINT (144.95230 -37.78119)  
1  POINT (144.93932 -37.78814)  
2  POINT (144.93617 -37.79427)  
3  POINT (144.94257 -37.80742)  
4  POINT (144.99542 -37.78866)  >

这是郊区的一个例子

<bound method NDFrame.to_clipboard of       postcode              suburb state        lat         lon
4901      3000           MELBOURNE   VIC -37.814563  144.970267
4902      3002      EAST MELBOURNE   VIC -37.816640  144.987811
4903      3003      WEST MELBOURNE   VIC -37.806255  144.941123
4904      3005  WORLD TRADE CENTRE   VIC -37.822262  144.954856
4905      3006           SOUTHBANK   VIC -37.823258  144.965926>

我想在郊区列表的新栏中显示的是从拉特隆到壁橱火车站的距离

使用解决方案会得到奇怪的输出,怀疑它是否正确?

显示了两种解决方案

from sklearn.neighbors import NearestNeighbors
from haversine import haversine

NN = NearestNeighbors(n_neighbors=1, metric='haversine')
NN.fit(trains_shape[['LATITUDE', 'LONGITUDE']])

indices = NN.kneighbors(df_complete[['lat', 'lon']])[1]
indices = [index[0] for index in indices]
distances = NN.kneighbors(df_complete[['lat', 'lon']])[0]
df_complete['closest_station'] = trains_shape.iloc[indices]['STOP_NAME'].reset_index(drop=True)
df_complete['closest_station_distances'] = distances
print(df_complete)

这里的输出

<bound method NDFrame.to_clipboard of    postcode        suburb state        lat         lon  Venues Cluster  \
1      3040    aberfeldie   VIC -37.756690  144.896259             4.0   
2      3042  airport west   VIC -37.711698  144.887037             1.0   
4      3206   albert park   VIC -37.840705  144.955710             0.0   
5      3020        albion   VIC -37.775954  144.819395             2.0   
6      3078    alphington   VIC -37.780767  145.031160             4.0   

                     #1                    #2             #3  \
1                  Café     Electronics Store  Grocery Store   
2  Fast Food Restaurant                  Café    Supermarket   
4                  Café                   Pub    Coffee Shop   
5                  Café  Fast Food Restaurant  Grocery Store   
6                  Café                  Park            Bar   

                      #4  ...                             #6  \
1            Coffee Shop  ...                         Bakery   
2          Grocery Store  ...             Italian Restaurant   
4         Breakfast Spot  ...                   Burger Joint   
5  Vietnamese Restaurant  ...                            Pub   
6            Pizza Place  ...  Vegetarian / Vegan Restaurant   

                      #7                   #8                         #9  \
1          Shopping Mall  Japanese Restaurant          Indian Restaurant   
2  Portuguese Restaurant    Electronics Store  Middle Eastern Restaurant   
4                    Bar               Bakery                  Gastropub   
5     Chinese Restaurant                  Gym                     Bakery   
6     Italian Restaurant            Gastropub                     Bakery   

                 #10 Ancestry Cluster  ClosestStopId  \
1   Greek Restaurant              8.0          20037   
2  Convenience Store              5.0          20032   
4              Beach              6.0          22180   
5  Convenience Store              5.0          20004   
6        Coffee Shop              5.0          19931   

                                   ClosestStopName  \
1              Essendon Railway Station (Essendon)   
2                Glenroy Railway Station (Glenroy)   
4  Southern Cross Railway Station (Melbourne City)   
5          Albion Railway Station (Sunshine North)   
6          Alphington Railway Station (Alphington)   

                                   closest_station closest_station_distances  
1                Glenroy Railway Station (Glenroy)                  0.019918  
2  Southern Cross Railway Station (Melbourne City)                  0.031020  
4          Alphington Railway Station (Alphington)                  0.023165  
5                  Altona Railway Station (Altona)                  0.005559  
6                Newport Railway Station (Newport)                  0.002375  

第二个功能

def ClosestStop(r):
    # Cartesin Distance: square root of (x2-x2)^2 + (y2-y1)^2
    distances = ((r['lat']-StationDf['LATITUDE'])**2 + (r['lon']-StationDf['LONGITUDE'])**2)**0.5
    
    # Stop with minimum Distance from the Suburb
    closestStationId = distances[distances == distances.min()].index.to_list()[0]
    return StationDf.loc[closestStationId, ['STOP_ID', 'STOP_NAME']]

df_complete[['ClosestStopId', 'ClosestStopName']] = df_complete.apply(ClosestStop, axis=1)

奇怪的是,这给出了不同的答案,并导致我认为这段代码存在问题。KM似乎也错了

完全不确定如何解决这个问题-希望在这里得到一些指导,谢谢


Tags: 数据storedfrestaurantpointcompletestoplon
3条回答

可以将sklearn.neighbors.NearestNeighbors与哈弗斯线距离一起使用

import pandas as pd
dfstat = pd.DataFrame({'STOP_ID': ['19970', '19971', '19972', '19973', '19974'],
                       'STOP_NAME': ['Royal Park Railway Station (Parkville)',  'Flemington Bridge Railway Station (North Melbo...',  'Macaulay Railway Station (North Melbourne)',  'North Melbourne Railway Station (West Melbourne)',  'Clifton Hill Railway Station (Clifton Hill)'],
                       'LATITUDE': ['-37.781193', '-37.788140',  '-37.794267',  '-37.807419',  '-37.788657'],
                       'LONGITUDE': ['144.952301', '144.939323', '144.936166',  '144.942570',  '144.995417'],
                       'TICKETZONE': ['1', '1', '1', '1', '1'], 
                       'ROUTEUSSP': ['Upfield',  'Upfield',  'Upfield',  'Flemington,Sunbury,Upfield,Werribee,Williamsto...',  'Mernda,Hurstbridge'],
                       'geometry': ['POINT (144.95230 -37.78119)',  'POINT (144.93932 -37.78814)',  'POINT (144.93617 -37.79427)',  'POINT (144.94257 -37.80742)',  'POINT (144.99542 -37.78866)']})
dfsub = pd.DataFrame({'id': ['4901', '4902', '4903', '4904', '4905'],
                      'postcode': ['3000', '3002', '3003', '3005', '3006'],
                      'suburb': ['MELBOURNE',  'EAST MELBOURNE',  'WEST MELBOURNE',  'WORLD TRADE CENTRE',  'SOUTHBANK'],
                      'state': ['VIC', 'VIC', 'VIC', 'VIC', 'VIC'],
                      'lat': ['-37.814563', '-37.816640', '-37.806255', '-37.822262', '-37.823258'],
                      'lon': ['144.970267', '144.987811', '144.941123', '144.954856', '144.965926']})

让我们首先查找数据帧中距离某个随机点最近的点,例如-37.814563, 144.970267

NN = NearestNeighbors(n_neighbors=1, metric='haversine')
NN.fit(dfstat[['LATITUDE', 'LONGITUDE']])
NN.kneighbors([[-37.814563, 144.970267]])

输出是(array([[2.55952637]]), array([[3]])),数据帧中最近点的距离和索引。sklearn中的哈弗线距离为radius。如果您想计算单位为km,可以使用haversine

from haversine import haversine
NN = NearestNeighbors(n_neighbors=1, metric=haversine)
NN.fit(dfstat[['LATITUDE', 'LONGITUDE']])
NN.kneighbors([[-37.814563, 144.970267]])

输出(array([[2.55952637]]), array([[3]]))的距离以km为单位

现在,您可以应用于数据帧中的所有点,并使用索引获取最近的桩号

indices = NN.kneighbors(dfsub[['lat', 'lon']])[1]
indices = [index[0] for index in indices]
distances = NN.kneighbors(dfsub[['lat', 'lon']])[0]
dfsub['closest_station'] = dfstat.iloc[indices]['STOP_NAME'].reset_index(drop=True)
dfsub['closest_station_distances'] = distances
print(dfsub)
id  postcode    suburb  state   lat lon closest_station closest_station_distances
0   4901    3000    MELBOURNE   VIC -37.814563  144.970267  North Melbourne Railway Station (West Melbourne)    2.559526
1   4902    3002    EAST MELBOURNE  VIC -37.816640  144.987811  Clifton Hill Railway Station (Clifton Hill) 3.182521
2   4903    3003    WEST MELBOURNE  VIC -37.806255  144.941123  North Melbourne Railway Station (West Melbourne)    0.181419
3   4904    3005    WORLD TRADE CENTRE  VIC -37.822262  144.954856  North Melbourne Railway Station (West Melbourne)    1.972010
4   4905    3006    SOUTHBANK   VIC -37.823258  144.965926  North Melbourne Railway Station (West Melbourne)    2.703926

试试这个

import pandas as pd
def ClosestStop(r):
    # Cartesin Distance: square root of (x2-x2)^2 + (y2-y1)^2
    distances = ((r['lat']-StationDf['LATITUDE'])**2 + (r['lon']-StationDf['LONGITUDE'])**2)**0.5
    
    # Stop with minimum Distance from the Suburb
    closestStationId = distances[distances == distances.min()].index.to_list()[0]
    return StationDf.loc[closestStationId, ['STOP_ID', 'STOP_NAME']]

StationDf = pd.read_excel("StationData.xlsx")
SuburbDf = pd.read_excel("SuburbData.xlsx")

SuburbDf[['ClosestStopId', 'ClosestStopName']] = SuburbDf.apply(ClosestStop, axis=1)
print(SuburbDf)

几个关键概念

  1. 在两个数据帧之间进行笛卡尔乘积以获得所有组合(在两个数据帧之间以相同的值进行连接是实现此foo=1的方法)
  2. 一旦两组数据都在一起,就让两组lat/lon计算距离)geopy已用于此目的
  3. 清理列,使用sort_values()查找最小距离
  4. 最后,使用groupby()agg()获得最短距离的第一个

有两个数据帧可供使用

  1. dfdist包含所有组合和距离
  2. dfnearest其中包含结果
dfstat = pd.DataFrame({'STOP_ID': ['19970', '19971', '19972', '19973', '19974'],
 'STOP_NAME': ['Royal Park Railway Station (Parkville)',
  'Flemington Bridge Railway Station (North Melbo...',
  'Macaulay Railway Station (North Melbourne)',
  'North Melbourne Railway Station (West Melbourne)',
  'Clifton Hill Railway Station (Clifton Hill)'],
 'LATITUDE': ['-37.781193',
  '-37.788140',
  '-37.794267',
  '-37.807419',
  '-37.788657'],
 'LONGITUDE': ['144.952301',
  '144.939323',
  '144.936166',
  '144.942570',
  '144.995417'],
 'TICKETZONE': ['1', '1', '1', '1', '1'],
 'ROUTEUSSP': ['Upfield',
  'Upfield',
  'Upfield',
  'Flemington,Sunbury,Upfield,Werribee,Williamsto...',
  'Mernda,Hurstbridge'],
 'geometry': ['POINT (144.95230 -37.78119)',
  'POINT (144.93932 -37.78814)',
  'POINT (144.93617 -37.79427)',
  'POINT (144.94257 -37.80742)',
  'POINT (144.99542 -37.78866)']})
dfsub = pd.DataFrame({'id': ['4901', '4902', '4903', '4904', '4905'],
 'postcode': ['3000', '3002', '3003', '3005', '3006'],
 'suburb': ['MELBOURNE',
  'EAST MELBOURNE',
  'WEST MELBOURNE',
  'WORLD TRADE CENTRE',
  'SOUTHBANK'],
 'state': ['VIC', 'VIC', 'VIC', 'VIC', 'VIC'],
 'lat': ['-37.814563', '-37.816640', '-37.806255', '-37.822262', '-37.823258'],
 'lon': ['144.970267', '144.987811', '144.941123', '144.954856', '144.965926']})

import geopy.distance
# cartesian product so we get all combinations
dfdist = (dfsub.assign(foo=1).merge(dfstat.assign(foo=1), on="foo")
    # calc distance in km between each suburb and each train station
     .assign(km=lambda dfa: dfa.apply(lambda r: 
                                      geopy.distance.geodesic(
                                          (r["LATITUDE"],r["LONGITUDE"]), 
                                          (r["lat"],r["lon"])).km, axis=1))
    # reduce number of columns to make it more digestable
     .loc[:,["postcode","suburb","STOP_ID","STOP_NAME","km"]]
    # sort so shortest distance station from a suburb is first
     .sort_values(["postcode","suburb","km"])
    # good practice
     .reset_index(drop=True)
)
# finally pick out stations nearest to suburb
# this can easily be joined back to source data frames as postcode and STOP_ID have been maintained
dfnearest = dfdist.groupby(["postcode","suburb"])\
    .agg({"STOP_ID":"first","STOP_NAME":"first","km":"first"}).reset_index()

print(dfnearest.to_string(index=False))
dfnearest

输出

postcode              suburb STOP_ID                                         STOP_NAME        km
    3000           MELBOURNE   19973  North Melbourne Railway Station (West Melbourne)  2.564586
    3002      EAST MELBOURNE   19974       Clifton Hill Railway Station (Clifton Hill)  3.177320
    3003      WEST MELBOURNE   19973  North Melbourne Railway Station (West Melbourne)  0.181463
    3005  WORLD TRADE CENTRE   19973  North Melbourne Railway Station (West Melbourne)  1.970909
    3006           SOUTHBANK   19973  North Melbourne Railway Station (West Melbourne)  2.705553

减少测试组合大小的方法

# pick nearer places,  based on lon/lat then all combinations
dfdist = (dfsub.assign(foo=1, latr=dfsub["lat"].round(1), lonr=dfsub["lon"].round(1))
          .merge(dfstat.assign(foo=1, latr=dfstat["LATITUDE"].round(1), lonr=dfstat["LONGITUDE"].round(1)), 
                 on=["foo","latr","lonr"])
    # calc distance in km between each suburb and each train station
     .assign(km=lambda dfa: dfa.apply(lambda r: 
                                      geopy.distance.geodesic(
                                          (r["LATITUDE"],r["LONGITUDE"]), 
                                          (r["lat"],r["lon"])).km, axis=1))
    # reduce number of columns to make it more digestable
     .loc[:,["postcode","suburb","STOP_ID","STOP_NAME","km"]]
    # sort so shortest distance station from a suburb is first
     .sort_values(["postcode","suburb","km"])
    # good practice
     .reset_index(drop=True)
)

相关问题 更多 >