Get the first and last row values for each unique id in a DataFrame

df = pd.DataFrame({
    'id': [1,1,1,2,2],
    'lat': [41.144540, 41.144540, 41.163172, 41.163233, 41.163198],
    'lon': [-8.562926, -8.562926, -8.583821, -8.583838, -8.583886]
})
df

   id        lat       lon
0   1  41.144540 -8.562926
1   1  41.144540 -8.562926
2   1  41.163172 -8.583821
3   2  41.163233 -8.583838
4   2  41.163198 -8.583886

Desired output:

   id        lat       lon        length
0   1  41.144540 -8.562926  1217881.5582
1   1  41.144540 -8.562926  1217881.5582
2   1  41.163172 -8.583821  1217881.5582
3   2  41.163233 -8.583838     5.5979928
4   2  41.163198 -8.583886     5.5979928

3 Answers

User

Answer 1 · edited 2024-05-23 21:57:36

You can use .apply(...):

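import pandas as pd
# Note: vincenty() was removed in geopy 2.0; on newer versions use geodesic instead
from geopy.distance import vincenty

def get_length(group):
    # Distance between the first and last (lat, lon) point of this id
    coords = group[['lat', 'lon']].values
    p1, p2 = coords[0], coords[-1]

    length = vincenty(p1, p2).m  # distance in metres

    return length

grouped = df.groupby(by=['id'])
length = grouped.apply(get_length).rename('length')

# Attach each id's length to every row with that id
df.merge(length, on=['id'])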

    id     lat         lon       length
0   1   41.144540   -8.562926   2712.533677
1   1   41.144540   -8.562926   2712.533677
2   1   41.163172   -8.583821   2712.533677
3   2   41.163233   -8.583838   5.597993
4   2   41.163198   -8.583886   5.597993
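The merge step above can also be written with Series.map, which keeps df's original index and row order. This is a minimal sketch assuming pandas is installed; to avoid depending on geopy (whose vincenty() no longer exists in geopy 2.0), it uses a hypothetical haversine_m() helper. Haversine is a spherical approximation, so its values differ slightly from vincenty()/geodesic().

```python
import math

import pandas as pd


def haversine_m(lat1, lon1, lat2, lon2, radius=6_371_008.8):
    # Hypothetical helper: great-circle (haversine) distance in metres on a
    # sphere; a stand-in for geopy's ellipsoidal vincenty()/geodesic()
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def get_length(group):
    # First and last row of this id's group, in the order they appear in df
    first, last = group.iloc[0], group.iloc[-1]
    return haversine_m(first['lat'], first['lon'], last['lat'], last['lon'])


df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'lat': [41.144540, 41.144540, 41.163172, 41.163233, 41.163198],
    'lon': [-8.562926, -8.562926, -8.583821, -8.583838, -8.583886],
})

# One length per id (a Series indexed by id); selecting the columns first
# keeps the grouping column out of the groups passed to get_length
length = df.groupby('id')[['lat', 'lon']].apply(get_length)

# Broadcast each id's length back onto its rows without a merge
df['length'] = df['id'].map(length)
print(df)
```

Because map() aligns on the id values, rows of the same id do not need to be contiguous for this to work.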

User

Answer 2 · edited 2024-05-23 21:57:36

I couldn't get vincenty to work; apparently it has been replaced by geodesic. But this should work:

import pandas as pd
from geopy.distance import geodesic

df = pd.DataFrame({
    'id': [1,1,1,2,2],
    'lat': [41.144540, 41.144540, 41.163172, 41.163233, 41.163198],
    'lon': [-8.562926, -8.562926, -8.583821, -8.583838, -8.583886]
})

# First and last coordinates of each id via named aggregation
res = (df.groupby(by='id').agg(start_lat=pd.NamedAgg(column='lat', aggfunc='first'),
                              start_long=pd.NamedAgg(column='lon', aggfunc='first'),
                              end_lat=pd.NamedAgg(column='lat', aggfunc='last'),
                              end_long=pd.NamedAgg(column='lon', aggfunc='last'))
        # geodesic() returns a Distance object; .m converts it to metres
        .apply(lambda f: geodesic((f['start_lat'], f['start_long']), (f['end_lat'], f['end_long'])).m, axis=1)
        .reset_index()
        )

df = df.merge(res, on='id').rename(columns={0: 'dist'})

print(df)
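Another way to pick each id's first and last rows is DataFrame.drop_duplicates with keep='first' and keep='last'; pairing that with a vectorized distance formula avoids the row-wise apply(axis=1). This is a sketch assuming pandas and NumPy are installed. haversine_m() is a hypothetical spherical stand-in for geodesic(), used so the example runs without geopy, and its distances will not match geopy exactly.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    # Hypothetical helper: element-wise haversine distance in metres;
    # accepts NumPy arrays or index-aligned pandas Series
    lat1, lon1, lat2, lon2 = (np.radians(x) for x in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'lat': [41.144540, 41.144540, 41.163172, 41.163233, 41.163198],
    'lon': [-8.562926, -8.562926, -8.583821, -8.583838, -8.583886],
})

# First and last row of each id, indexed by id so the two frames align
first = df.drop_duplicates('id', keep='first').set_index('id')
last = df.drop_duplicates('id', keep='last').set_index('id')

# One distance per id, computed for all ids at once
dist = haversine_m(first['lat'], first['lon'], last['lat'], last['lon'])

# Map the per-id distance back onto every row of that id
df['dist'] = df['id'].map(dist)
print(df)
```

drop_duplicates respects the current row order, so sort the frame first if "first" and "last" should mean something other than the order in which rows appear.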

User

Answer 3 · edited 2024-05-23 21:57:36

You can use groupby() together with agg() (also called aggregate()) to get the first and last values in a single command:

df.groupby('id').agg({'lat': ['first', 'last'], 'lon': ['first', 'last']})

This gives you:

          lat                  lon          
        first       last     first      last
id                                          
1   41.144540  41.163172 -8.562926 -8.583821
2   41.163233  41.163198 -8.583838 -8.583886

This is almost exactly what you need to pass to vincenty() to compute the distance for each id.
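To finish the agg() approach, the two-level columns can be flattened to names such as lat_first and passed row by row to a distance function. In this sketch, haversine_m() is a hypothetical spherical stand-in so the example runs without geopy; with geopy 2.0 or later you would call geodesic((lat1, lon1), (lat2, lon2)).m at that point instead.

```python
import math

import pandas as pd


def haversine_m(lat1, lon1, lat2, lon2, radius=6_371_008.8):
    # Hypothetical helper: haversine distance in metres on a sphere
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'lat': [41.144540, 41.144540, 41.163172, 41.163233, 41.163198],
    'lon': [-8.562926, -8.562926, -8.583821, -8.583838, -8.583886],
})

ends = df.groupby('id').agg({'lat': ['first', 'last'], 'lon': ['first', 'last']})
# Flatten the MultiIndex columns: ('lat', 'first') -> 'lat_first', etc.
ends.columns = ['_'.join(col) for col in ends.columns]

# One distance per id, from that id's first point to its last point
ends['length'] = [
    haversine_m(r.lat_first, r.lon_first, r.lat_last, r.lon_last)
    for r in ends.itertuples()
]

# join(on='id') matches df['id'] against ends' index and keeps df's index
df = df.join(ends['length'], on='id')
print(df)
```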
