How do I interpolate GPS coordinates in Pandas?

Posted 2024-03-29 08:16:07

I have a dataset with about 150K GPS coordinate entries that looks like this:

       log_time   latitude  longitude
0  1.555840e+09  45.429597  11.974981
1  1.555869e+09  45.429597  11.974981
3  1.555869e+09  45.429596  11.974984
4  1.555869e+09  45.429490  11.975089
5  1.555869e+09  45.429092  11.975478

The gaps between consecutive log_time values are distributed as follows:

count                     147538
mean      0 days 00:02:27.234798
std       0 days 02:34:54.243149
min              0 days 00:00:00
25%              0 days 00:00:03
50%              0 days 00:00:05
75%              0 days 00:00:08
max      39 days 12:25:39.551000
Name: log_time, dtype: object

I expect this dataframe to grow to a few million records in the near future, so scalability is a priority.

I want to interpolate the movement so that, across the larger gaps, there is at least one GPS record every 60 seconds.
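To make the requirement concrete, here is a minimal sketch of how the gaps longer than 60 seconds could be located and how many points each one needs. The sample frame and the names `MAX_GAP`, `gap` and `n_missing` are made up for illustration; only the column names come from the data above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; log_time is in Unix seconds, as in the data above.
df = pd.DataFrame({
    "log_time": [0.0, 5.0, 185.0, 190.0, 250.0],
    "latitude": [45.0, 45.0, 45.3, 45.3, 45.4],
    "longitude": [11.0, 11.0, 11.3, 11.3, 11.4],
})

MAX_GAP = 60.0  # target maximum spacing between records, in seconds

# Time from each row to the next one (NaN for the last row).
gap = df["log_time"].shift(-1) - df["log_time"]

# Number of records to insert after each row so that no remaining
# interval is longer than MAX_GAP.
n_missing = (np.ceil(gap / MAX_GAP) - 1).clip(lower=0).fillna(0).astype(int)

print(df.assign(gap=gap, n_missing=n_missing))
```

Only the 180-second gap needs filling here (two extra records); a gap of exactly 60 seconds is left alone.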

The standard approach would be:

import numpy as np

dff = dff.set_index(dff.pop('log_time'))
dff = dff.reindex(np.arange(dff.index.min(), dff.index.max()+1))

which yields:

               latitude  longitude
log_time
1.555840e+09  45.429597  11.974981
1.555840e+09        NaN        NaN
1.555840e+09        NaN        NaN
1.555840e+09        NaN        NaN
1.555840e+09        NaN        NaN

and then interpolate with something like dff.interpolate().reset_index().
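For reference, a small sketch of that linear baseline on made-up data. Unlike the one-row-per-second reindex above, it adds only a 60-second grid on top of the original timestamps, which is an assumption made here to keep the frame small. It also assumes the timestamps are unique, since reindexing fails on duplicate labels.

```python
import numpy as np
import pandas as pd

# Hypothetical two-point track with a 180 s gap.
df = pd.DataFrame({
    "log_time": [0.0, 180.0],
    "latitude": [45.0, 45.3],
    "longitude": [11.0, 11.6],
})

s = df.set_index("log_time")

# 60 s grid spanning the data; the union keeps the original timestamps too.
grid = np.arange(s.index.min(), s.index.max() + 1, 60.0)

filled = (
    s.reindex(s.index.union(grid))
     .interpolate(method="index")   # linear in the index values (time)
     .rename_axis("log_time")
     .reset_index()
)
print(filled)
```

This interpolates latitude and longitude independently and linearly in time, so it traces a straight line in (lat, lon) space rather than a geodesic.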

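However, I have one big problem: none of the interpolation functions offered by scipy (and therefore pandas) suit GPS distances, which are arcs rather than straight lines. And from what I've seen, extending the interpolation functions is not easy.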

I already have the distance function I am going to use, but I am finding it hard to deploy without resorting to nested for loops.

from geographiclib.geodesic import Geodesic
geod = Geodesic.WGS84

def custom_interpolation(starting_value, ending_value, number_of_missing_values):
    # starting_value and ending_value are (lat, lon) tuples
    lat1, lon1 = starting_value
    lat2, lon2 = ending_value
    filled_array = [starting_value]

    # 1. create a line between starting_value and ending_value
    # by solving the inverse geodesic problem
    line = geod.InverseLine(lat1, lon1, lat2, lon2)

    # 2. Determine the length of the steps needed to fill
    # the missing values between the two extremes;
    # s13 is the total arc length of the line, and n missing
    # values split it into n + 1 equal steps
    step_length = line.s13 / (number_of_missing_values + 1)

    # 3. Add mid values between the two extremes
    for i in range(1, number_of_missing_values + 1):
        distance = min(step_length * i, line.s13)
        g = line.Position(distance, Geodesic.STANDARD | Geodesic.LONG_UNROLL)
        filled_array.append((g['lat2'], g['lon2']))

    filled_array.append(ending_value)
    return filled_array

So something like [(LAT1, LON1), None, None, None, (LAT2, LON2)] would become [(LAT1, LON1), (LAT, LON), (LAT, LON), (LAT, LON), (LAT2, LON2)].
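One possible way to avoid the nested loops is to compute every missing point in a single vectorized pass. The sketch below is not the geographiclib approach above: it swaps in great-circle interpolation on a sphere (slerp on unit vectors), which only approximates the WGS84 geodesic, and it assumes constant speed along each arc. The function names `to_unit_vectors`, `great_circle_points` and `fill_gaps` are hypothetical.

```python
import numpy as np

def to_unit_vectors(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to 3D unit vectors."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def great_circle_points(lat1, lon1, lat2, lon2, frac):
    """Points at fraction frac (0..1) along each arc from point 1 to point 2.

    Spherical approximation, not the WGS84 ellipsoid. Antipodal endpoints
    are not handled.
    """
    a = to_unit_vectors(lat1, lon1)
    b = to_unit_vectors(lat2, lon2)
    f = np.asarray(frac, dtype=float)
    # Angle subtended by each arc.
    omega = np.arccos(np.clip(np.sum(a * b, axis=-1), -1.0, 1.0))
    sin_omega = np.sin(omega)
    # Coincident endpoints: fall back to plain linear weights.
    small = sin_omega < 1e-12
    safe = np.where(small, 1.0, sin_omega)
    wa = np.where(small, 1.0 - f, np.sin((1.0 - f) * omega) / safe)
    wb = np.where(small, f, np.sin(f * omega) / safe)
    p = wa[..., None] * a + wb[..., None] * b
    lat = np.degrees(np.arctan2(p[..., 2], np.hypot(p[..., 0], p[..., 1])))
    lon = np.degrees(np.arctan2(p[..., 1], p[..., 0]))
    return lat, lon

def fill_gaps(t, lat, lon, max_gap=60.0):
    """Insert points so that no time gap exceeds max_gap (t sorted, seconds)."""
    t, lat, lon = (np.asarray(x, dtype=float) for x in (t, lat, lon))
    gap = np.diff(t)
    # Points needed inside each segment.
    n_missing = np.maximum(np.ceil(gap / max_gap).astype(int) - 1, 0)
    # Segment index of every new point, e.g. [1, 1, 4, 4, 4].
    seg = np.repeat(np.arange(len(gap)), n_missing)
    # Position of every new point within its own segment: 1..n.
    starts = np.cumsum(n_missing) - n_missing
    k = np.arange(len(seg)) - np.repeat(starts, n_missing) + 1
    frac = k / (n_missing[seg] + 1)
    new_t = t[seg] + frac * gap[seg]
    new_lat, new_lon = great_circle_points(
        lat[seg], lon[seg], lat[seg + 1], lon[seg + 1], frac)
    # Merge original and new points back into time order.
    all_t = np.concatenate([t, new_t])
    order = np.argsort(all_t, kind="stable")
    return (all_t[order],
            np.concatenate([lat, new_lat])[order],
            np.concatenate([lon, new_lon])[order])

# Usage: a 180 s gap along the equator gets two evenly spaced points.
t2, lat2, lon2 = fill_gaps([0, 180], [0, 0], [0, 90])
print(t2, lat2, lon2)
```

The work per new point is a handful of array operations, so the cost grows with the number of inserted points rather than with a Python-level loop over gaps. If ellipsoidal accuracy matters, the same `seg`/`frac` bookkeeping could feed a per-segment call to the geographiclib function instead, at the price of a loop over segments.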

