I have a dataset with 150K GPS coordinate entries, as shown below:
log_time latitude longitude
0 1.555840e+09 45.429597 11.974981
1 1.555869e+09 45.429597 11.974981
3 1.555869e+09 45.429596 11.974984
4 1.555869e+09 45.429490 11.975089
5 1.555869e+09 45.429092 11.975478
count 147538
mean 0 days 00:02:27.234798
std 0 days 02:34:54.243149
min 0 days 00:00:00
25% 0 days 00:00:03
50% 0 days 00:00:05
75% 0 days 00:00:08
max 39 days 12:25:39.551000
Name: log_time, dtype: object
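I expect this dataframe to grow to a few million records in the near future, so scalability is a priority.
I want to interpolate the movement so that, for the larger gaps, there is at least one GPS record every 60 seconds.
The standard approach would be: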
dff = dff.set_index(dff.pop('log_time'))
dff = dff.reindex(np.arange(dff.index.min(), dff.index.max()+1))
which yields:
latitude longitude
log_time
1.555840e+09 45.429597 11.974981
1.555840e+09 NaN NaN
1.555840e+09 NaN NaN
1.555840e+09 NaN NaN
1.555840e+09 NaN NaN
and then interpolating with something like dff.interpolate().reset_index().
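For reference, the linear baseline described above can be reproduced on a tiny frame. This is only a sketch: the two sample rows are made up, and the timestamps are assumed to be whole seconds so that a one-second grid makes sense.

```python
import numpy as np
import pandas as pd

# Hypothetical two-record sample with a 4-second gap (values are invented).
dff = pd.DataFrame({
    "log_time": [0, 4],
    "latitude": [45.0, 45.4],
    "longitude": [11.0, 11.8],
})

# Move log_time into the index and build a dense one-second grid.
dff = dff.set_index(dff.pop("log_time"))
dff = dff.reindex(np.arange(dff.index.min(), dff.index.max() + 1))

# Default pandas interpolation is linear in latitude and longitude separately.
filled = dff.interpolate().reset_index()
```

Note that the dense grid has one row per second over the whole time span, so with a 39-day maximum gap it grows far beyond the number of real records.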
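However, I have one big problem: none of the interpolation functions provided by scipy (and therefore by pandas) are suitable for GPS distances, which are arcs rather than straight lines. And from what I've seen, extending the interpolation functions is not easy.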
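To illustrate the difference between a straight line in latitude/longitude and an arc, here is a minimal NumPy sketch of great-circle interpolation. It assumes a spherical Earth, which is a simplification: geographiclib works on the WGS84 ellipsoid, so its results will differ slightly. The function names `to_unit_vectors` and `great_circle_interpolate` are hypothetical.

```python
import numpy as np

def to_unit_vectors(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to 3D unit vectors."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def great_circle_interpolate(lat1, lon1, lat2, lon2, frac):
    """Point(s) at fraction `frac` of the great-circle arc from point 1 to point 2.

    Spherical model only; antipodal end points are not handled.
    """
    a = to_unit_vectors(lat1, lon1)
    b = to_unit_vectors(lat2, lon2)
    frac = np.asarray(frac, dtype=float)[..., None]
    # Angle between the two end points as seen from the centre of the sphere.
    omega = np.arccos(np.clip(np.sum(a * b, axis=-1), -1.0, 1.0))[..., None]
    # For (nearly) coincident points fall back to linear weights.
    small = omega < 1e-12
    safe = np.where(small, 1.0, np.sin(omega))
    w1 = np.where(small, 1.0 - frac, np.sin((1.0 - frac) * omega) / safe)
    w2 = np.where(small, frac, np.sin(frac * omega) / safe)
    p = w1 * a + w2 * b
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    lat = np.degrees(np.arcsin(np.clip(p[..., 2], -1.0, 1.0)))
    lon = np.degrees(np.arctan2(p[..., 1], p[..., 0]))
    return lat, lon

# Midpoint between (45N, 0E) and (45N, 90E): a linear average gives latitude 45,
# while the great-circle midpoint lies further north (about 54.7 degrees).
mid_lat, mid_lon = great_circle_interpolate(45.0, 0.0, 45.0, 90.0, 0.5)
```

Because every operation is an array operation, the same function accepts whole columns of start and end points together with an array of fractions.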
I already have the distance function I am going to use, but I am finding it hard to apply it without resorting to nested for loops.
from geographiclib.geodesic import Geodesic

geod = Geodesic.WGS84

def custom_interpolation(starting_value, ending_value, number_of_missing_values):
    # Points are (lat, lon) pairs in degrees
    lat1, lon1 = starting_value
    lat2, lon2 = ending_value
    filled_array = [starting_value]
    # 1. create a line between starting_value and ending_value
    #    by solving the inverse geodesic problem
    line = geod.InverseLine(lat1, lon1, lat2, lon2)
    # 2. Determine the length of the steps needed to fill
    #    the missing values between the two extremes;
    #    s13 is the total arc length of the line, and
    #    N missing values split it into N + 1 steps
    step_length = line.s13 / (number_of_missing_values + 1)
    # 3. Add mid values between the two extremes
    for i in range(1, number_of_missing_values + 1):
        distance = min(step_length * i, line.s13)
        g = line.Position(distance, Geodesic.STANDARD | Geodesic.LONG_UNROLL)
        filled_array.append((g['lat2'], g['lon2']))
    filled_array.append(ending_value)
    return filled_array
So something like [(LAT1, LON1), None, None, None, (LAT2, LON2)] would become [(LAT1, LON1), (LAT, LON), (LAT, LON), (LAT, LON), (LAT2, LON2)].
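One possible way to avoid the nested loops is to first work out, in a vectorized way, which records need to be inserted and how far along each gap they sit. The sketch below does only that bookkeeping. The helper name `expand_gaps`, its output columns and the sample frame are assumptions made for illustration, and the 60-second threshold comes from the requirement stated earlier.

```python
import numpy as np
import pandas as pd

def expand_gaps(dff, max_gap=60.0):
    """Return one row per record to insert (hypothetical helper).

    Columns: `start` (position of the record that opens the gap),
    `frac` (fraction of the way to the next record, strictly between 0 and 1)
    and `log_time` (linearly interpolated timestamp).
    """
    t = dff["log_time"].to_numpy(dtype=float)
    gap = np.diff(t)  # seconds between consecutive records
    # Extra records needed so that no sub-interval exceeds max_gap.
    n_missing = np.maximum(np.ceil(gap / max_gap) - 1, 0).astype(int)

    # Opening record of every new row, repeated once per missing value.
    start = np.repeat(np.arange(len(gap)), n_missing)
    # Running counter 1..n inside each gap, built without a Python loop.
    first = np.cumsum(n_missing) - n_missing
    k = np.arange(n_missing.sum()) - np.repeat(first, n_missing) + 1
    frac = k / (n_missing[start] + 1)

    return pd.DataFrame({"start": start,
                         "frac": frac,
                         "log_time": t[start] + frac * gap[start]})

# Made-up sample: only the 195-second gap between rows 1 and 2 needs filling.
dff = pd.DataFrame({
    "log_time": [0.0, 5.0, 200.0, 203.0],
    "latitude": [45.4296, 45.4296, 45.4291, 45.4290],
    "longitude": [11.9750, 11.9750, 11.9755, 11.9756],
})
new_rows = expand_gaps(dff)
```

Each row of `new_rows` then only needs a position at `frac` times the arc length between records `start` and `start + 1`. That can be computed with a single loop over the gaps using `custom_interpolation`, or with any array-based geodesic routine, and the result concatenated with the original frame and sorted by `log_time`.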