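I'm working with time-series data (satellite imagery) and want to fill missing values (e.g. clouds) with the nearest non-missing value. I found the very helpful post "Most efficient way to forward-fill NaN values in numpy array", which answered many of my questions. Forward and backward filling of the array works very well and is very fast. Now I would like to combine both approaches into a single one in which only the "nearest" value is selected.
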
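import numpy as np

def np_ffill(arr, axis):
    # Index that reshapes np.arange so it broadcasts along `axis`
    idx_shape = tuple([slice(None)] + [np.newaxis] * (len(arr.shape) - axis - 1))
    # Position of every valid sample along `axis`; NaN positions get 0
    fwd_idx = np.where(~np.isnan(arr), np.arange(arr.shape[axis])[idx_shape], 0)
    # Running maximum = index of the last valid sample seen so far
    fwd_idx = np.maximum.accumulate(fwd_idx, axis=axis)
    # Open-mesh index arrays for every dimension; fwd_idx replaces the one for `axis`
    slc = [np.arange(k)[tuple([slice(None) if dim == i else np.newaxis
                               for dim in range(len(arr.shape))])]
           for i, k in enumerate(arr.shape)]
    slc[axis] = fwd_idx
    return arr[tuple(slc)]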
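
def np_bfill(arr, axis):
    idx_shape = tuple([slice(None)] + [np.newaxis] * (len(arr.shape) - axis - 1))
    # Position of every valid sample along `axis`; NaN positions get the last index
    bwd_idx = np.where(~np.isnan(arr), np.arange(arr.shape[axis])[idx_shape], arr.shape[axis] - 1)
    # Reverse along `axis`, take the running minimum, reverse back.
    # np.flip replaces the hard-coded [:, :, ::-1], which only worked for axis=2 of a 3-D array.
    bwd_idx = np.flip(np.minimum.accumulate(np.flip(bwd_idx, axis=axis), axis=axis), axis=axis)
    slc = [np.arange(k)[tuple([slice(None) if dim == i else np.newaxis
                               for dim in range(len(arr.shape))])]
           for i, k in enumerate(arr.shape)]
    slc[axis] = bwd_idx
    return arr[tuple(slc)]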

def random_array(shape):
    choices = [1, 2, 3, 4, np.nan]
    out = np.random.choice(choices, size=shape)
    return out

ra = random_array((10, 10, 5))  # for testing, I assume 5 images with the size of 10x10 pixels
ffill = np_ffill(ra, 2)  # the filling should only be applied on the last axis (2)
bfill = np_bfill(ra, 2)
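
So far, my only idea is to compare the indices fwd_idx and bwd_idx to determine which one is closer to the position being filled. However, that would mean writing a for loop again. Isn't there a vectorized NumPy way to do this as well? Thanks a lot for your help.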
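
The index comparison described in the question can probably be done without a loop, because the distances to the previous and to the next valid sample are themselves just array arithmetic on the accumulated indices. The following is only a sketch under a few assumptions: the function name `np_nearest_fill` is made up for illustration, ties (equal distance on both sides) are resolved in favour of the earlier value, and series that contain no valid value at all are left as NaN. It uses `np.moveaxis` and `np.take_along_axis` instead of the open-mesh indexing from the linked post, which is a substitution on my part rather than anything from the original answer.

```python
import numpy as np

def np_nearest_fill(arr, axis=-1):
    """Fill NaNs along `axis` with the nearest non-NaN value (hypothetical helper).

    Assumptions: ties go to the earlier (forward-filled) value, and a series
    without any valid sample stays NaN.
    """
    # Work on the last axis to keep the indexing simple
    arr = np.moveaxis(np.asarray(arr, dtype=float), axis, -1)
    n = arr.shape[-1]
    pos = np.arange(n)
    valid = ~np.isnan(arr)

    # Index of the last valid sample at or before each position (-1 if none)
    fwd_idx = np.maximum.accumulate(np.where(valid, pos, -1), axis=-1)
    # Index of the next valid sample at or after each position (n if none)
    bwd_idx = np.minimum.accumulate(np.where(valid, pos, n)[..., ::-1], axis=-1)[..., ::-1]

    # Distance to each candidate; a missing candidate counts as infinitely far away
    fwd_dist = np.where(fwd_idx >= 0, pos - fwd_idx, np.inf)
    bwd_dist = np.where(bwd_idx < n, bwd_idx - pos, np.inf)

    # Pick the closer index; clip so that all-NaN series still index safely
    nearest = np.clip(np.where(fwd_dist <= bwd_dist, fwd_idx, bwd_idx), 0, n - 1)

    out = np.take_along_axis(arr, nearest, axis=-1)
    return np.moveaxis(out, -1, axis)

# Small 1-D check: the middle NaN is equally far from 1 and 5 and takes the earlier value
demo = np.array([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0, np.nan])
print(np_nearest_fill(demo))  # [1. 1. 1. 1. 5. 5. 5.]
```

For the array from the question this would be called as `np_nearest_fill(ra, axis=2)`. The two index arrays have the same shape as the input, so memory use is a few times that of the image stack, which may matter for large scenes.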