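I want to compute a rolling quantile of a large 2D matrix with dimensions (1e6, 1e5), column-wise. I am looking for the fastest way to do it, since I need to perform this operation thousands of times and it is computationally very expensive. For the experiments, window=1000 and q=0.1 are used.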
import numpy as np
import pandas as pd
import multiprocessing as mp
from functools import partial
import numba as nb
X = np.random.random((10000,1000)) # Original array has dimensions of about (1e6, 1e5)
My current approaches:
Pandas: %timeit: 5.8 s ± 15.5 ms per loop
def pd_rolling_quantile(X, window, q):
    # Pass q positionally: the keyword name differs between pandas versions.
    return pd.DataFrame(X).rolling(window).quantile(q)
NumPy with strides: %timeit: 2min 42s ± 3.29 s per loop
def strided_app(a, L, S):
    # View of a 1-D array as overlapping windows of length L with step S.
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

def np_1d(x, window, q):
    # Percentile over every window, zero-padded at the front to keep the length.
    return np.pad(np.percentile(strided_app(x, window, 1), q * 100, axis=-1), (window - 1, 0), mode='constant')

def np_rolling_quantile(X, window, q):
    results = []
    for i in np.arange(X.shape[1]):
        results.append(np_1d(X[:, i], window, q))
    return np.column_stack(results)
Multiprocessing: %timeit: 1.13 s ± 27.6 ms per loop
def mp_rolling_quantile(X, window, q):
    pool = mp.Pool(processes=12)
    # One task per column, each handled by the pandas version above.
    results = pool.map(partial(pd_rolling_quantile, window=window, q=q), [X[:, i] for i in np.arange(X.shape[1])])
    pool.close()
    pool.join()
    return np.column_stack(results)
Numba: %timeit: 2min 28s ± 182 ms per loop
@nb.njit
def nb_1d(x, window, q):
    out = np.zeros(x.shape[0])
    # i is the exclusive end index of each window.
    for i in np.arange(x.shape[0] - window + 1) + window:
        out[i - 1] = np.quantile(x[i - window:i], q=q)
    return out

def nb_rolling_quantile(X, window, q):
    results = []
    for i in np.arange(X.shape[1]):
        results.append(nb_1d(X[:, i], window, q))
    return np.column_stack(results)
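The timings are not great; ideally I am aiming for a 10-50x speedup. I would appreciate any suggestions on how to speed this up. Maybe someone has ideas for using a lower-level language (Cython), or for other NumPy/Numba/TensorFlow-based approaches. Thanks!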
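I would recommend the new rolling-quantiles package. To demonstrate, even the somewhat naive approach of constructing a separate filter for each column outperforms the single-threaded pandas version above.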
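To make the per-column filter idea concrete, here is a minimal pure-Python sketch. It is not the rolling-quantiles implementation or its API. It assumes a window kept sorted with the standard-library bisect module, NaN-free input, and linear interpolation between ranks. The names rolling_quantile_1d and rolling_quantile_columns are made up for illustration.

```python
import math
from bisect import bisect_left, insort


def rolling_quantile_1d(values, window, q):
    """Rolling quantile of one column with linear interpolation.

    The first window - 1 outputs are NaN. Assumes the input has no NaN.
    """
    values = list(values)
    out = [math.nan] * len(values)
    sorted_win = []                  # current window, kept in sorted order
    pos = q * (window - 1)           # fractional rank inside a full window
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    for i, v in enumerate(values):
        insort(sorted_win, v)        # add the newest sample
        if i >= window:
            # Drop the sample that just left the window.
            del sorted_win[bisect_left(sorted_win, values[i - window])]
        if i >= window - 1:
            out[i] = sorted_win[lo] + frac * (sorted_win[hi] - sorted_win[lo])
    return out


def rolling_quantile_columns(rows, window, q):
    """Apply the 1-D filter to every column of a list-of-rows matrix."""
    filtered = [rolling_quantile_1d(col, window, q) for col in zip(*rows)]
    return [list(row) for row in zip(*filtered)]
```

Each step updates the sorted window in place, which costs O(window) for the list insertion and deletion, rather than re-sorting the whole window at every position as the strided NumPy and Numba versions effectively do.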
As you have shown, both approaches can be trivially parallelized with multiprocessing.
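A rough sketch of that parallelization, using only the standard library. The worker column_quantile is a brute-force stand-in (hypothetical, chosen for brevity); in practice it would be swapped for whichever per-column routine is fastest. The chunksize argument of Pool.map batches several columns per task, which may reduce inter-process overhead compared with sending one column at a time.

```python
import math
import os
from functools import partial
from multiprocessing import Pool


def column_quantile(column, window, q):
    """Brute-force rolling quantile of one column (stand-in worker)."""
    out = [math.nan] * len(column)
    pos = q * (window - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    for end in range(window, len(column) + 1):
        s = sorted(column[end - window:end])
        out[end - 1] = s[lo] + (pos - lo) * (s[hi] - s[lo])
    return out


def parallel_rolling_quantile(columns, window, q, processes=None):
    """Map the worker over a list of columns.

    processes=1 runs a plain serial loop, which is handy for debugging.
    """
    worker = partial(column_quantile, window=window, q=q)
    if processes == 1:
        return [worker(col) for col in columns]
    n_proc = processes or os.cpu_count() or 1
    # Larger chunks mean fewer, bigger messages between processes.
    chunksize = max(1, len(columns) // (4 * n_proc))
    with Pool(processes=n_proc) as pool:
        return pool.map(worker, columns, chunksize=chunksize)
```

On platforms that start workers by spawning a fresh interpreter (Windows, macOS), call parallel_rolling_quantile from under an `if __name__ == "__main__":` guard.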