numpy:高效地添加矩阵的行

2024-04-19 05:36:45 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个矩阵。在

mat = array([
   [ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11]
   ])

我想得到某些索引下的行的总和:例如

^{pr2}$

我知道我可以计算出答案:

^{3}$

问题是ixs可能很长,我不想使用所有内存来创建中间产品mat[ixs],只想用和来减少它。在

我也知道我可以简单地计算指数,然后用乘法来代替。在

np.bincount(ixs, minlength=mat.shape[0).dot(mat)
> array([16, 23, 30, 37])

但是如果我的IX稀疏的话那就太贵了。在

我知道scipy的稀疏矩阵,我想我可以使用它们,但我更喜欢纯numpy解决方案,因为稀疏矩阵在很多方面都受到限制(例如仅为二维)

那么,在这种情况下,有没有一种纯粹的方法来合并索引和和归约呢?在

结论:

感谢Divakar和hpaulj提供的非常全面的答复。我所说的“稀疏”是指range(w.shape[0])中的大多数值不在ixs中。使用这个新定义(以及更真实的数据大小,我重新运行了Divakar测试,添加了一些新功能:

rng = np.random.RandomState(1234)
mat = rng.randn(1000, 500)
ixs = rng.choice(rng.randint(mat.shape[0], size=mat.shape[0]/10), size=1000)

# Divakar's solutions
In[42]: %timeit org_indexing_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[43]: %timeit org_bincount_app(mat, ixs)
The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 177 µs per loop
In[44]: %timeit indexing_modified_app(mat, ixs)
1000 loops, best of 3: 1.81 ms per loop
In[45]: %timeit bincount_modified_app(mat, ixs)
1000 loops, best of 3: 258 µs per loop
In[46]: %timeit simply_indexing_app(mat, ixs)
1000 loops, best of 3: 1.86 ms per loop
In[47]: %timeit take_app(mat, ixs)
1000 loops, best of 3: 1.82 ms per loop
In[48]: %timeit unq_mask_einsum_app(mat, ixs)
10 loops, best of 3: 58.2 ms per loop 
# hpaulj's solutions
In[53]: %timeit hpauljs_sparse_solution(mat, ixs)
The slowest run took 9.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 524 µs per loop
%timeit hpauljs_second_sparse_solution(mat, ixs)
100 loops, best of 3: 9.91 ms per loop
# Sparse version of original bincount solution (see below):
In[60]: %timeit sparse_bincount(mat, ixs)
10000 loops, best of 3: 71.7 µs per loop

在本例中,赢家是bincount解决方案的稀疏版本。在

def sparse_bincount(mat, ixs):
    x = np.bincount(ixs)
    nonzeros, = np.nonzero(x)
    x[nonzeros].dot(mat[nonzeros])

Tags: ofinloopappnpmsbestshape
3条回答

经过大量的数字运算(见原始问题的结论),当输入定义如下时,表现最好的答案:

rng = np.random.RandomState(1234)
mat = rng.randn(1000, 500)
ixs = rng.choice(rng.randint(mat.shape[0], size=mat.shape[0]/10), size=1000)

似乎是:

^{pr2}$

因为我们假设ixs可能是sparsey,我们可以修改策略,根据给定的行索引分别从zero-th行和其余行中获得行的总和。因此,我们可以使用bincount方法进行non-zero-th索引行的求和,并在ixs中添加{})。在

因此,可以修改第二种方法,如-

nzmask = ixs!=0
nzsum = np.bincount(ixs[nzmask]-1, minlength=mat.shape[0]-1).dot(mat[1:])
row0_sum = mat[0]*(len(ixs) - np.count_nonzero(nzmask))
out = nzsum + row0_sum

我们也可以把这个策略扩展到第一种方法,比如-

^{pr2}$

如果我们使用大量重复的非零索引,那么我们可以选择使用np.take来关注性能。因此,mat[ixs[nzidx]]可以被np.take(mat,ixs[nzidx],axis=0)替换,类似地,mat[ixs]可以被{}代替。在这种重复索引的情况下,np.take与简单的索引相比有一些显著的加速。在

最后,我们可以使用^{}来执行这些基于行ID的选择和求和,如下-

nzmask = ixs!=0
unq,tags = np.unique(ixs[nzmask],return_inverse=1)
nzsum = np.einsum('ji,jk->k',np.arange(len(unq))[:,None] == tags,mat[unq])
out = mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum

标杆管理

让我们列出到目前为止在这篇文章中发布的五种方法,并将问题中发布的两种方法作为函数进行一些运行时测试-

def org_indexing_app(mat,ixs):
    return mat[ixs].sum(axis=0)

def org_bincount_app(mat,ixs):
    return np.bincount(ixs, minlength=mat.shape[0]).dot(mat)

def indexing_modified_app(mat,ixs):
    return np.take(mat,ixs,axis=0).sum(axis=0)

def bincount_modified_app(mat,ixs):
    nzmask = ixs!=0
    nzsum = np.bincount(ixs[nzmask]-1, minlength=mat.shape[0]-1).dot(mat[1:])
    row0_sum = mat[0]*(len(ixs) - np.count_nonzero(nzmask))
    return nzsum + row0_sum

def simply_indexing_app(mat,ixs):
    nzmask = ixs!=0
    nzsum = mat[ixs[nzmask]].sum(axis=0)
    return mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum

def take_app(mat,ixs):
    nzmask = ixs!=0
    nzsum = np.take(mat,ixs[nzmask],axis=0).sum(axis=0)
    return mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum

def unq_mask_einsum_app(mat,ixs):
    nzmask = ixs!=0
    unq,tags = np.unique(ixs[nzmask],return_inverse=1)
    nzsum = np.einsum('ji,jk->k',np.arange(len(unq))[:,None] == tags,mat[unq])
    return mat[0]*(len(ixs) - np.count_nonzero(nzmask)) + nzsum

计时

案例1(ixs为95%sparsey):

In [301]: # Setup input
     ...: mat = np.random.rand(20,4)
     ...: ixs = np.random.randint(0,10,(100000))
     ...: ixs[np.random.rand(ixs.size)<0.95] = 0 # Make it approx 95% sparsey
     ...: 

In [302]: # Timings
     ...: %timeit org_indexing_app(mat,ixs)
     ...: %timeit org_bincount_app(mat,ixs)
     ...: %timeit indexing_modified_app(mat,ixs)
     ...: %timeit bincount_modified_app(mat,ixs)
     ...: %timeit simply_indexing_app(mat,ixs)
     ...: %timeit take_app(mat,ixs)
     ...: %timeit unq_mask_einsum_app(mat,ixs)
     ...: 
100 loops, best of 3: 4.89 ms per loop
1000 loops, best of 3: 428 µs per loop
100 loops, best of 3: 3.29 ms per loop
1000 loops, best of 3: 329 µs per loop
1000 loops, best of 3: 537 µs per loop
1000 loops, best of 3: 462 µs per loop
1000 loops, best of 3: 1.07 ms per loop

案例2(ixs为98%sparsey):

In [303]: # Setup input
     ...: mat = np.random.rand(20,4)
     ...: ixs = np.random.randint(0,10,(100000))
     ...: ixs[np.random.rand(ixs.size)<0.98] = 0 # Make it approx 98% sparsey
     ...: 

In [304]: # Timings
     ...: %timeit org_indexing_app(mat,ixs)
     ...: %timeit org_bincount_app(mat,ixs)
     ...: %timeit indexing_modified_app(mat,ixs)
     ...: %timeit bincount_modified_app(mat,ixs)
     ...: %timeit simply_indexing_app(mat,ixs)
     ...: %timeit take_app(mat,ixs)
     ...: %timeit unq_mask_einsum_app(mat,ixs)
     ...: 
100 loops, best of 3: 4.86 ms per loop
1000 loops, best of 3: 438 µs per loop
100 loops, best of 3: 3.5 ms per loop
1000 loops, best of 3: 260 µs per loop
1000 loops, best of 3: 318 µs per loop
1000 loops, best of 3: 288 µs per loop
1000 loops, best of 3: 694 µs per loop

bincount的另一种选择是add.at

In [193]: mat
Out[193]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [194]: ixs
Out[194]: array([0, 2, 0, 0, 0, 1, 1])

In [195]: J = np.zeros(mat.shape[0],int)
In [196]: np.add.at(J, ixs, 1)
In [197]: J
Out[197]: array([4, 2, 1])

In [198]: np.dot(J, mat)
Out[198]: array([16, 23, 30, 37])

稀疏性是指,我假设,ixs可能不包括所有行,例如,ixs没有0:

^{pr2}$

J仍然具有mat.shape[0]形状。但是add.at应该按ixs的长度缩放。在

稀疏解决方案看起来像:

ixs生成一个稀疏矩阵,如下所示:

In [204]: I
Out[204]: 
array([[1, 0, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 1, 1],
       [0, 1, 0, 0, 0, 0, 0]])

对行求和;sparse使用矩阵乘法进行运算,如:

In [205]: np.dot(I, np.ones((7,),int))
Out[205]: array([4, 2, 1])

那就做我们的点:

In [206]: np.dot(np.dot(I, np.ones((7,),int)), mat)
Out[206]: array([16, 23, 30, 37])

或者在稀疏代码中:

In [225]: J = sparse.coo_matrix((np.ones_like(ixs,int),(np.arange(ixs.shape[0]), ixs)))
In [226]: J.A
Out[226]: 
array([[1, 0, 0],
       [0, 0, 1],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 1, 0]])
In [227]: J.sum(axis=0)*mat
Out[227]: matrix([[16, 23, 30, 37]])

sparse,当从coo转换为csr时,求和重复。我可以利用它

In [229]: J = sparse.coo_matrix((np.ones_like(ixs,int), (np.zeros_like(ixs,int), ixs)))
In [230]: J
Out[230]: 
<1x3 sparse matrix of type '<class 'numpy.int32'>'
    with 7 stored elements in COOrdinate format>
In [231]: J.A
Out[231]: array([[4, 2, 1]])
In [232]: J*mat
Out[232]: array([[16, 23, 30, 37]], dtype=int32)

相关问题 更多 >