NumPy array with mutable and immutable values


Asked 2025-04-17 05:43

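I'd like to know the fastest way to ignore certain values of an array during array operations (dot products, outer products, addition, and so on). I'm mainly interested in the case where a sizeable fraction of the values (perhaps 30% to 50%) is ignored. The ignored values can effectively be treated as zero, and the arrays are large, perhaps 100,000 to 1,000,000 elements. I can think of a few solutions, but none of them seem to actually exploit the potential benefit of ignoring some values. For example: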

import numpy as np

dim = 1000  # dim is not defined in the question; the answer below uses 1000
A = np.ones((dim, dim))                  # the array to modify
B = np.random.randint(0, 2, (dim, dim))  # 0/1 values; the values to ignore are 0
C = B.astype(bool)                       # True where the value is kept
D = np.random.random((dim, dim))         # the array which will be used to modify A

# Option 1: zero some values using multiplication.
# Some initial tests show this is the fastest.
A += B * D

# Option 2: use boolean indexing.
# This seems to be the slowest.
A[C] += D[C]

# Option 3: use a masked array; masked entries (where B == 0) are ignored.
A = np.ma.array(np.ones((dim, dim)), mask=(B == 0))
A += D
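Since the ignored entries behave like zeros, the three options above should agree on the result. A minimal sketch that checks this on a small random array; the size dim = 200, the seed, and the use of NumPy's Generator API (np.random.default_rng) are choices made for this illustration, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 200  # small size chosen only for this illustration
B = rng.integers(0, 2, (dim, dim))  # 0 marks an ignored position
C = B.astype(bool)                  # True where the value is kept
D = rng.random((dim, dim))

# Option 1: multiply by the 0/1 array, so ignored positions add 0
A1 = np.ones((dim, dim))
A1 += B * D

# Option 2: boolean indexing touches only the kept positions
A2 = np.ones((dim, dim))
A2[C] += D[C]

# Option 3: masked array; in-place addition leaves masked entries unchanged
A3 = np.ma.array(np.ones((dim, dim)), mask=~C)
A3 += D

print(np.allclose(A1, A2), np.allclose(A1, A3.data))
```

Each option starts from a fresh array of ones here, unlike the snippet above, which applies all three updates to the same A in turn.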

Edit 1:

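As cyborg suggested, sparse arrays might be another option. Unfortunately, I'm not familiar enough with the package to get the potential speed advantage. For example, suppose I have a weighted graph defined by a sparse matrix A whose connectivity is restricted, another sparse matrix B that defines the connectivity (1 = connected, 0 = not connected), and a dense NumPy matrix C. I would like to do something like A = A + B.multiply(C) and take advantage of the fact that A and B are sparse.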
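A minimal sketch of the operation described above with scipy.sparse, assuming CSR format for both sparse matrices and random test data (the size n = 1000, the 5% densities, and the choice of CSR are assumptions for illustration). `B.multiply(C)` multiplies only at B's stored entries; wrapping the result in `sparse.csr_matrix` keeps it sparse whether `multiply` hands back a sparse or (as in some older SciPy versions) a dense object. The second version builds the same product explicitly from B's nonzero positions.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n = 1000

# Weighted graph A: about 5% of entries nonzero (random test data)
A = sparse.csr_matrix(np.where(rng.random((n, n)) < 0.05, rng.random((n, n)), 0.0))
# Connectivity B: 1.0 = connected, 0 = not connected
B = sparse.csr_matrix((rng.random((n, n)) < 0.05).astype(float))
# Dense weight matrix C
C = rng.random((n, n))

# Element-wise product evaluated at B's stored entries, kept in CSR form
A_new = A + sparse.csr_matrix(B.multiply(C))

# Explicit equivalent: gather C only at B's nonzero positions
rows, cols = B.nonzero()
W = sparse.csr_matrix((C[rows, cols], (rows, cols)), shape=B.shape)
A_alt = A + W

print(type(A_new).__name__, A_new.nnz)
```

The explicit version relies on B containing only 1.0 at connected positions, so C[rows, cols] equals the product there.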

Tags: data processing, numpy, sparse matrix, array operations, dot product, addition, outer product, large-scale computation

1 Answer

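Sparse matrices generally give a performance benefit only when the data density is below about 10%. A sparse matrix can be faster here, but that depends on whether you count the time needed to build it: in the timings below, "sparse-" excludes the construction of the sparse matrix and "sparse+" includes it.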
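One way to see why density matters: CSR stores each nonzero as a value plus a column index, plus n + 1 row pointers. A minimal sketch comparing that storage with a dense float64 array; the 1000 x 1000 size and the 10%/50% densities are illustrative choices, and this measures memory only, not speed:

```python
import numpy as np
from scipy import sparse

def csr_bytes(S):
    # values + column indices + row pointers
    return S.data.nbytes + S.indices.nbytes + S.indptr.nbytes

rng = np.random.default_rng(0)
n = 1000
dense_bytes = n * n * np.dtype(np.float64).itemsize  # 8,000,000 bytes

ratios = {}
for density in (0.1, 0.5):
    mask = rng.random((n, n)) < density
    S = sparse.csr_matrix(np.where(mask, 1.0, 0.0))
    ratios[density] = csr_bytes(S) / dense_bytes
    print(f"density {density:.0%}: CSR uses {ratios[density]:.0%} of the dense size")
```

With 32-bit indices each stored entry costs 12 bytes instead of 8, so at 30% to 50% density the sparse format saves little storage, while every operation also pays for the index bookkeeping.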

import timeit

setup=\
'''
import numpy as np
dim=1000
A = np.ones((dim, dim)) # the array to modify
B = np.random.randint(0, 2, (dim, dim)) # 0/1 values; the values to ignore are 0
C = B.astype(bool) # True where the value is kept
D = np.random.random((dim, dim)) # the array which will be used to modify A
'''

print('mult    '+str(timeit.timeit('A += B * D', setup, number=3)))

print('index   '+str(timeit.timeit('A[C] += D[C]', setup, number=3)))

setup3 = setup+\
''' 
A = np.ma.array(np.ones((dim, dim)), mask = (B == 0))
'''
print('ma      ' + str(timeit.timeit('A += D', setup3, number=3)))

setup4 = setup+\
''' 
from scipy import sparse
S = sparse.csr_matrix(C)
DS = S.multiply(D)
'''
# sparse-: S and DS are built in setup4, so only the addition is timed
print('sparse- '+str(timeit.timeit('A += DS', setup4, number=3)))

setup5 = setup+\
''' 
from scipy import sparse
'''
# sparse+: building the sparse matrices is included in the timed statement
print('sparse+ '+str(timeit.timeit('S = sparse.csr_matrix(C); DS = S.multiply(D); A += DS', setup5, number=3)))

setup6 = setup+\
'''
from scipy import sparse
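class Sparsemat(sparse.coo_matrix):
    # += adds the stored values directly; assumes both operands have the
    # same sparsity pattern, stored in the same order
    def __iadd__(self, other):
        self.data += other.data
        return self
A = Sparsemat(sparse.rand(dim, dim, 0.5, 'coo')) # the array to modify
D = np.random.random((dim, dim)) # the array which will be used to modify A
anz = A.nonzero() # (row, col) indices of A's nonzero entries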
'''
stmt6=\
'''
DS = Sparsemat((D[anz[0],anz[1]], anz), shape=A.shape) # new graph based on random weights
A += DS
'''
print('sparse2 '+str(timeit.timeit(stmt6, setup6, number=3)))

Output:

mult    0.0248420299535
index   0.32025789431
ma      0.1067024434
sparse- 0.00996273276303
sparse+ 0.228869672266
sparse2 0.105496183846

Addendum: you can extend scipy.sparse.coo_matrix as in the code above (setup6). This keeps the result in sparse format.
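The Sparsemat.__iadd__ trick in setup6 works because both operands share the same sparsity pattern in the same order, so adding their data arrays adds matching entries. A minimal sketch of the same idea on a CSR matrix that skips building DS altogether; add_dense_on_pattern is a hypothetical helper name, and the sketch assumes A's stored entries are exactly the positions to update:

```python
import numpy as np
from scipy import sparse

def add_dense_on_pattern(A, D):
    """Hypothetical helper: A[i, j] += D[i, j] for every stored entry of CSR matrix A.

    Updates A.data in place; positions not stored in A are ignored, so the
    sparsity pattern (and format) of A is unchanged.
    """
    # Row index of each stored entry, expanded from the CSR row pointers
    rows = np.repeat(np.arange(A.shape[0]), np.diff(A.indptr))
    A.data += D[rows, A.indices]
    return A

rng = np.random.default_rng(0)
n = 1000
mask = rng.random((n, n)) < 0.5  # about 50% of positions are kept
A = sparse.csr_matrix(np.where(mask, rng.random((n, n)), 0.0))
D = rng.random((n, n))

A0 = A.toarray()  # dense copy before the update, for comparison
expected = A0 + np.where(A0 != 0, D, 0.0)
add_dense_on_pattern(A, D)
print(np.allclose(A.toarray(), expected))
```

Compared with setup6, this avoids constructing a new sparse matrix on every update; whether it is faster in practice has not been measured here.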

Answered 2025-04-17 by Python大师
