使用多个核心进行Numpy np.einsum 数组乘法

6 投票

1 回答

2533 浏览

提问于 2025-04-18 06:30

我编译了numpy 1.6.2和scipy，并且使用了MKL，希望能提高性能。目前我有一段代码非常依赖于np.einsum()，但我听说einsum在MKL下表现不好，因为几乎没有向量化的支持。=（所以我在考虑用np.dot()和切片来重写我的部分代码，以便能够利用多核加速。我很喜欢np.einsum()的简单性，代码也比较易读。

比如说，我有一个多维矩阵相乘的形式：

np.einsum('mi,mnijqk->njqk',A,B)

那么我该如何将这样的操作，或者其他三维、四维和五维数组的乘法转换成np.dot()的高效MKL操作呢？

我会补充更多信息：我正在计算这个方程：

enter image description here

为此，我使用的代码是：

np.einsum('mn,mni,nij,nik,mi->njk',a,np.exp(b[:,:,np.newaxis]*U[np.newaxis,:,:]),P,P,X)

这个速度并不是很快，用cython写的同样代码快了5倍：

    #STACKOVERFLOW QUESTION:
from __future__ import division
import numpy as np
cimport numpy as np
cimport cython

cdef extern from "math.h":
    double exp(double x)


DTYPE = np.float

ctypedef np.float_t DTYPE_t
@cython.boundscheck(False) # turn of bounds-checking for entire function
def cython_DX_h(np.ndarray[DTYPE_t, ndim=3] P, np.ndarray[DTYPE_t, ndim=1] a, np.ndarray[DTYPE_t, ndim=1] b, np.ndarray[DTYPE_t, ndim=2] U,  np.ndarray[DTYPE_t, ndim=2] X, int I, int M):
    assert P.dtype == DTYPE and a.dtype == DTYPE and b.dtype == DTYPE and U.dtype == DTYPE and X.dtype == DTYPE

cdef np.ndarray[DTYPE_t,ndim=3] DX_h=np.zeros((N,I,I),dtype=DTYPE)
cdef unsigned int j,n,k,m,i
for n in range(N):
    for j in range(I):
        for k in range(I):
            aux=0
            for m in range(N):
                for i in range(I):
                    aux+=a[m,n]*exp(b[m,n]*U[n,i])*P[n,i,j]*P[n,i,k]*X[m,i]
            DX_h[n,j,k]=aux
return DX_h

有没有办法在纯Python中实现与cython相同的性能？（我还没能弄明白如何用tensordot来处理这个方程）在这个cython代码中也没法使用prange，遇到了很多gil和nogil的错误。

性能优化 numpy cython einsum tensordot 数组乘法 mkl 多核加速

1 个回答

另外，你可以使用 numpy.tensordot()：

np.tensordot(A, B, axes=[[0, 1], [0, 2]])

这个方法也会利用多个处理器核心，和 numpy.dot() 一样。

回答于 2025-04-18 由 Python大师

分享举报

使用多个核心进行Numpy np.einsum 数组乘法

1 个回答

撰写回答