Numpy einsum_path reports more FLOPs and a "slowdown"

Posted 2024-06-10 20:10:37


On the topic of np.einsum, I have read a series of discussions on the following sites:

To understand further why np.einsum is faster than the usual np.sum, np.prod, etc. (even with the latest numpy version in Anaconda), I used np.einsum_path to see what the optimization process actually optimizes.

While doing this, I found an interesting phenomenon. Consider this minimal example:

import numpy as np
for i in 'int8 int16 int32 int64 uint8 uint16 uint32 uint64 float32 float64'.split():
    print(np.einsum_path('i->', np.empty(2**30, i))[1])

The output is identical for all of them:

  Complete contraction:  i->
         Naive scaling:  1
     Optimized scaling:  1
      Naive FLOP count:  1.074e+09
  Optimized FLOP count:  2.147e+09
   Theoretical speedup:  0.500
  Largest intermediate:  1.000e+00 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   1                         i->                                       ->

where the optimized FLOP count increases (does that mean more computation?) and the theoretical speedup is less than 1 (i.e. slower). But if we actually measure the time:

for i in 'int8 int16 int32 int64 uint8 uint16 uint32 uint64 float32 float64'.split():
    a    = np.empty(2**27, i)
    raw  = %timeit -qon9 a.sum()
    noOp = %timeit -qon9 np.einsum('i->', a, optimize=False)
    op   = %timeit -qon9 np.einsum('i->', a, optimize='greedy')
    print(i, raw.average/op.average, noOp.average/op.average, sep='\t')

If we look at the second ratio column, which corresponds to the "naive" (unoptimized) timing divided by the optimized timing, the values are all close to 1, meaning the optimization did not make anything slower:

int8    4.485133392283354   1.0205873691331475
int16   3.7817373109729213  0.9528030137222752
int32   1.3760725925789292  1.0741615462167338
int64   1.0793509548186524  1.0076602576129605
uint8   4.509893894635594   0.997277624256872
uint16  3.964949791428885   0.9914991211913878
uint32  1.3054813163356085  1.009475242303559
uint64  1.0747670688044795  1.0082522386805526
float32 2.4105510701565636  0.9998241152368149
float64 2.1957241421227556  0.9836838487664662
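For reference, outside IPython (where the %timeit magic is unavailable) the same comparison can be sketched with the standard timeit module; the array size and repeat count are reduced here, hypothetically, to keep the run short:

```python
import timeit
import numpy as np

for dt in ('int8', 'int64', 'float64'):
    a = np.empty(2**20, dtype=dt)
    # time the plain method, the unoptimized einsum, and the optimized einsum
    raw  = timeit.timeit(lambda: a.sum(), number=9)
    noop = timeit.timeit(lambda: np.einsum('i->', a, optimize=False), number=9)
    op   = timeit.timeit(lambda: np.einsum('i->', a, optimize='greedy'), number=9)
    print(dt, raw / op, noop / op, sep='\t')
```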

I want to know why np.einsum_path says it needs more FLOPs and will be slower. I would have believed the theoretical speedup is calculated directly from the FLOP counts, so the two benchmarks should basically be measuring the same thing.

By the way, I attach an example showing how np.einsum_path "normally" behaves, which makes the result above look all the more anomalous:

a = np.empty((64, 64))
print(np.einsum_path('ij,jk,kl->il', a, a, a)[1])
noOp = %timeit -qon99 np.einsum('ij,jk,kl->il', a, a, a, optimize=False)
op   = %timeit -qon99 np.einsum('ij,jk,kl->il', a, a, a, optimize='greedy')
print('Actual speedup:', noOp.average / op.average)

Output:

  Complete contraction:  ij,jk,kl->il
         Naive scaling:  4
     Optimized scaling:  3
      Naive FLOP count:  5.033e+07
  Optimized FLOP count:  1.049e+06
   Theoretical speedup:  48.000
  Largest intermediate:  4.096e+03 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   3                   jk,ij->ik                                kl,ik->il
   3                   ik,kl->il                                   il->il
Actual speedup: 90.33518444642904
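The optimized path in this case is just two pairwise matrix products; a quick sanity check (using a random matrix instead of np.empty so the values are comparable):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((64, 64))

# greedy contraction order: (a @ a) first, then the result @ a
opt = np.einsum('ij,jk,kl->il', a, a, a, optimize='greedy')
manual = (a @ a) @ a
print(np.allclose(opt, manual))  # True
```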

1 Answer

Posted on 2024-06-10 20:10:37

I just dug into the source code of np.einsum_path. According to the comment there (i.e. here):

# Compute naive cost
# This isn't quite right, need to look into exactly how einsum does this

and the way the optimized cost is computed (i.e. here; not pasting it, it's too long), it seems the optimized cost is computed consistently, while the naive cost is, as the comment admits, "not quite right".

Then I printed the "naive" (i.e. unoptimized) einsum_path:

import numpy as np
print(np.einsum_path('i->', np.empty(2**30, 'b'), optimize=False)[1])

Surprisingly, it is not "naive" at all:

  Complete contraction:  i->
         Naive scaling:  1
     Optimized scaling:  1
      Naive FLOP count:  1.074e+09
  Optimized FLOP count:  2.147e+09
   Theoretical speedup:  0.500
  Largest intermediate:  1.000e+00 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   1                         i->                                       ->

So the reported slowdown is simply due to a wrong FLOP count. np.einsum does not actually perform any path optimization here (and yet, for whatever reason, it is still faster than the native np.sum).
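This can also be confirmed by inspecting the path itself rather than the report: for a single-operand contraction the returned path is trivial, so there is nothing to optimize (a minimal sketch):

```python
import numpy as np

path, report = np.einsum_path('i->', np.empty(2**10), optimize='greedy')
# a single trivial step over operand 0; no reordering is possible
print(path)  # ['einsum_path', (0,)]
```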
