Why is "vectorize" outperformed by "frompyfunc"?

import numpy as np

def do_double(x):
    return 2.0*x

vectorize = np.vectorize(do_double)
frompyfunc = np.frompyfunc(do_double, 1, 1)

def wrapped_frompyfunc(arr):
    return frompyfunc(arr).astype(np.float64)

import numpy as np
import perfplot

perfplot.show(
    setup=lambda n: np.linspace(0, 1, n),
    n_range=[2**k for k in range(20,27)],
    kernels=[
        frompyfunc,
        vectorize,
        wrapped_frompyfunc,
    ],
    labels=["frompyfunc", "vectorize", "wrapped_frompyfunc"],
    logx=True,
    logy=False,
    xlabel='len(x)',
    equality_check = None,
)

1 Answer

User

#1 · Posted 2024-04-26 03:49:09

Following @hpaulj's hint, we can profile the vectorize function:

arr=np.linspace(0,1,10**7)
%load_ext line_profiler

%lprun -f np.vectorize._vectorize_call \
       -f np.vectorize._get_ufunc_and_otypes  \
       -f np.vectorize.__call__  \
       vectorize(arr)

This shows that 100% of the time is spent in _vectorize_call:

Timer unit: 1e-06 s

Total time: 3.53012 s
File: python3.7/site-packages/numpy/lib/function_base.py
Function: __call__ at line 2063

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2063                                               def __call__(self, *args, **kwargs):
  ...                                         
  2091         1    3530112.0 3530112.0    100.0          return self._vectorize_call(func=func, args=vargs)

...

Total time: 3.38001 s
File: python3.7/site-packages/numpy/lib/function_base.py
Function: _vectorize_call at line 2154

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2154                                               def _vectorize_call(self, func, args):
  ...
  2161         1         85.0     85.0      0.0              ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  2162                                           
  2163                                                       # Convert args to object arrays first
  2164         1          1.0      1.0      0.0              inputs = [array(a, copy=False, subok=True, dtype=object)
  2165         1     117686.0 117686.0      3.5                        for a in args]
  2166                                           
  2167         1    3089595.0 3089595.0     91.4              outputs = ufunc(*inputs)
  2168                                           
  2169         1          4.0      4.0      0.0              if ufunc.nout == 1:
  2170         1     172631.0 172631.0      5.1                  res = array(outputs, copy=False, subok=True, dtype=otypes[0])
  2171                                                       else:
  2172                                                           res = tuple([array(x, copy=False, subok=True, dtype=t)
  2173                                                                        for x, t in zip(outputs, otypes)])
  2174         1          1.0      1.0      0.0          return res

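It shows the part I had missed in my assumptions: the double array is converted to an object array in its entirety in a preprocessing step (which is not very wise memory-wise). The other parts are similar to wrapped_frompyfunc: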

Timer unit: 1e-06 s

Total time: 3.20055 s
File: <ipython-input-113-66680dac59af>
Function: wrapped_frompyfunc at line 16

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    16                                           def wrapped_frompyfunc(arr):
    17         1    3014961.0 3014961.0     94.2      a = frompyfunc(arr)
    18         1     185587.0 185587.0      5.8      b = a.astype(np.float64)
    19         1          1.0      1.0      0.0      return b
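To make the comparison concrete, here is a minimal sketch of the three steps the profile of _vectorize_call shows, written by hand next to wrapped_frompyfunc. The name vectorize_like is my own and purely illustrative; this imitates the profiled lines and is not NumPy's actual implementation.

```python
import numpy as np

def do_double(x):
    return 2.0 * x

# Same Python-level ufunc as in the question.
frompyfunc = np.frompyfunc(do_double, 1, 1)

def vectorize_like(arr):
    """Hypothetical imitation of the three profiled steps of _vectorize_call."""
    # Step 1: convert the whole float64 input to an object array up front.
    inputs = np.asarray(arr, dtype=object)
    # Step 2: run the ufunc on the object array.
    outputs = frompyfunc(inputs)
    # Step 3: convert the object result back to float64.
    return np.asarray(outputs, dtype=np.float64)

def wrapped_frompyfunc(arr):
    # No up-front conversion: the ufunc receives the float64 array directly.
    return frompyfunc(arr).astype(np.float64)

arr = np.linspace(0, 1, 1000)
# Both variants compute the same values; only the handling of the input differs.
assert np.array_equal(vectorize_like(arr), wrapped_frompyfunc(arr))
```

The only structural difference is step 1, which is the part that shows up as the extra 3.5% in the profile and as extra memory.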

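When we look at peak memory consumption (e.g. via /usr/bin/time python script.py), we see that the vectorize version uses twice as much memory as the frompyfunc version, which uses a more sophisticated strategy: the double array is processed in blocks of 8192 elements, so only 8192 Python floats (24 bytes + 8 bytes for the pointer each) exist in memory at the same time, rather than one per array element, which could be far more. The cost of reserving memory from the OS plus the additional cache misses is probably what leads to the higher running time.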
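If /usr/bin/time is not at hand, the memory difference can also be checked roughly from inside Python. The following is a sketch using the standard tracemalloc module; peak_bytes is a hypothetical helper of mine, the array is much smaller than in the benchmark, and the numbers are traced allocations rather than the process's resident memory, so they will not match /usr/bin/time exactly.

```python
import tracemalloc
import numpy as np

def do_double(x):
    return 2.0 * x

vectorize = np.vectorize(do_double)
frompyfunc = np.frompyfunc(do_double, 1, 1)

def wrapped_frompyfunc(arr):
    return frompyfunc(arr).astype(np.float64)

def peak_bytes(fn, arr):
    """Peak memory traced by tracemalloc while fn(arr) runs (hypothetical helper)."""
    tracemalloc.start()
    try:
        fn(arr)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

# The input is allocated before tracing starts, so it is not counted.
arr = np.linspace(0, 1, 200_000)

peak_vectorize = peak_bytes(vectorize, arr)
peak_frompyfunc = peak_bytes(wrapped_frompyfunc, arr)

print(f"vectorize:          {peak_vectorize / 1e6:.1f} MB")
print(f"wrapped_frompyfunc: {peak_frompyfunc / 1e6:.1f} MB")
```

The exact ratio depends on the NumPy and Python versions, but the vectorize figure should come out clearly higher, because the object copy of the input stays alive while the ufunc runs.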

My takeaways:

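The preprocessing step that converts all inputs to object arrays may not be needed at all, because frompyfunc has a more sophisticated way of handling these conversions.
Neither vectorize nor frompyfunc should be used when the resulting ufunc is meant for "real code". Instead, it should be written in C, or with numba or something similar.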

Calling frompyfunc on an object array takes less time than calling it on a double array:

arr=np.linspace(0,1,10**7)
a = arr.astype(object)  # np.object was removed in newer NumPy; the builtin object is equivalent
%timeit frompyfunc(arr)  # 1.08 s ± 65.8 ms
%timeit frompyfunc(a)    # 876 ms ± 5.58 ms

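However, the line-profiler timings above do not show any advantage of using the ufunc on objects rather than on doubles: 3.089595 s vs. 3.014961 s. I suspect this is due to more cache misses when all objects are created up front, whereas with block-wise processing only 8192 created objects (256 KB) are hot in the L2 cache.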
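The 256 KB figure can be checked with a few lines of arithmetic. This is a small sketch that assumes a 64-bit CPython build and NumPy's default ufunc buffer size; other platforms or a changed buffer size give different numbers.

```python
import sys
import numpy as np

float_size = sys.getsizeof(1.0)           # bytes for one Python float object
pointer_size = np.dtype(object).itemsize  # bytes for one slot in an object array
buffer_len = np.getbufsize()              # elements per ufunc buffer

# Memory that is "hot" while one block is being processed.
hot_bytes = buffer_len * (float_size + pointer_size)
print(f"{buffer_len} x ({float_size} + {pointer_size}) bytes = {hot_bytes / 1024:.0f} KiB")
```

Compared with the 10**7 floats that the up-front object conversion creates (roughly 320 MB by the same arithmetic), a working set of this size fits comfortably into a typical L2 cache.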
