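<p>More than 3 years have passed since the question was posted, and a lot of progress has been made in the meantime. On this code (Update 2 of the question):</p>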
<pre><code># cython: infer_types=True
# cython: boundscheck=False
# cython: wraparound=False
import numpy as np
cimport numpy as np

cdef inline inc(np.ndarray[np.int32_t, ndim=2] arr, int i, int j):
    arr[i, j] += 1

def test1(np.ndarray[np.int32_t, ndim=2] arr):
    cdef int i, j
    for i in xrange(arr.shape[0]):
        for j in xrange(arr.shape[1]):
            inc(arr, i, j)

def test2(np.ndarray[np.int32_t, ndim=2] arr):
    cdef int i, j
    for i in xrange(arr.shape[0]):
        for j in xrange(arr.shape[1]):
            arr[i, j] += 1
</code></pre>
<p>I get the following timings:</p>
<pre><code>arr = np.zeros((1000,1000), dtype=np.int32)
%timeit test1(arr)
%timeit test2(arr)
1 loops, best of 3: 354 ms per loop
1000 loops, best of 3: 1.02 ms per loop
</code></pre>
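<p>So the problem is still reproducible after more than 3 years. Cython now has <a href="http://docs.cython.org/src/userguide/memoryviews.html" rel="noreferrer"><strong>typed memoryviews</strong></a>. They were introduced in Cython 0.16, so they were not available when the question was posted. With typed memoryviews:</p>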
<pre><code># cython: infer_types=True
# cython: boundscheck=False
# cython: wraparound=False
import numpy as np
cimport numpy as np

cdef inline inc(int[:, ::1] tmv, int i, int j):
    tmv[i, j] += 1

def test3(np.ndarray[np.int32_t, ndim=2] arr):
    cdef int i, j
    cdef int[:, ::1] tmv = arr
    for i in xrange(tmv.shape[0]):
        for j in xrange(tmv.shape[1]):
            inc(tmv, i, j)

def test4(np.ndarray[np.int32_t, ndim=2] arr):
    cdef int i, j
    cdef int[:, ::1] tmv = arr
    for i in xrange(tmv.shape[0]):
        for j in xrange(tmv.shape[1]):
            tmv[i, j] += 1
</code></pre>
<p>With this I get:</p>
<pre><code>arr = np.zeros((1000,1000), dtype=np.int32)
%timeit test3(arr)
%timeit test4(arr)
1000 loops, best of 3: 977 µs per loop
1000 loops, best of 3: 838 µs per loop
</code></pre>
<p>We are almost there, and already faster than the old-school way! Now, the <code>inc()</code> function is eligible to be declared <a href="http://docs.cython.org/src/userguide/external_C_code.html#declaring-a-function-as-callable-without-the-gil" rel="noreferrer"><code>nogil</code></a>, so let's declare it so! But oops:</p>
<pre><code>Error compiling Cython file:
[...]
cdef inline inc(int[:, ::1] tmv, int i, int j) nogil:
^
[...]
Function with Python return type cannot be declared nogil
</code></pre>
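<p>Ah, I completely missed that the <code>void</code> return type was missing! Without it, <code>inc()</code> returns a Python object, which requires the GIL. Once again, but now with <code>void</code>:</p>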
<pre><code>cdef inline void inc(int[:, ::1] tmv, int i, int j) nogil:
    tmv[i, j] += 1
</code></pre>
<p>Finally I get:</p>
<pre><code>%timeit test3(arr)
%timeit test4(arr)
1000 loops, best of 3: 843 µs per loop
1000 loops, best of 3: 853 µs per loop
</code></pre>
<p>As fast as the manually inlined version!</p>
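<p>One practical note on the <code>int[:, ::1]</code> declaration used above: it asks for a C-contiguous buffer, so as far as I know Cython raises a <code>ValueError</code> when a strided slice is passed in. A minimal sketch of how a caller might normalize the input first; the helper name <code>as_c_int32</code> is made up for illustration and is not part of the original code:</p>

```python
import numpy as np

def as_c_int32(arr):
    """Return arr as a C-contiguous int32 array, copying only if needed.

    Hypothetical helper, not part of the code in this answer.
    """
    return np.ascontiguousarray(arr, dtype=np.int32)

arr = np.zeros((1000, 1000), dtype=np.int32)
view = arr[:, ::2]        # every second column: a strided, non-contiguous view
fixed = as_c_int32(view)  # contiguous copy that int[:, ::1] should accept

print(view.flags['C_CONTIGUOUS'], fixed.flags['C_CONTIGUOUS'])
```

<p>Keep in mind that when a copy is made, in-place increments no longer reach the original array.</p>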
<hr/>
<p>Just for fun, I tried <a href="http://numba.pydata.org/" rel="noreferrer">Numba</a> on this code:</p>
<pre><code>import numpy as np
from numba import autojit, jit

@autojit
def inc(arr, i, j):
    arr[i, j] += 1

@autojit
def test5(arr):
    for i in xrange(arr.shape[0]):
        for j in xrange(arr.shape[1]):
            inc(arr, i, j)
</code></pre>
<p>I get:</p>
<pre><code>arr = np.zeros((1000,1000), dtype=np.int32)
%timeit test5(arr)
100 loops, best of 3: 4.03 ms per loop
</code></pre>
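<p>Even though it is 4.7x slower than Cython, most likely because the JIT compiler failed to inline <code>inc()</code>, <strong>I think it is awesome!</strong> All I had to do was add <code>@autojit</code>, without cluttering the code with clumsy type declarations: an 88x speedup over <code>test1</code> (354 ms vs. 4.03 ms) for next to nothing!</p>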
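<p>For reference, the two ratios can be recomputed from the <code>%timeit</code> outputs quoted earlier. Which Cython timing serves as the baseline for the 4.7x figure is my assumption (the final <code>test4</code> result, 853 µs); the 88x figure compares against <code>test1</code>:</p>

```python
# Timings copied from the %timeit outputs in this answer, in milliseconds.
t_test1 = 354.0   # Cython buffer syntax, inc() called as a function
t_test4 = 0.853   # Cython typed memoryview, manually inlined (assumed baseline)
t_test5 = 4.03    # Numba @autojit

slowdown_vs_cython = t_test5 / t_test4   # how much slower Numba is than Cython
speedup_vs_test1 = t_test1 / t_test5     # how much faster Numba is than test1

print(round(slowdown_vs_cython, 1), round(speedup_vs_test1))
```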
<p>I also tried other things with Numba, such as</p>
<pre><code>@jit('void(i4[:,:],i4,i4)')  # 2-D int32 array, matching arr[i, j]
def inc(arr, i, j):
    arr[i, j] += 1
</code></pre>
<p>or <code>nopython=True</code>, but neither improved the timing any further.</p>
<p><a href="https://github.com/numba/numba/issues/160" rel="noreferrer">Improving inlining is on the Numba developers' list</a>; we only need to file more requests to give it a higher priority. ;)</p>