Postfix calculator in Cython

Posted 2024-05-08 02:34:44

Hello.

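I am trying to develop a postfix calculator in Cython, translated from a working NumPy version. This is my first attempt. The calculator function receives a postfix expression as a list, together with a matrix of samples, and must return the computed array.

Example input and expected result: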

postfix = ['X0', 'X1', 'add']
samples = [[0, 1], 
           [2, 3], 
           [4, 5]]
result = [1, 5, 9]
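
For reference, the intended behavior can be sketched in plain NumPy. This is only a minimal sketch, not the asker's original NumPy version (which is not shown); the name `evaluate_postfix` is hypothetical, and the division guard is assumed to mirror the `1e-4` clamp used in the Cython code.

```python
import numpy as np

def evaluate_postfix(postfix, samples):
    """Evaluate a postfix expression column-wise over `samples` (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    n_rows = samples.shape[0]
    binary = {"add": np.add, "sub": np.subtract, "mul": np.multiply}
    stack = []
    for token in postfix:
        if token in binary:
            b = stack.pop()
            a = stack.pop()
            # Each ufunc call allocates a fresh result array
            stack.append(binary[token](a, b))
        elif token == "div":
            b = stack.pop()
            a = stack.pop()
            # Assumption: clamp near-zero denominators to 1e-4, as the Cython code does
            safe_b = np.where(np.abs(b) < 1e-4, 1e-4, b)
            stack.append(a / safe_b)
        elif token == "sin":
            stack.append(np.sin(stack.pop()))
        elif token.startswith("X"):
            # "Xj" selects column j of the sample matrix
            stack.append(samples[:, int(token[1:])])
        else:
            # Numeric constant, broadcast to one value per row
            stack.append(np.full(n_rows, float(token)))
    return stack.pop()

result = evaluate_postfix(["X0", "X1", "add"], [[0, 1], [2, 3], [4, 5]])
print(result.tolist())  # [1.0, 5.0, 9.0]
```

Each operation here returns a new array, so results already on the stack are never overwritten by later operations.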

example_cython.pyx

#cython: boundscheck=False, wraparound=False, nonecheck=False

import numpy
from libc.math cimport sin as c_sin

cdef inline calculate(list lst, double [:,:] samples):
    cdef int N = samples.shape[0]
    cdef int i, j
    cdef list stack = []
    cdef double[:] Y = numpy.zeros(N)

    for p in lst:
        if p == 'add':
            b = stack.pop()
            a = stack.pop()
            for i in range(N):
                Y[i] = a[i] + b[i]
            stack.append(Y)
        elif p == 'sub':
            b = stack.pop()
            a = stack.pop()
            for i in range(N):
                Y[i] = a[i] - b[i]
            stack.append(Y)
        elif p == 'mul':
            b = stack.pop()
            a = stack.pop()
            for i in range(N):
                Y[i] = a[i] * b[i]
            stack.append(Y)
        elif p == 'div':
            b = stack.pop()
            a = stack.pop()
            for i in range(N):
                if abs(b[i]) < 1e-4: b[i]=1e-4
                Y[i] = a[i] / b[i]
            stack.append(Y)
        elif p == 'sin':
            a = stack.pop()
            for i in range(N):
                Y[i] = c_sin(a[i])
            stack.append(Y)
        else:
            if p[0] == 'X':
                j = int(p[1:])
                stack.append (samples[:, j])
            else:
                stack.append(float(p))
    return stack.pop ()


# Generate and evaluate expressions
cpdef test3(double [:,:] samples, object _opchars, object _inputs, int nExpr):
    for i in range(nExpr):
        size = 2
        postfix = list(numpy.concatenate((numpy.random.choice(_inputs, 5*size),
                                        numpy.random.choice(_inputs + _opchars, size),
                                        numpy.random.choice(_opchars, size)), 0))
        #print postfix

        res = calculate(postfix, samples)

main.py

import random
import time
import numpy
from example_cython import test3

# Random dataset
n = 1030
nDim=10
samples = numpy.random.uniform(size=(n, nDim))

_inputs = ['X'+str(i) for i in range(nDim)]
_ops_1 = ['sin']
_ops_2 = ['add', 'sub', 'mul', 'div']
_opchars = _ops_1 + _ops_2
nExpr = 1000
nTrials = 3

tic = time.time ()
for i in range(nTrials): test3(samples, _opchars, _inputs, nExpr)
print ("TEST 1: It took an average of {} seconds to evaluate {} expressions on a dataset of {} rows and {} columns.".format(str((time.time () - tic)/nTrials), str(nExpr), str(n), str(nDim)))

setup.py

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules=[ Extension("example_cython",
              ["example_cython.pyx"],
              libraries=["m"],
              extra_compile_args = ["-Ofast", "-ffast-math"])]

setup(
  name = "example_cython",
  cmdclass = {"build_ext": build_ext},
  ext_modules = ext_modules)

Configuration:

Python 3.6.2 |Anaconda, Inc.| (default, Sep 21 2017, 18:29:43)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin

>>> numpy.__version__
'1.13.1'
>>> cython.__version__
'0.26.1'

Compiling and running:

running build_ext
skipping 'example_cython.c' Cython extension (up-to-date)
building 'example_cython' extension
/usr/bin/clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -I/Users/vmelo/anaconda3/include/python3.6m -c example_cython.c -o build/temp.macosx-10.9-x86_64-3.6/example_cython.o -Ofast -ffast-math
example_cython.c:2506:15: warning: code will never be executed [-Wunreachable-code]
    if (0 && (__pyx_tmp_idx < 0 || __pyx_tmp_idx >= __pyx_tmp_shape)) {
              ^~~~~~~~~~~~~
example_cython.c:2506:9: note: silence by adding parentheses to mark code as explicitly dead
    if (0 && (__pyx_tmp_idx < 0 || __pyx_tmp_idx >= __pyx_tmp_shape)) {
        ^
        /* DISABLES CODE */ ( )
example_cython.c:2505:9: warning: code will never be executed [-Wunreachable-code]
        __pyx_tmp_idx += __pyx_tmp_shape;
        ^~~~~~~~~~~~~
example_cython.c:2504:9: note: silence by adding parentheses to mark code as explicitly dead
    if (0 && (__pyx_tmp_idx < 0))
        ^
        /* DISABLES CODE */ ( )
2 warnings generated.
/usr/bin/clang -bundle -undefined dynamic_lookup -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-rpath,/Users/vmelo/anaconda3/lib -L/Users/vmelo/anaconda3/lib -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-rpath,/Users/vmelo/anaconda3/lib -L/Users/vmelo/anaconda3/lib -arch x86_64 build/temp.macosx-10.9-x86_64-3.6/example_cython.o -L/Users/vmelo/anaconda3/lib -lm -o /Users/vmelo/Dropbox/SRC/python/random_equation/cython_v2/example_cython.cpython-36m-darwin.so
ld: warning: -pie being ignored. It is only used when linking a main executable

TEST 1: It took an average of 1.2609198093414307 seconds to evaluate 1000 expressions on a dataset of 1030 rows and 10 columns.

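On my 1.4 GHz i5 this takes about 1.25 seconds. However, similar C code takes 0.13 seconds.

The code above evaluates 1000 expressions, but my goal is 1,000,000. So I have to speed up this Cython code considerably.

As I wrote at the beginning, the NumPy version works correctly. Perhaps I should not use a list as the stack in this Cython version? I have not yet checked whether this Cython code produces correct results, because I have been focusing on improving its speed.

Any suggestions?

Thank you.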


1 answer
Anonymous user
#1 · Posted 2024-05-08 02:34:44

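Currently the only operation that is optimized is the indexing samples[:, j]. (Almost) everything else is untyped, so Cython cannot optimize it much.

I don't really want to rewrite your (fairly large) program completely, but here are some simple ideas for improving it.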

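  1. Fix a basic logic error: the line Y = numpy.zeros(N) needs to be inside the loop. stack.append(Y) does not make a copy of Y, so every time you modify Y you also modify every other version of it already on the stack.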

  2. Set types for a and b:

    cdef double[:] a, b # at the start of the function
    

    This will significantly speed up indexing such as

    Y[i] = a[i] * b[i]
    

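    However, it makes lines like a = stack.pop() slightly slower, because Cython has to check that the result can be used as a memoryview. You also need to change the line

    stack.append(float(p))

    so that the object placed on the stack is usable as a memoryview:

    stack.append(float(p)*numpy.ones(N))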
    
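  3. Change the stack to a 2D memoryview. I suggest over-allocating it, keeping a count number_on_stack of the items currently on it, and reallocating the stack when needed. You can then change: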

    stack.append(samples[:, j])
    

    to:

    if stack.shape[1] < number_on_stack+1:
        # resize numpy array
        arr = stack.base
        arr.resize(... larger shape ..., refcheck=False)
        stack = arr # re-link stack to resized array (to ensure size is suitably updated)
    stack[:,number_on_stack] = samples[:,j] # number_on_stack is the next free column
    number_on_stack += 1

    and change

    a = stack.pop()

    to:

    number_on_stack -= 1
    a = stack[:,number_on_stack] # the top item is in the last used column
    

    The other changes follow a similar pattern. This option is the most work, but will probably give the best result.
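
Points 1 and 3 can be illustrated in plain NumPy. This is a sketch of the idea only, not the Cython implementation: the first few lines reproduce the aliasing from point 1, and `evaluate_with_array_stack` (a hypothetical name, as are `max_depth` and `top`) uses a preallocated 2D stack with a counter. The fixed `max_depth` is an assumption that stands in for the resize step.

```python
import numpy as np

# Point 1: appending the same buffer twice stores two references, not two copies.
Y = np.zeros(3)
aliased = []
aliased.append(Y)
Y[:] = 7.0  # also overwrites the entry that is already on the stack
aliased.append(Y)
print(aliased[0] is aliased[1], aliased[0].tolist())  # True [7.0, 7.0, 7.0]

def evaluate_with_array_stack(postfix, samples, max_depth=16):
    """Postfix evaluation on a preallocated 2D stack (hypothetical sketch)."""
    samples = np.asarray(samples, dtype=float)
    n_rows = samples.shape[0]
    # One column per stack slot; Fortran order keeps each column contiguous
    stack = np.empty((n_rows, max_depth), order="F")
    top = 0  # number of items on the stack == index of the next free column
    for token in postfix:
        if token in ("add", "sub", "mul", "div"):
            b = stack[:, top - 1]
            a = stack[:, top - 2]
            if token == "add":
                a += b
            elif token == "sub":
                a -= b
            elif token == "mul":
                a *= b
            else:
                # Assumption: same 1e-4 clamp as in the question's code
                b[np.abs(b) < 1e-4] = 1e-4
                a /= b
            top -= 1  # the result now lives in a's column
        elif token == "sin":
            a = stack[:, top - 1]
            np.sin(a, out=a)
        else:
            if top == max_depth:
                raise OverflowError("stack is full; a real version would resize here")
            if token.startswith("X"):
                stack[:, top] = samples[:, int(token[1:])]  # copies the column
            else:
                stack[:, top] = float(token)
            top += 1
    return stack[:, top - 1].copy()

print(evaluate_with_array_stack(["X0", "X1", "add"], [[0, 1], [2, 3], [4, 5]]).tolist())
```

Because every pushed value is copied into its own column and results are written into the first operand's column, no two stack entries share a buffer, so the aliasing from point 1 cannot occur.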


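Using cython -a to generate annotated, color-coded HTML gives you a reasonable idea of which parts are well optimized (yellow code is generally worse).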
