I need to speed this code up so that it runs in 4 milliseconds.
import numpy as np

def return_call(data):
    num = int(data.shape[0] / 4096)
    buff_spectrum = np.empty(2048, dtype=np.uint64)
    buff_detect = np.empty(2048, dtype=np.uint64)
    end_spetrum = np.empty(num*1024, dtype=np.uint64)
    end_detect = np.empty(num*1024, dtype=np.uint64)
    _data = np.reshape(data, (num, 4096))
    for _raw_data_spec in _data:
        raw_data_spec = np.reshape(_raw_data_spec, (2048, 2))
        for i in range(2048):
            buff_spectrum[i] = (np.int16(raw_data_spec[i][0])<<17)|(np.int16(raw_data_spec[i][1] <<1))>>1
            buff_detect[i] = (np.int16(raw_data_spec[i][0])>>15)
        for i in range(511, -1, -1):
            if buff_spectrum[i+1024] != 0:
                end_spetrum[i] = (np.log10(buff_spectrum[i+1024]))
                end_detect[i] = buff_detect[i+1024]
            else:
                end_spetrum[i] = 0
                end_detect[i] = 0
        for i in range(1023, 511, -1):
            if buff_spectrum[i+1024] != 0:
                end_spetrum[i] = (np.log10(buff_spectrum[i + 1024]))
                end_detect[i] = buff_detect[i + 1024]
            else:
                end_spetrum[i] = 0
                end_detect[i] = 0
    return end_spetrum, end_detect
I decided to use Cython for this task, but I did not get any speedup.
import numpy as np
cimport numpy

ctypedef signed short DTYPE_t

cpdef return_call(numpy.ndarray[DTYPE_t, ndim=1] data):
    cdef int i
    cdef int num = data.shape[0]/4096
    cdef numpy.ndarray _data
    cdef numpy.ndarray[unsigned long long, ndim=1] buff_spectrum = np.empty(2048, dtype=np.uint64)
    cdef numpy.ndarray[unsigned long long, ndim=1] buff_detect = np.empty(2048, dtype=np.uint64)
    cdef numpy.ndarray[double, ndim=1] end_spetrum = np.empty(num*1024, dtype=np.double)
    cdef numpy.ndarray[double, ndim=1] end_detect = np.empty(num*1024, dtype=np.double)
    _data = np.reshape(data, (num, 4096))
    for _raw_data_spec in _data:
        raw_data_spec = np.reshape(_raw_data_spec, (2048, 2))
        for i in range(2048):
            buff_spectrum[i] = (np.uint16(raw_data_spec[i][0])<<17)|(np.uint16(raw_data_spec[i][1] <<1))>>1
            buff_detect[i] = (np.uint16(raw_data_spec[i][0])>>15)
        for i in range(511, -1, -1):
            if buff_spectrum[i+1024] != 0:
                end_spetrum[i] = (np.log10(buff_spectrum[i+1024]))
                end_detect[i] = buff_detect[i+1024]
            else:
                end_spetrum[i] = 0
                end_detect[i] = 0
        for i in range(1023, 511, -1):
            if buff_spectrum[i+1024] != 0:
                end_spetrum[i] = (np.log10(buff_spectrum[i + 1024]))
                end_detect[i] = buff_detect[i + 1024]
            else:
                end_spetrum[i] = 0
                end_detect[i] = 0
    return end_spetrum, end_detect
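The best I have achieved is 80 milliseconds, but I need it to be faster, because the data coming from the hardware has to be processed in almost real time. Please tell me why I get no speedup, and whether the target is realistic. I have also attached the code of the test file.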
import numpy as np
import example_original
import example_cython
data = np.empty(8192*2, dtype=np.int16)
import time
startpy = time.time()
example_original.return_call(data)
finpy = time.time() -startpy
startcy = time.time()
k,r = example_cython.return_call(data)
fincy = time.time() -startcy
print( fincy, finpy)
print('Cython is {}x faster'.format(finpy/fincy))
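A single time.time() measurement is noisy at the millisecond scale, so when comparing against a 4 ms target it may help to repeat the call and keep the best run. The sketch below is only an illustration: best_time and busy_work are hypothetical names, and busy_work stands in for return_call so that the snippet runs on its own.

```python
import time


def best_time(func, *args, repeats=5):
    """Return the shortest wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()   # monotonic, high-resolution clock
        func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def busy_work(n):
    """Hypothetical stand-in for return_call(data)."""
    return sum(i * i for i in range(n))


t = best_time(busy_work, 10_000)
print("best of 5: {:.3f} ms".format(t * 1e3))
```

To time the real functions, pass example_original.return_call or example_cython.return_call together with data instead of busy_work.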
I don't have much experience with Cython, so this is just an example showing that the Cython version can be timed as well.
Example
Timing
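I think a major reason for this may be that your Python code contains almost no pure-Python operations; nearly everything in it is a NumPy operation. A large part of NumPy is written in C, some of it in Fortran, and a good deal of it in Python. Well-written NumPy code is comparable in speed to C code.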
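NumPy only approaches C speed when it operates on whole arrays; calling it once per element, as the loops in the question do, pays the Python call overhead every time. The following is a rough sketch of a whole-array version, not a drop-in replacement. The name return_call_vectorized is hypothetical, and it rests on three assumptions: the words are read as unsigned 16-bit values and widened to 64 bits before shifting (one reading of the uint16 expression in the Cython version), the low 15 bits of the second word form the low part, and each frame gets its own output row (the question's loops write every frame into indices 0..1023). Whether it reaches the 4 ms target would have to be measured.

```python
import numpy as np

FRAME_WORDS = 4096           # int16 words per frame, as in the question
PAIRS = FRAME_WORDS // 2     # 2048 (first, second) word pairs per frame
HALF = PAIRS // 2            # the loops only read pairs 1024..2047


def return_call_vectorized(data):
    """Whole-array sketch of the per-element loops; assumptions noted inline."""
    num = data.shape[0] // FRAME_WORDS
    # View the flat buffer as (frame, pair, word).
    frames = data[: num * FRAME_WORDS].reshape(num, PAIRS, 2)

    # Assumption: reinterpret each word as unsigned 16-bit and widen to
    # 64 bits before shifting, so the 17-bit shift cannot overflow.
    first = frames[:, HALF:, 0].astype(np.int64) & 0xFFFF
    second = frames[:, HALF:, 1].astype(np.int64) & 0xFFFF

    # Assumption: the low 15 bits of the second word are the low part.
    spectrum = (first << 17) | (second & 0x7FFF)
    detect = first >> 15     # top bit of the first word

    nonzero = spectrum != 0
    # Take log10 only where the value is nonzero; other slots stay 0.0.
    end_spectrum = np.zeros(spectrum.shape, dtype=np.float64)
    np.log10(spectrum, out=end_spectrum, where=nonzero)
    end_detect = np.where(nonzero, detect, 0).astype(np.float64)
    # Assumption: one output row per frame, shape (num, 1024).
    return end_spectrum, end_detect


data = np.zeros(8192 * 2, dtype=np.int16)   # same size as the test buffer
spec, det = return_call_vectorized(data)
print(spec.shape, det.shape)                # (4, 1024) (4, 1024)
```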
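raw_data_spec is untyped. Add a definition for it at the start of the function. I would suggest the newer memoryview syntax (but use the older numpy syntax if you prefer).
The buff_spectrum[i] = ... line (which you have identified as the bottleneck) is a mess: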
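Do the indexing in one step rather than two: raw_data_spec[i, 0] (note the single pair of brackets and the comma).
Reconsider the cast to a 16-bit integer. Does shifting a 16-bit integer by 17 bits really make sense?
You probably do not need a cast at all, since the data is already known to be DTYPE_t, but if you do need one, use angle brackets: <numpy.uint16_t>(raw_data_spec[i, 0])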
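Two things in that expression are easy to get wrong, and plain Python integers are enough to illustrate both. The helper names shl16 and shl_wide are hypothetical, and the 16-bit mask is only a model of a result kept in a 16-bit variable, not a statement about what Cython or NumPy do in every case.

```python
MASK16 = 0xFFFF


def shl16(value, shift):
    """Left shift kept in 16 bits: bits pushed past bit 15 are discarded."""
    return (value << shift) & MASK16


def shl_wide(value, shift):
    """Left shift after widening: Python ints are wide enough to keep all bits."""
    return (value & MASK16) << shift


a, b = 0b100, 0b011
print(a | b >> 1)      # 5: parsed as a | (b >> 1), since >> binds tighter than |
print((a | b) >> 1)    # 3: what you get if the whole OR was meant to be shifted
print(shl16(1, 17))    # 0: nothing survives a 17-bit shift inside 16 bits
print(shl_wide(1, 17)) # 131072
```

Writing the parentheses explicitly and widening before the shift removes both ambiguities.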
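Consider turning off boundscheck and wraparound. Verify for yourself that this is safe and that you are not relying on exceptions to tell you when you index past the end of an array or use a negative index. Only do this after thinking it through, not automatically in a cargo-cult way.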
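One way to do that verification is to enumerate the indices the loops actually use and check them against the array lengths. This is a small sketch in plain Python; all_in_bounds is a hypothetical helper and num = 4 is just an example frame count.

```python
def all_in_bounds(indices, length):
    """True if every index is a valid non-negative position for this length."""
    return all(0 <= i < length for i in indices)


BUFF_LEN = 2048        # length of buff_spectrum and buff_detect
num = 4                # hypothetical number of frames
OUT_LEN = num * 1024   # length of end_spetrum and end_detect

# Indices produced by the two inner loops in the question.
loop_indices = list(range(511, -1, -1)) + list(range(1023, 511, -1))
read_indices = [i + 1024 for i in loop_indices]

print(all_in_bounds(range(2048), BUFF_LEN))   # writes into the buffers
print(all_in_bounds(read_indices, BUFF_LEN))  # reads at i + 1024
print(all_in_bounds(loop_indices, OUT_LEN))   # writes into the outputs
```

If all three checks hold for every frame count you expect, neither a bounds check nor negative-index wraparound is doing any work in these loops.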
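Drop the call to np.log10. It is a full Python call for a single element, which ends up being inefficient. You can use the C standard library math functions instead (from libc.math cimport log10), then replace np.log10 with log10.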
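In plain Python the closest analogue of the C log10 is math.log10, which also works on one scalar at a time without going through NumPy. The sketch below mirrors the zero guard from the loops in the question; log10_or_zero is a hypothetical name. As an aside, the pure-Python version stores the result in a uint64 array, which drops the fractional part of the logarithm, whereas the Cython version stores it in a double array.

```python
import math


def log10_or_zero(value):
    """Scalar log10 with the same zero guard as the loops in the question."""
    return math.log10(value) if value != 0 else 0.0


print(log10_or_zero(0))        # 0.0
print(log10_or_zero(131072))   # log10 of 1 << 17
```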