Math with NumPy dtypes

Posted 2024-05-29 03:20:04

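It seems that doing arithmetic with NumPy dtypes (uint32 in particular) takes longer than doing the same arithmetic on regular Python ints. Here is my actual example code: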

import numpy

## Binary encoding of DNA as python int
bDic = {'A': 0 ,'C': 1 ,'G': 2 ,'T': 3  } # DNA to 32bit binary...
tDic = ['A',    'C',    'G',    'T'     ] # ...and back again :)
range32 = range(0,32,2)

def string_up2bit(string):
    up2bit = 3
    for char in reversed(string): up2bit = (up2bit << 2) + bDic[char]
    return up2bit
def up2bit_string(value):
    up2bits = [((value >> x) & 3) for x in range32]
    return ''.join([tDic[up2bit] for up2bit in up2bits[:-up2bits[::-1].index(3)-1]])

## Binary encoding of DNA as numpy uint32 (what I will actually be saving to disk)
n0,n1,n2,n3 = numpy.uint32(0),numpy.uint32(1),numpy.uint32(2),numpy.uint32(3)
npbDic = { 'A': n0 ,'C': n1 ,'G': n2 ,'T': n3 } # DNA to 32bit binary...
nptDic = { n0 :'A', n1 :'C', n2 :'G', n3 :'T' } # ...and back again :)
nprange32 = list(numpy.arange(0,32,2,dtype='uint32'))

def np_string_up2bit(string):
    up2bit = n3
    for char in reversed(string): up2bit = (up2bit << n2) + npbDic[char]
    return up2bit
def np_up2bit_string(value):
    up2bits = [((value >> x) & n3) for x in nprange32] # The 32 here makes it 32bit only.
    return ''.join([nptDic[up2bit] for up2bit in up2bits[:-up2bits[::-1].index(n3)-1]])

## Begin test:
## Stand-in for reading 10000000 lines of DNA from a file: convert one sample sequence into binary and back again 10000000 times.
DNA = 'ATTCGACTTGACTG'
r = 0
while r != 10000000:
    r += 1
    #up2bit_string(string_up2bit(DNA))        # Takes 1min 12sec
    np_up2bit_string(np_string_up2bit(DNA))   # Takes 1min 45sec
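
For anyone unfamiliar with the scheme, here is a small worked example of the bit layout that string_up2bit produces: the first base lands in the two lowest bits, each following base two bits higher, and a sentinel 0b11 sits just above the last base so that leading 'A's (code 00) are not lost. The encode/decode names are hypothetical simplified versions, not the code above; decode replaces the range32 scan with a loop that peels off two bits at a time until only the sentinel is left.

```python
# Hypothetical illustration of the string_up2bit layout:
# base i occupies bits 2*i and 2*i+1, and a sentinel 0b11 sits above the last base.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(seq: str) -> int:
    value = 3  # sentinel
    for base in reversed(seq):
        value = (value << 2) | CODES[base]
    return value


def decode(value: int) -> str:
    # Peel off 2 bits at a time until only the sentinel remains.
    out = []
    while value != 3:
        out.append(BASES[value & 3])
        value >>= 2
    return "".join(out)


print(bin(encode("AC")))  # 0b110100: sentinel 11, then C = 01, then A = 00
```

With plain Python ints this works for any length, but a uint32 only has room for 15 bases plus the sentinel (16 two-bit slots).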

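As you can see from the timings in the comments at the end of the test loop, the NumPy uint32 version takes about 45% longer than the Python int version (1 min 45 s vs. 1 min 12 s). There is no conversion from NumPy uint32s to Python ints in the code above that would explain the slowdown; working with uint32s simply seems to be slower. On real-world datasets this means days of extra computation time.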
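
One way to check that the gap comes from per-operation cost rather than the surrounding code is to time a single shift-and-add step in isolation. The sketch below is a hypothetical micro-benchmark (step_int and step_uint32 are illustrative names). Absolute numbers depend on the machine and on the Python and NumPy versions, so no expected timings are given; the usual explanation for this kind of gap is that each NumPy scalar operation goes through more type-dispatch machinery than arithmetic on a small Python int.

```python
import timeit

import numpy as np

TWO = np.uint32(2)
ONE = np.uint32(1)


def step_int(x: int) -> int:
    """One iteration of the string_up2bit loop body on a plain Python int."""
    return (x << 2) + 1


def step_uint32(x: np.uint32) -> np.uint32:
    """The same iteration on a NumPy uint32 scalar, as in np_string_up2bit."""
    return (x << TWO) + ONE


if __name__ == "__main__":
    n = 500_000
    x_int = 13
    x_np = np.uint32(13)
    # Operands are created once, so construction cost is not timed;
    # the lambda and function-call overhead is the same for both variants.
    t_int = timeit.timeit(lambda: step_int(x_int), number=n)
    t_np = timeit.timeit(lambda: step_uint32(x_np), number=n)
    print(f"int:    {t_int:.3f} s for {n} steps")
    print(f"uint32: {t_np:.3f} s for {n} steps")
```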

Does anyone know how to speed this up? Is there perhaps a way to make uint32 math the default in Python? Or should I try ctypes instead of NumPy dtypes?
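
A commonly used way to reduce per-scalar overhead is to hand NumPy whole arrays instead of single values, so each shift or add runs over many sequences in one call. This is a swapped-in technique, not the code from the question: it assumes every sequence in a batch has the same length of at most 15 bases, and LUT, batch_encode and batch_decode are hypothetical names. It reproduces the bit layout of string_up2bit, including the sentinel.

```python
import numpy as np

# Hypothetical lookup table: ASCII byte value -> 2-bit base code (A=0, C=1, G=2, T=3).
# Assumes the input contains only the uppercase letters A, C, G and T.
LUT = np.zeros(256, dtype=np.uint32)
for code, base in enumerate(b"ACGT"):
    LUT[base] = code
BASES = np.frombuffer(b"ACGT", dtype=np.uint8)


def batch_encode(seqs: list[str]) -> np.ndarray:
    """Encode equal-length DNA strings into uint32 with the string_up2bit layout."""
    length = len(seqs[0])
    # 15 bases * 2 bits + 2-bit sentinel = 32 bits, the most a uint32 can hold.
    assert length <= 15 and all(len(s) == length for s in seqs)
    raw = np.frombuffer("".join(seqs).encode("ascii"), dtype=np.uint8)
    codes = LUT[raw.reshape(len(seqs), length)]          # shape (N, length)
    shifts = (2 * np.arange(length)).astype(np.uint32)   # base i -> bits 2i, 2i+1
    # The bit fields never overlap, so summing them is the same as OR-ing them.
    values = (codes << shifts).sum(axis=1, dtype=np.uint32)
    return values | (np.uint32(3) << np.uint32(2 * length))  # sentinel above last base


def batch_decode(values: np.ndarray, length: int) -> list[str]:
    """Inverse of batch_encode for sequences of a known, shared length."""
    shifts = (2 * np.arange(length)).astype(np.uint32)
    codes = (values[:, None] >> shifts) & np.uint32(3)   # shape (N, length)
    letters = BASES[codes]                                 # one ASCII byte per base
    return [row.tobytes().decode("ascii") for row in letters]
```

The resulting uint32 array can be written to disk as-is (for example with numpy.save). Whether this beats the plain-int version depends on batch size and data, so it is worth timing on a real dataset.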

Edited so that anyone can test the code: the sample DNA data is now included.

