Python: how to compute a simple checksum as fast as zlib.adler32

Asked 2025-04-17 14:17

I want to compute a simple checksum: just the sum of all byte values.

The fastest method I have found is:

checksum = sum([ord(c) for c in buf])

But for a 13 MB data buffer this takes 4.4 seconds, which is too slow (the same thing takes 0.5 seconds in C).

If I use:

checksum = zlib.adler32(buf) & 0xffffffff

it takes 0.8 seconds, but the result is not the one I want.
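For context, Adler-32 is not a plain byte sum, which is why its result differs. Per RFC 1950, its low 16 bits hold A = 1 + (sum of all bytes) mod 65521. Its high 16 bits hold B, the running sum of the successive A values mod 65521. Below is a minimal pure-Python sketch of that definition, meant only for comparison with zlib.adler32. The helper name `adler32_by_hand` is hypothetical, and the code is far too slow for real use.

```python
import zlib  # only used to cross-check the hand-written version

MOD_ADLER = 65521  # largest prime below 2**16, as specified in RFC 1950

def adler32_by_hand(buf):
    """Hypothetical illustration of Adler-32; slow, for explanation only."""
    a, b = 1, 0
    for byte in bytearray(buf):  # bytearray yields ints on Python 2 and 3
        a = (a + byte) % MOD_ADLER  # A: 1 + running byte sum, mod 65521
        b = (b + a) % MOD_ADLER     # B: running sum of the A values
    return (b << 16) | a            # B in the high half, A in the low half
```

For example, `adler32_by_hand(b"hello world")` equals `zlib.adler32(b"hello world") & 0xffffffff`. Its low 16 bits are 1117 (1 + 1116), while the plain byte sum is 1116. Once the data grows, both halves also wrap modulo 65521, so the plain sum cannot be recovered from the Adler-32 value.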

So my question is: is there a function, library, or C code I can use from Python 2.6 to compute a simple checksum?

Thanks in advance,
Eric.

2 Answers

Use numpy.frombuffer(buf, "uint8").sum(). It appears to be roughly 70 times faster than your example:

In [9]: import numpy as np

In [10]: buf = b'a'*(13*(1<<20))

In [11]: sum(bytearray(buf))
Out[11]: 1322254336

In [12]: %timeit sum(bytearray(buf))
1 loops, best of 3: 253 ms per loop

In [13]: np.frombuffer(buf, "uint8").sum()
Out[13]: 1322254336

In [14]: %timeit np.frombuffer(buf, "uint8").sum()
10 loops, best of 3: 36.7 ms per loop

In [15]: %timeit sum([ord(c) for c in buf])
1 loops, best of 3: 2.65 s per loop
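One caveat may matter for large buffers. For a uint8 array, NumPy's `sum()` accumulates in an unsigned integer as wide as the platform's default integer. On some platforms that is only 32 bits (for example, Windows with NumPy before 2.0), so the total can wrap past 2**32. Passing `dtype` explicitly removes that platform dependence. A minimal sketch; the helper name `numpy_checksum` is hypothetical:

```python
import numpy as np

def numpy_checksum(buf):
    """Byte sum via NumPy with a forced 64-bit accumulator (hypothetical helper)."""
    # frombuffer creates a zero-copy uint8 view over the bytes object.
    arr = np.frombuffer(buf, dtype=np.uint8)
    # Accumulate in uint64 so the total cannot wrap at 2**32 on platforms
    # whose default integer is 32 bits; convert to a plain Python int.
    return int(arr.sum(dtype=np.uint64))
```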

Answered 2025-04-17 by Python大师

You could use sum(bytearray(buf)) to compute the sum:

In [1]: buf = b'a'*(13*(1<<20))

In [2]: %timeit sum(ord(c) for c in buf)
1 loops, best of 3: 1.25 s per loop

In [3]: %timeit sum(imap(ord, buf))
1 loops, best of 3: 564 ms per loop

In [4]: %timeit b=bytearray(buf); sum(b)
10 loops, best of 3: 101 ms per loop
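If you want a fixed-width result like a C `unsigned` accumulator would give, you can mask the bytearray sum. This is a minimal sketch that assumes a 32-bit unsigned checksum is wanted; the question does not specify the width. The function name `simple_checksum` is hypothetical. The code runs unchanged on Python 2.6 and Python 3.

```python
def simple_checksum(buf, bits=32):
    """Sum of all byte values, truncated to `bits` bits (hypothetical helper)."""
    # bytearray iterates as ints on both Python 2.6+ and Python 3,
    # so no per-byte ord() call is needed.
    total = sum(bytearray(buf))
    # Mimic C unsigned overflow by keeping only the low `bits` bits.
    return total & ((1 << bits) - 1)
```

For the 13 MiB buffer of `b'a'` used above, this returns 1322254336, which fits in 32 bits, so the mask changes nothing there. On Python 3, iterating over `bytes` already yields ints, so `sum(buf)` also works without the bytearray copy.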

Here is a Python extension written in Cython, in a file named sumbytes.pyx:

from libc.limits cimport ULLONG_MAX, UCHAR_MAX

def sumbytes(bytes buf not None):
    cdef:
        unsigned long long total = 0
        unsigned char c
    if len(buf) > (ULLONG_MAX // <size_t>UCHAR_MAX):
        raise NotImplementedError #todo: implement for > 8 PiB available memory
    for c in buf:
        total += c
    return total

sumbytes is roughly 10 times faster than the bytearray version:

name                    time ratio
sumbytes_sumbytes    12 msec  1.00 
sumbytes_numpy     29.6 msec  2.48 
sumbytes_bytearray  122 msec 10.19
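The sumbytes.pyx extension has to be compiled, or loaded through pyximport, before it can be imported. Calling code can therefore fall back to the pure-Python bytearray version when the compiled module is unavailable. This is a minimal sketch that assumes the compiled module is importable as `sumbytes`, matching the .pyx file name above. The fallback function is illustrative and not part of the original answer.

```python
try:
    # Compiled Cython extension built from sumbytes.pyx (may be absent).
    from sumbytes import sumbytes
except ImportError:
    def sumbytes(buf):
        """Pure-Python fallback; same result, but about 10x slower per the timings above."""
        return sum(bytearray(buf))
```

One difference: the Cython version is declared as `bytes buf not None`, so it rejects other buffer types. The fallback also accepts `bytearray` or `memoryview` objects.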

To reproduce these timings, download reporttime.py and run:

#!/usr/bin/env python
# compile on-the-fly
import pyximport; pyximport.install() # pip install cython
import numpy as np 
from reporttime import get_functions_with_prefix, measure    
from sumbytes import sumbytes # from sumbytes.pyx

def sumbytes_sumbytes(input):
    return sumbytes(input)

def sumbytes_bytearray(input):
    return sum(bytearray(input))

def sumbytes_numpy(input):
    return np.frombuffer(input, 'uint8').sum() # @root's answer

def main():
    funcs = get_functions_with_prefix('sumbytes_')
    buf = ''.join(map(unichr, range(256))).encode('latin1') * (1 << 16)
    measure(funcs, args=[buf])

main()
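reporttime.py is an external helper script. If you don't have it, a rough equivalent can be built from the standard library `timeit` module. The sketch below compares only the pure-Python variants; add the numpy and Cython functions if they are installed. The function names are hypothetical, and the output format only imitates the table above.

```python
import timeit

def sumbytes_bytearray(buf):
    return sum(bytearray(buf))

def sumbytes_genexpr(buf):
    return sum(b for b in bytearray(buf))

def compare(funcs, buf, number=3):
    """Return (name, best seconds per call, ratio to fastest), fastest first."""
    results = []
    for f in funcs:
        # Best of 3 repeats; each repeat calls f(buf) `number` times.
        best = min(timeit.repeat(lambda: f(buf), repeat=3, number=number))
        results.append((f.__name__, best / number))
    results.sort(key=lambda r: r[1])
    fastest = max(results[0][1], 1e-12)  # guard against a zero timing
    return [(name, t, t / fastest) for name, t in results]

if __name__ == "__main__":
    # 1 MiB containing every byte value; enlarge for more stable numbers.
    buf = bytes(bytearray(range(256))) * (1 << 12)
    funcs = [sumbytes_bytearray, sumbytes_genexpr]
    # All variants must agree before their timings mean anything.
    assert len(set(f(buf) for f in funcs)) == 1
    for name, t, ratio in compare(funcs, buf):
        print("%-20s %8.1f msec %6.2f" % (name, t * 1000, ratio))
```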

Answered 2025-04-17 by Python大师
