Fast XOR of bytes in Python 3

3 Answers

User

Answer 1 · edited 2024-05-13 18:36:27

Using a bytearray is already a lot faster:

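def bxor_ba(b1, b2):
    # Copy b1 into a mutable buffer, then XOR b2 into it in place.
    # Assumes len(b2) <= len(b1); a longer b2 raises IndexError.
    result = bytearray(b1)
    for i, b in enumerate(b2):
        result[i] ^= b
    return bytes(result)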

A quick comparison:

>>> import timeit
>>> b1, b2 = b'abcdefg' * 10, b'aaaaaaa' * 10
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor as it', number=10000)
0.9230150280000089
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_ba as it', number=10000)
0.16270576599890774

This avoids creating a new bytes object for every concatenation.
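The original version that these timings are compared against is not reproduced in this answer. As a rough sketch, assuming it builds the result by repeated concatenation (the name bxor_concat is made up for this illustration), the contrast with the bytearray version looks something like this:

```python
def bxor_concat(b1, b2):
    # Assumed shape of the slow version: every += builds a new bytes
    # object and copies everything accumulated so far.
    result = b''
    for x, y in zip(b1, b2):
        result += bytes([x ^ y])
    return result


def bxor_ba(b1, b2):
    # bytearray version: one copy up front, then in-place updates.
    result = bytearray(b1)
    for i, b in enumerate(b2):
        result[i] ^= b
    return bytes(result)


b1, b2 = b'abcdefg' * 10, b'aaaaaaa' * 10
# Both approaches should produce the same bytes for equal-length inputs.
assert bxor_concat(b1, b2) == bxor_ba(b1, b2)
```

Both functions return the same result for equal-length inputs; only the amount of copying differs.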

The b''.join() approach proposed by delnan is not much better than the original version:

>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_join as it', number=10000)
0.9936718749995634

Another run with bytestrings 100 times larger:

>>> b1, b2 = b'abcdefg' * 1000, b'aaaaaaa' * 1000
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor as it', number=1000)
11.032563796999966
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_join as it', number=1000)
9.242204494001271
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_ba as it', number=1000)
1.762020197998936

This shows that bytes.join() is faster than repeated concatenation.

A final run on 7 million bytes, repeated 10 times, with the bytearray version only, because I lost patience with the other versions:

>>> b1, b2 = b'abcdefg' * 1000000, b'aaaaaaa' * 1000000
>>> timeit.timeit('it(b1, b2)', 'from __main__ import b1, b2, bxor_ba as it', number=10)
16.18445999799951

User

Answer 2 · edited 2024-05-13 18:36:27

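When XORing bytes objects with one million elements each, the repeated-concatenation loop in the question creates roughly one million temporary bytes objects and copies each byte from one temporary bytes object to the next roughly 500,000 times on average. Note that the exact same problem exists for strings (in many other languages, too). The solution for strings is to build a list of string parts and join them efficiently at the end with ''.join. You can do the same thing with bytes: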

def bxor(b1, b2):  # XOR two bytes objects
    parts = []
    # Use separate loop names so the arguments are not shadowed.
    for x, y in zip(b1, b2):
        parts.append(bytes([x ^ y]))
    return b''.join(parts)

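Alternatively, you can use a bytearray, which is mutable and therefore avoids the problem. It also means you do not have to allocate a new bytes object on every iteration; you can just append the byte (an int).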

def bxor(b1, b2):  # XOR two bytes objects
    result = bytearray()
    # Use separate loop names so the arguments are not shadowed.
    for x, y in zip(b1, b2):
        result.append(x ^ y)
    return result

If you want or need a bytes object, you can return bytes(result) instead.
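The same zip-based loop can also be written without an explicit accumulator, since bytes() accepts any iterable of integers in range(256). This is only a compact sketch of the same idea (the name bxor_gen is hypothetical), and no timing is claimed for it:

```python
def bxor_gen(b1, b2):
    # Iterating over bytes yields ints, so x ^ y is an int in 0..255.
    # zip() stops at the shorter input, just like the loops above.
    return bytes(x ^ y for x, y in zip(b1, b2))


print(bxor_gen(b'abc', b'aaa'))  # prints b'\x00\x03\x02'
```

As with the zip-based loops, extra trailing bytes in the longer input are silently dropped.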

User

Answer 3 · edited 2024-05-13 18:36:27

Adding this as a separate answer, because it is one:

If you want something faster than the "manual" methods given so far, there is always NumPy:

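import numpy

def bxor_numpy(b1, b2):
    # View each bytes object as a uint8 array without copying.
    n_b1 = numpy.frombuffer(b1, dtype='uint8')
    n_b2 = numpy.frombuffer(b2, dtype='uint8')

    # The XOR allocates a new array, which is converted back to bytes.
    return (n_b1 ^ n_b2).tobytes()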

And it is fast:

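from functools import partial
from os import urandom
from timeit import Timer

# bxor_inplace and bxor_append refer to the bytearray-based functions from
# the other answers (in-place update and append, respectively).
first_random = urandom(100000)
second_random = urandom(100000)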

min(Timer(partial(bxor_inplace, first_random, second_random)).repeat(10, 100))
#>>> 1.5381054869794752
min(Timer(partial(bxor_append, first_random, second_random)).repeat(10, 100))
#>>> 1.5624085619929247
min(Timer(partial(bxor_numpy, first_random, second_random)).repeat(10, 100))
#>>> 0.009930026979418471

So it is roughly 150 times faster than the best alternatives posted here.
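If NumPy is not available, one standard-library option is to treat each bytestring as a single large integer and XOR those. This is a sketch under the assumption that both inputs have the same length; the name bxor_int is made up for this example, and it was not part of the benchmark above:

```python
def bxor_int(b1, b2):
    # Assumption: equal lengths. Without this check the shorter value
    # would effectively be left-padded with zero bytes.
    if len(b1) != len(b2):
        raise ValueError('inputs must have the same length')
    n1 = int.from_bytes(b1, 'big')
    n2 = int.from_bytes(b2, 'big')
    # to_bytes() needs the output length so leading zero bytes are kept.
    return (n1 ^ n2).to_bytes(len(b1), 'big')


assert bxor_int(b'abc', b'aaa') == b'\x00\x03\x02'
```

Whether this beats the bytearray loop on a given input size is worth measuring with timeit rather than assuming.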
