Fast, large-width, non-cryptographic string hashing in Python

User

Reply 1 · edited 2024-05-12 17:51:54

"strings": I assume you want to hash Python 2.x str objects and/or Python 3.x bytes and/or bytearray objects.

This may violate your first constraint, but consider using

(zlib.adler32(strg, perturber) << N) ^ hash(strg)

to get a (32+N)-bit hash.
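A minimal sketch of that combination, written for Python 3 and `bytes` input. The name `wide_hash` and its default arguments are hypothetical, chosen only for illustration. Both parts are masked to unsigned values so the XOR behaves the same on every platform. Note that since Python 3.3 the built-in hash() of bytes is salted per process unless PYTHONHASHSEED is fixed, so the result is only stable within a single run.

```python
import sys
import zlib


def wide_hash(data: bytes, perturber: int = 1, n: int = 32) -> int:
    """Combine Adler-32 with the built-in hash into one wider integer.

    Hypothetical helper: `perturber` is the Adler-32 starting value and
    `n` is how far the checksum is shifted left before mixing.
    """
    # Mask to 32 bits so the checksum is unsigned on every Python version.
    high = zlib.adler32(data, perturber) & 0xFFFFFFFF
    # hash() can be negative; reduce it to the platform's hash width.
    low = hash(data) & ((1 << sys.hash_info.width) - 1)
    return (high << n) ^ low


if __name__ == "__main__":
    print(wide_hash(b"samplebias", 1, 64))
```

With n at least as large as the width of hash(), the two parts do not overlap and the upper bits are exactly the Adler-32 value.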

User

Reply 2 · edited 2024-05-12 17:51:54

Use the built-in hash() function. This function, at least on the machine I'm developing for (with python 2.7, and a 64-bit cpu) produces an integer that fits within 32 bits - not large enough for my purposes.

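That is not true. The built-in hash function produces a 64-bit hash on a 64-bit system (more precisely, on platforms where a C long is 64 bits).

Here is the Python str hash function from Objects/stringobject.c (Python version 2.7):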

static long
string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;      /* Notice the 64-bit hash, at least on a 64-bit system */

    if (a->ob_shash != -1)
        return a->ob_shash;
    len = Py_SIZE(a);
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}
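To make the arithmetic in the C function easier to follow, here is a rough pure-Python port. It is a sketch only: the name `py27_string_hash` and the `bits` parameter are hypothetical, the caching in `ob_shash` is omitted, and the wrap-around of the C `long` is emulated with an explicit mask. It mirrors the listing above, not the randomized hash used by later Python versions.

```python
def py27_string_hash(data: bytes, bits: int = 64) -> int:
    """Emulate the string_hash() listing for a C long of `bits` bits."""
    mask = (1 << bits) - 1
    if not data:
        # For an empty string *p is the terminating NUL, so x stays 0.
        return 0
    # x = *p << 7
    x = (data[0] << 7) & mask
    # x = (1000003 * x) ^ *p++ for every byte, wrapping like a C long
    for byte in data:
        x = ((1000003 * x) ^ byte) & mask
    # x ^= Py_SIZE(a)
    x ^= len(data)
    # Reinterpret the unsigned bit pattern as a signed long.
    if x >= 1 << (bits - 1):
        x -= 1 << bits
    # -1 is reserved as the error marker, so it is remapped to -2.
    if x == -1:
        x = -2
    return x


if __name__ == "__main__":
    print(py27_string_hash(b"a"))           # 64-bit long
    print(py27_string_hash(b"a", bits=32))  # 32-bit long
```

Running the same input with bits=64 and bits=32 shows how the width of the result follows the width of the C long.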

User

Reply 3 · edited 2024-05-12 17:51:54

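Take a look at the 128-bit variant of MurmurHash3. The algorithm's page includes some performance numbers. It should be possible to port it to Python, either in pure Python or as a C extension. (Updated: the author recommends using the 128-bit variant and discarding the bits you don't need.)

If the 64-bit MurmurHash2 works for you, there is a Python implementation (a C extension) in the pyfasthash package, which also includes a few other non-cryptographic hash variants, though some of these only provide 32-bit output.

Update: I wrote a quick Python wrapper for the Murmur3 hash function. You can find it on the Python Package Index as well; it only needs a C++ compiler to build, and Boost is not required.

Usage example and timing comparison: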
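Discarding unneeded bits from a 128-bit result is a simple mask on the integer. The sketch below assumes the hash is already available as a non-negative Python int; `truncate_hash` is a hypothetical helper, and the sample value is an arbitrary placeholder, not real MurmurHash3 output.

```python
def truncate_hash(value: int, bits: int) -> int:
    """Keep only the lowest `bits` bits of a non-negative integer hash."""
    if bits <= 0:
        raise ValueError("bits must be positive")
    return value & ((1 << bits) - 1)


# Arbitrary 128-bit placeholder standing in for a real hash value.
h128 = 0x0123456789ABCDEFFEDCBA9876543210
h96 = truncate_hash(h128, 96)

if __name__ == "__main__":
    print(hex(h96))
```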

import murmur3
import timeit

# without seed
print murmur3.murmur3_x86_64('samplebias')
# with seed value
print murmur3.murmur3_x86_64('samplebias', 123)

# timing comparison with str __hash__
t = timeit.Timer("murmur3.murmur3_x86_64('hello')", "import murmur3")
print 'murmur3:', t.timeit()

t = timeit.Timer("str.__hash__('hello')")
print 'str.__hash__:', t.timeit()

Output:

15662901497824584782
7997834649920664675
murmur3: 0.264422178268
str.__hash__: 0.219163894653
