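<p>Python uses a random hash seed to prevent attackers from tar-pitting your application by sending it keys designed to collide. See the <a href="http://www.ocert.org/advisories/ocert-2011-003.html" rel="noreferrer">original vulnerability disclosure</a>. By offsetting the hash with a random seed (set once at startup), attackers can no longer predict which keys will collide.</p>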
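<p>You can set a fixed seed or disable the feature by setting the <a href="https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED" rel="noreferrer"><code>PYTHONHASHSEED</code> environment variable</a>; the default is <code>random</code>, but you can set it to a fixed positive integer value, with <code>0</code> disabling the feature altogether.</p>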
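<p>As a rough illustration of the effect of <code>PYTHONHASHSEED</code>, the sketch below starts fresh interpreters with a chosen seed and compares the resulting string hashes. The helper name <code>hash_in_fresh_interpreter</code> is made up for this example, and it assumes Python 3.7 or newer (for <code>capture_output</code>).</p>

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new Python process
    started with PYTHONHASHSEED set to `seed` (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# With a fixed seed, separate processes agree on the hash value.
print(hash_in_fresh_interpreter("spam", 42) == hash_in_fresh_interpreter("spam", 42))

# With the default "random" setting, two processes will almost certainly disagree.
print(hash_in_fresh_interpreter("spam", "random") == hash_in_fresh_interpreter("spam", "random"))
```

<p>Pinning the seed this way is handy for reproducing a bug or a test run, but it also gives up the collision protection the random seed provides.</p>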
<p>Python versions 2.7 and 3.2 have the feature disabled by default (use the <code>-R</code> switch or set <code>PYTHONHASHSEED=random</code> to enable it); it is enabled by default in Python 3.3 and up.</p>
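<p>If you are relying on the order of keys in a Python dictionary or set, then don't. Python uses a hash table to implement these types, and their order <a href="https://stackoverflow.com/questions/15479928/why-is-the-order-in-python-dictionaries-and-sets-arbitrary/15479974#15479974">depends on the insertion and deletion history</a> as well as the random hash seed. (Since Python 3.7, dictionaries preserve insertion order; sets remain unordered.)</p>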
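<p>If a reproducible order is needed, one common option is to impose it explicitly rather than rely on hash-table order. A minimal sketch, with a made-up <code>tags</code> set:</p>

```python
tags = {"spam", "ham", "eggs"}

# Iteration order of a set of strings may change between runs
# when string hashes are randomized.
print(list(tags))

# Sorting imposes an order that does not depend on hash values.
ordered = sorted(tags)
print(ordered)
```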
<p>Also see the <a href="https://docs.python.org/3/reference/datamodel.html#object.__hash__" rel="noreferrer"><code>object.__hash__()</code> special method documentation</a>:</p>
<blockquote>
<p><strong>Note</strong>: By default, the <code>__hash__()</code> values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.<br/>
This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See <a href="http://www.ocert.org/advisories/ocert-2011-003.html" rel="noreferrer">http://www.ocert.org/advisories/ocert-2011-003.html</a> for details.<br/>
Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).<br/>
See also <code>PYTHONHASHSEED</code>.</p>
</blockquote>
<p>If you need a stable hash implementation, you probably want to look at the <a href="https://docs.python.org/3/library/hashlib.html" rel="noreferrer"><code>hashlib</code> module</a>; it implements cryptographic hash functions. The <a href="https://github.com/jaybaird/python-bloomfilter/blob/master/pybloom/pybloom.py#L54-L98" rel="noreferrer">pybloom project uses this approach</a>.</p>
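<p>A minimal sketch of that idea, assuming a 64-bit integer derived from SHA-256 is good enough for your use case. The function name <code>stable_hash</code> and the choice of SHA-256 are illustrative and are not taken from pybloom.</p>

```python
import hashlib


def stable_hash(text, bits=64):
    """Process-independent integer hash of a string, derived from SHA-256.

    Hypothetical helper: `bits` should be a multiple of 8, at most 256.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep only the leading bytes so the result fits in `bits` bits.
    return int.from_bytes(digest[: bits // 8], "big")


# The value is the same in every process, regardless of PYTHONHASHSEED.
print(stable_hash("spam"))
```

<p>Unlike the built-in <code>hash()</code>, this value can safely be stored on disk or shared between processes, at the cost of being slower to compute.</p>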
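<p>Since the offset consists of a prefix and a suffix (the start value and the final XORed value, respectively), you unfortunately cannot just store the offset. On the plus side, this also means that attackers cannot easily determine the offset with timing attacks either.</p>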