<p>You can speed this up considerably by calling into <code>numpy</code> in bulk (generating random blocks rather than single values), and by simplifying the pattern search using Python's built-in <code>bytes</code> scanning:</p>
<pre><code>import numpy as np

NN = 1000000
t11 = np.zeros(NN)
t12 = np.zeros(NN)
for i in range(NN):
    block = b'\xff'  # Prepopulate w/garbage byte so first byte never part of count
    flag11 = flag12 = True
    ctr = 1  # One lower to account for non-generated first byte
    while flag11 or flag12:
        # Generate 100 numbers at once, much faster than one at a time;
        # store as bytes for reduced memory and cheap searches.
        # Keep last byte of previous block so a 1 at the end matches
        # a 1/2 at the beginning of the next block.
        block = block[-1:] + bytes(np.random.randint(1, 7, 100, np.uint8))
        # Containment test scans in C, faster than a Python-level
        # one-at-a-time check
        if flag11 and b'\x01\x01' in block:
            t11[i] = ctr + block.index(b'\x01\x01')
            flag11 = False
        if flag12 and b'\x01\x02' in block:
            t12[i] = ctr + block.index(b'\x01\x02')
            flag12 = False
        ctr += 100

print('Mean t11: %f' % np.mean(t11))
print('\nMean t12: %f' % np.mean(t12))
</code></pre>
<p>On my (admittedly underpowered) machine, your original code takes ~96 seconds to run; my optimized version takes ~6.6 seconds, roughly 7% of the original runtime. Even though (on average) more than half of the random data generated is never needed, producing it in bulk is still faster than avoiding the waste with more Python-level loop-and-retry work.</p>
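<p>If you want to reproduce the timing comparison yourself, a minimal wall-clock harness is enough (a sketch of mine, not part of the original code; the <code>timed</code> helper is a made-up name, and <code>time.perf_counter</code> is the standard-library clock to use for this):</p>
<pre><code>import time

def timed(fn, *args):
    """Run fn(*args) once and report wall-clock duration."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print('%s took %.3f s' % (fn.__name__, elapsed))
    return result, elapsed

# Trivial stand-in workload; substitute either simulation version to compare.
result, elapsed = timed(sum, range(1000000))
</code></pre>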
<p>With a little more rewriting, you can avoid double-scanning <code>block</code> by changing:</p>
<pre><code>if flag11 and b'\x01\x01' in block:
    t11[i] = ctr + block.index(b'\x01\x01')
    flag11 = False
</code></pre>
<p>to the more verbose, but more efficient:</p>
<pre><code>if flag11:
    try:
        t11[i] = ctr + block.index(b'\x01\x01')
    except ValueError:
        pass
    else:
        flag11 = False
</code></pre>
<p>(and making the equivalent change for the <code>flag12</code> test)</p>
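<p>For completeness, the <code>flag12</code> branch gets the same treatment. Here is a self-contained sketch of that branch run against a small hand-made block (the sample bytes are mine, chosen so the 1-then-2 pattern appears at index 2):</p>
<pre><code>block = b'\xff\x03\x01\x02\x05'  # sample block; b'\x01\x02' sits at index 2
ctr = 1
flag12 = True
t12_value = 0.0

if flag12:
    try:
        t12_value = ctr + block.index(b'\x01\x02')
    except ValueError:
        pass  # pattern not in this block; keep scanning the next one
    else:
        flag12 = False  # found it; stop checking this pattern

print(t12_value, flag12)  # 3 False
</code></pre>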
<p>Since the first 100 bytes generated usually contain a hit for both patterns, this replaces two scans of each block with one, cutting the overall runtime to ~6.0 seconds. There are more extreme micro-optimizations available (more a matter of knowing CPython's internals than of any logical improvement) that get it down to ~5.4 seconds on my machine, but they're ugly and not worth the minor savings.</p>
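<p>As one illustration of the kind of micro-optimization I mean (my own example, not necessarily the exact changes referenced above): hoisting attribute lookups out of the hot loop, so CPython resolves names like <code>np.random.randint</code> once instead of walking <code>np</code> → <code>random</code> → <code>randint</code> on every iteration:</p>
<pre><code>import numpy as np

randint = np.random.randint  # bound once; skips the repeated attribute walk
to_bytes = bytes             # likewise avoids a global name lookup per pass

block = b'\xff'
for _ in range(10):  # shortened loop just to show the pattern
    block = block[-1:] + to_bytes(randint(1, 7, 100, np.uint8))

print(len(block))  # 101: one kept byte plus 100 fresh ones
</code></pre>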