<p>You can speed this up considerably by calling into <code>numpy</code> in bulk (generating random blocks rather than single values), and by simplifying the pattern search using Python's built-in <code>bytes</code> scanning:</p>
<pre><code>import numpy as np

NN = 1000000
t11 = np.zeros(NN)
t12 = np.zeros(NN)
for i in range(NN):
    block = b'\xff'  # Prepopulate w/garbage byte so first byte never part of count
    flag11 = flag12 = True
    ctr = 1  # One lower to account for non-generated first byte
    while flag11 or flag12:
        # Generate 100 numbers at once, much faster than one at a time;
        # store as bytes for reduced memory and cheap searches.
        # Keep last byte of previous block so a 1 at the end matches
        # a 1/2 at the beginning of the next block.
        block = block[-1:] + bytes(np.random.randint(1, 7, 100, np.uint8))
        # Containment test scans in C, faster than a Python-level
        # one-at-a-time check
        if flag11 and b'\x01\x01' in block:
            t11[i] = ctr + block.index(b'\x01\x01')
            flag11 = False
        if flag12 and b'\x01\x02' in block:
            t12[i] = ctr + block.index(b'\x01\x02')
            flag12 = False
        ctr += 100

print('Mean t11: %f' % np.mean(t11))
print('\nMean t12: %f' % np.mean(t12))
</code></pre>
<p>On my (admittedly underpowered) machine, your original code takes ~96 seconds to run; my optimized version takes ~6.6 seconds, roughly 7% of the original runtime. Even though (on average) more than half of the random data generated is never needed, producing it in bulk is still faster than avoiding the waste with more Python-level loop-and-retry work.</p>
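<p>If you want to reproduce the timing comparison yourself, a minimal wall-clock harness is enough (a sketch of mine, not part of the original code; the <code>timed</code> helper is a made-up name, and <code>time.perf_counter</code> is the standard-library clock to use for this):</p>
<pre><code>import time

def timed(fn, *args):
    """Run fn(*args) once and report wall-clock duration."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print('%s took %.3f s' % (fn.__name__, elapsed))
    return result, elapsed

# Trivial stand-in workload; substitute either simulation version to compare.
result, elapsed = timed(sum, range(1000000))
</code></pre>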
<p>With a little more rewriting, you can avoid double-scanning <code>block</code> by changing:</p>
<pre><code>if flag11 and b'\x01\x01' in block:
    t11[i] = ctr + block.index(b'\x01\x01')
    flag11 = False
</code></pre>
<p>to the more verbose, but more efficient:</p>
<pre><code>if flag11:
    try:
        t11[i] = ctr + block.index(b'\x01\x01')
    except ValueError:
        pass
    else:
        flag11 = False
</code></pre>
<p>(and making the equivalent change for the <code>flag12</code> test)</p>
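<p>For completeness, the <code>flag12</code> branch gets the same treatment. Here is a self-contained sketch of that branch run against a small hand-made block (the sample bytes are mine, chosen so the 1-then-2 pattern appears at index 2):</p>
<pre><code>block = b'\xff\x03\x01\x02\x05'  # sample block; b'\x01\x02' sits at index 2
ctr = 1
flag12 = True
t12_value = 0.0

if flag12:
    try:
        t12_value = ctr + block.index(b'\x01\x02')
    except ValueError:
        pass  # pattern not in this block; keep scanning the next one
    else:
        flag12 = False  # found it; stop checking this pattern

print(t12_value, flag12)  # 3 False
</code></pre>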
<p>Since the first 100 bytes generated usually contain a hit for both patterns, this replaces two scans of each block with one, cutting the overall runtime to ~6.0 seconds. There are more extreme micro-optimizations available (more a matter of knowing CPython's internals than of any logical improvement) that get it down to ~5.4 seconds on my machine, but they're ugly and not worth the minor savings.</p>
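<p>As one illustration of the kind of micro-optimization I mean (my own example, not necessarily the exact changes referenced above): hoisting attribute lookups out of the hot loop, so CPython resolves names like <code>np.random.randint</code> once instead of walking <code>np</code> → <code>random</code> → <code>randint</code> on every iteration:</p>
<pre><code>import numpy as np

randint = np.random.randint  # bound once; skips the repeated attribute walk
to_bytes = bytes             # likewise avoids a global name lookup per pass

block = b'\xff'
for _ in range(10):  # shortened loop just to show the pattern
    block = block[-1:] + to_bytes(randint(1, 7, 100, np.uint8))

print(len(block))  # 101: one kept byte plus 100 fresh ones
</code></pre>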