加速模辊模拟

3条回答

网友

1楼 · 编辑于 2024-05-15 05:36:20

对于这个问题有一个很好的工具：有限状态机。你知道吗

但是Python并不是一种很好的语言，不适合快速实现。你知道吗

这里是一个非确定性状态机，它可以识别任何输入流中的两个序列。*表示从3到6的抛出：

NFSM

这很难实现，因为它一次可以占用多个状态，但有一种称为子集构造的标准算法，它将此转换为一个确定性状态机，实现起来非常有效。在这里应用它会产生：

DFSM

这是一个C实现。在Python中，可以使用一个映射，将当前状态的元组加上下一个状态的输入数字。在这里，我们使用goto在执行代码中根据位置实现映射：

#include <stdio.h>
#include <stdlib.h>

#define ROLL do { r = 1 + rand() %6; } while (0)

int main(void) {
  int n_trials = 10000000;
  int t_total_11 = 0;
  int t_total_12 = 0;

  for (int n = 0; n < n_trials; n++) {
    int r, t = -1, t_11 = 0, t_12 = 0;
    A:
      ++t;
      ROLL;
      if (r == 1) goto AB;
      goto A;
    AB:
      ++t;
      ROLL;
      if (r == 1) goto ABC;
      if (r == 2) goto AD;
      goto A;
    ABC:
      ++t;
      if (!t_11) {
        t_11 = t;
        t_total_11 += t_11;
        if (t_12) continue;
      }
      ROLL;
      if (r == 1) goto ABC;
      if (r == 2) goto AD;
      goto A;
    AD:
      ++t;
      if (!t_12) {
        t_12 = t;
        t_total_12 += t_12;
        if (t_11) continue;
      }
      ROLL;
      if (r == 1) goto AB;
      goto A;
  }
  printf("Avg for 11: %lf\n", (double) t_total_11 / n_trials);
  printf("Avg for 12: %lf\n", (double) t_total_12 / n_trials);
  return 0;
}

在我的旧Macbook上，它在5.3秒内迭代了1000万次（不是100万次）。所以这是一个很酷的~快100倍。当然，好的位元取决于PRNG的速度。Gnu的rand很快，但不是那么随机。显然，这足以说明趋同。程序打印：

Avg for 11: 41.986926
Avg for 12: 35.997196

当我有更多的时间，我会尝试一个Python impl。你知道吗

网友

2楼 · 编辑于 2024-05-15 05:36:20

通过每次调用numpy（随机块而不是单个值），并使用Python内置的bytes扫描简化模式搜索，可以大大加快速度：

import numpy as np

NN=1000000
t11=np.zeros(NN)
t12=np.zeros(NN)

for i in range(NN):
    block = b'\xff'  # Prepop w/garbage byte so first byte never part of cnt
    flag11 = flag12 = True
    ctr = 1  # One lower to account for non-generated first byte

    while flag11 or flag12:
        # Generate 100 numbers at once, much faster than one at a time,
        # store as bytes for reduced memory and cheap searches
        # Keep last byte of previous block so a 1 at end matches 1/2 at beginning of next
        block = block[-1:] + bytes(np.random.randint(1, 7, 100, np.uint8))

        # Containment test scans faster in C than Python level one-at-a-time check
        if flag11 and b'\x01\x01' in block:
            t11[i] = ctr + block.index(b'\x01\x01')
            flag11 = False
        if flag12 and b'\x01\x02' in block:
            t12[i] = ctr + block.index(b'\x01\x02')
            flag12 = False
        ctr += 100
print('Mean t11:  %f' %(np.mean(t11)))
print('\nMean t12: %f' %(np.mean(t12)))

在我的（公认动力不足的机器）上，您的原始代码运行时间约为96秒；我的优化版本运行时间约为6.6秒，约为原始运行时间的7%。即使假设（平均而言）不需要生成超过一半的随机数据，当它避免更多的Python级工作循环并重试时，这样做仍然更快。你知道吗

再重写一点，您可以通过更改以下内容来避免block的双重扫描：

        if flag11 and b'\x01\x01' in block:
            t11[i] = ctr + block.index(b'\x01\x01')
            flag11 = False

更详细，但更有效：

    if flag11:
        try:
            t11[i] = ctr + block.index(b'\x01\x01')
        except ValueError:
            pass
        else:
            flag11 = False

（并对flag12测试进行等效更改）

由于生成的前100个字节通常都有命中率，这意味着用一个替换两个扫描，并将整个运行时间减少到~6.0秒。有更多极端的微优化可用（更多的是了解CPython的内部结构，而不是任何逻辑上的改进），可以让它在我的机器上降到5.4秒，但它们很难看，不值得花99.9%的时间。你知道吗

网友

3楼 · 编辑于 2024-05-15 05:36:20

编译语言的程序性能明显优于解释语言。这就是为什么高频交易、视频游戏引擎和其他要求很高的软件是用c++等编译语言编程的。你知道吗

在优化方面，您可以尝试使用pythons编译特性，或者以本机方式而不是在IDE中运行程序。你知道吗

相关问题更多 >

编程相关推荐

热门问题

热门文章