Is Python's random module random? Is randint random?

Posted 2024-04-25 22:54:37


I am testing a method for calculating dice-roll probabilities in a game. The base case is rolling a single 10-sided die.

I took one million samples and ended up with the following proportions:

Result
0       0.000000000000000%
1       10.038789961210000%
2       10.043589956410000%
3       9.994890005110000%
4       10.025289974710000%
5       9.948090051909950%
6       9.965590034409970%
7       9.990190009809990%
8       9.985490014509990%
9       9.980390019609980%
10      10.027589972410000%

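These should, of course, all be 10%. The standard deviation of these results is 0.0323207%. That seems rather high to me. Is it just coincidence? As I understand it, the random module provides proper pseudo-random numbers, i.e. numbers from a method that passes the statistical tests for randomness. Or is this generator somehow less than a proper pseudo-random number generator?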

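Should I be using a cryptographic pseudo-random number generator instead? I am fairly sure I do not need a true random number generator (see http://www.random.org/ and http://en.wikipedia.org/wiki/Hardware_random_number_generator).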

I am currently regenerating all my results with one billion samples (because why not: I have a crunchy server at my disposal, and some sleeping to do).


3 answers

Martijn's answer is a very succinct review of the random number generators that Python has access to.

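If you want to check the properties of the generated pseudo-random data, download random.zip from http://www.fourmilab.ch/random/ and run it on a large sample of random data. The chi-square (χ²) test in particular is very sensitive to departures from randomness. For a sequence to be considered really random, the percentage reported by the χ² test should be between 10% and 90%.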

For a game, I would guess that the Mersenne Twister that Python uses internally is random enough (unless you are building an online casino :-).

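If you want pure randomness and you are on Linux, you can read from /dev/random. It only produces random data from the kernel's entropy pool (gathered from the unpredictable times at which interrupts arrive), so it will block if you exhaust the pool. This entropy is also used to initialize (seed) the PRNG behind /dev/urandom. On FreeBSD, the PRNG that supplies data for /dev/random uses the Yarrow algorithm, which is generally regarded as cryptographically secure.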

Edit: I ran some tests on bytes from random.randint. First, create one million random bytes:

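import random

# Build one million random byte values (0-255) with random.randint.
ba = bytearray(random.randint(0, 255) for _ in range(1000000))
# Open in binary mode so the bytes reach the file unmodified.
with open('randint.dat', 'wb') as f:
    f.write(ba)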

Then I ran the ent program from Fourmilab on it:

Entropy = 7.999840 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 221.87, and randomly
would exceed this value 93.40 percent of the times.

Arithmetic mean value of data bytes is 127.5136 (127.5 = random).
Monte Carlo value for Pi is 3.139644559 (error 0.06 percent).
Serial correlation coefficient is -0.000931 (totally uncorrelated = 0.0).

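Now for the χ² test, the further you get from 50%, the more suspect the data is. If you are being very fussy, values below 10% or above 90% are considered unacceptable. John Walker, the author of ent, calls this value "almost suspect".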

As a contrast, here is the same analysis of 10 MiB from FreeBSD's Yarrow PRNG that I ran earlier:

Entropy = 7.999982 bits per byte.

Optimum compression would reduce the size
of this 10485760 byte file by 0 percent.

Chi square distribution for 10485760 samples is 259.03, and randomly
would exceed this value 41.80 percent of the times.

Arithmetic mean value of data bytes is 127.5116 (127.5 = random).
Monte Carlo value for Pi is 3.139877754 (error 0.05 percent).
Serial correlation coefficient is -0.000296 (totally uncorrelated = 0.0).

The other figures do not differ much, but the χ² result is much closer to 50%.
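As a rough illustration of where those χ² percentages come from, here is a minimal sketch (Python 3 assumed). It computes a Pearson χ² statistic for byte values against a uniform distribution with 255 degrees of freedom, then estimates the upper-tail probability with the Wilson-Hilferty normal approximation. The function names chi_square_bytes and approx_upper_tail are made up for this example, the seed is arbitrary, and this is not ent's actual code; an exact tail probability would need a real χ² distribution function.

```python
import math
import random
from collections import Counter


def chi_square_bytes(data):
    """Pearson chi-square statistic of byte values against a uniform distribution."""
    expected = len(data) / 256.0
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def approx_upper_tail(stat, dof=255):
    """Approximate P(chi-square >= stat) via the Wilson-Hilferty approximation."""
    mean = 1.0 - 2.0 / (9.0 * dof)
    sd = math.sqrt(2.0 / (9.0 * dof))
    z = ((stat / dof) ** (1.0 / 3.0) - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of a standard normal


# Hypothetical sample: 100,000 bytes from a seeded Mersenne Twister instance.
rng = random.Random(12345)
data = bytes(rng.randint(0, 255) for _ in range(100000))
stat = chi_square_bytes(data)
print('chi-square = %.2f, exceeded about %.1f%% of the time'
      % (stat, 100.0 * approx_upper_tail(stat)))

# The two statistics reported by ent above, fed through the same approximation.
for reported in (221.87, 259.03):
    print(reported, '->', round(100.0 * approx_upper_tail(reported), 1), '%')
```

Feeding the two statistics from the ent runs above through the approximation gives values close to the percentages ent printed, which suggests ent's figure is the upper-tail probability for 255 degrees of freedom.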

I repeated the OP's exercise with one billion iterations:

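from collections import Counter
import random

n = 1000000000
# Count how often each face of a 10-sided die comes up in n rolls.
c = Counter(random.randint(1, 10) for _ in range(n))
for i in range(1, 11):
    # Print each face's share of all rolls as a percentage.
    print('%2s  %02.10f%%' % (i, c[i] * 100.0 / n))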

Here are the (reformatted) results:

 1     9.9996500000%
 2    10.0011089000%
 3    10.0008568000%
 4    10.0007495000%
 5     9.9999089000%
 6     9.9985344000%
 7     9.9994913000%
 8     9.9997877000%
 9    10.0010818000%
10     9.9988307000%
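For scale, here is a rough sketch of how much spread chance alone would produce. It assumes each face's count is approximately binomial with p = 1/10 (the ten counts are not fully independent, since they must sum to n), so the standard deviation of a face's share is sqrt(p(1-p)/n). The helper name expected_sd_percent is made up for this example, and the two lists are the percentages copied from the tables on this page.

```python
import math
import statistics


def expected_sd_percent(n, sides=10):
    """Binomial standard deviation of one face's share, in percent."""
    p = 1.0 / sides
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


# Percentages from the question (one million rolls).
question_1e6 = [10.03878996121, 10.04358995641, 9.99489000511, 10.02528997471,
                9.94809005190995, 9.96559003440997, 9.99019000980999,
                9.98549001450999, 9.98039001960998, 10.02758997241]

# Percentages from the one-billion-roll run above.
answer_1e9 = [9.99965, 10.0011089, 10.0008568, 10.0007495, 9.9999089,
              9.9985344, 9.9994913, 9.9997877, 10.0010818, 9.9988307]

for label, n, data in (('1e6 rolls', 10**6, question_1e6),
                       ('1e9 rolls', 10**9, answer_1e9)):
    print('%s: observed sd %.7f%%, expected about %.7f%%'
          % (label, statistics.stdev(data), expected_sd_percent(n)))
```

For one million rolls the formula gives about 0.03%, close to the 0.0323207% reported in the question, so that spread is roughly what a fair 10-sided die would show.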

See the other answers to this question for their excellent analysis.

From the random module documentation:

Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

From the Wikipedia article on the Mersenne Twister:

It provides for fast generation of very high-quality pseudorandom numbers, having been designed specifically to rectify many of the flaws found in older algorithms.
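The "completely deterministic" remark in the documentation quote can be seen with a small sketch (Python 3 assumed; the seed value 42 is arbitrary). Two generator instances seeded with the same value produce the same sequence of rolls:

```python
import random

# Two independent Mersenne Twister instances with the same (arbitrary) seed.
a = random.Random(42)
b = random.Random(42)

rolls_a = [a.randint(1, 10) for _ in range(5)]
rolls_b = [b.randint(1, 10) for _ in range(5)]

# Same seed, same sequence: the output is reproducible, hence predictable.
print(rolls_a == rolls_b)
```

This reproducibility is handy for simulations and tests, and it is exactly why the generator is unsuitable for cryptographic use.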

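If you have an OS-specific source of randomness, available through os.urandom(), then you can use the random.SystemRandom() class instead. Most of the random module's functions are available as methods on that class. It may be more suitable for cryptographic purposes; quoting the docs again: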

The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation.
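A minimal sketch of both calls, assuming Python 3 on a platform that provides an OS randomness source (the variable names are only illustrative):

```python
import os
import random

# 16 raw bytes straight from the operating system's randomness source.
raw = os.urandom(16)

# SystemRandom draws from os.urandom() and exposes the usual random-module
# methods; it cannot be seeded, so its output is not reproducible.
sys_rng = random.SystemRandom()
roll = sys_rng.randint(1, 10)

print(raw.hex(), roll)
```

Because SystemRandom offers the same method names, swapping it in for the module-level functions is mostly a matter of calling sys_rng.randint instead of random.randint.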

Python 3.6 adds a secrets module with convenience methods for generating random data suitable for cryptographic purposes:

The secrets module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets.

In particular, secrets should be used in preference to the default pseudo-random number generator in the random module, which is designed for modelling and simulation, not security or cryptography.
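A short sketch of what that looks like in practice, assuming Python 3.6 or later (the die roll is only there to mirror the question; for a game simulation the random module is still the usual choice):

```python
import secrets

# secrets.randbelow(10) returns an int in [0, 10); add 1 for a 10-sided die.
roll = secrets.randbelow(10) + 1

# A 16-byte random token rendered as 32 hexadecimal characters.
token = secrets.token_hex(16)

# Pick one element of a sequence using the OS randomness source.
face = secrets.choice(range(1, 11))

print(roll, token, face)
```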
