Confusion about the internals of NumPy's SeedSequence
In case it helps, I am using Python 3.11.5 and NumPy 1.26.4 on a 64-bit desktop running Windows 11 Pro.
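To better understand what NumPy does behind the scenes when I request an np.random.Generator object from a given SeedSequence, I decided to reproduce in pure Python what happens when a SeedSequence is initialized from a given entropy value.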
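Based on the SeedSequence source code that I found here, I understand how uint32 overflow works. On my machine, np.dtype(np.uint32).itemsize is 4, so XSHIFT, which is defined as np.dtype(np.uint32).itemsize * 8 // 2, is 16. With that in mind, I wrote the following code: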
seed = int(input('Please enter a seed: '))
Entropy = seed
Spawn_key = ()
Pool_size = 8
N_children_spawned = 0
Pool = [0 for _ in range(Pool_size)]
Assembled_entropy = []
Ent = Entropy + 0
while Ent > 0:
    Assembled_entropy.append(Ent & 0xffffffff)
    Ent >>= 32
if not Assembled_entropy:
    Assembled_entropy = [0]
hash_const = 0x43b0d7e5
for i in range(Pool_size):
    if i < len(Assembled_entropy):
        Assembled_entropy[i] ^= hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        Assembled_entropy[i] *= hash_const
        Assembled_entropy[i] &= 0xffffffff
        Assembled_entropy[i] ^= Assembled_entropy[i] >> 16
        Pool[i] = Assembled_entropy[i]
    else:
        value = hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        value *= hash_const
        value &= 0xffffffff
        value ^= value >> 16
        Pool[i] = value
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            Pool[i_src] ^= hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            Pool[i_src] *= hash_const
            Pool[i_src] &= 0xffffffff
            Pool[i_src] ^= Pool[i_src] >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * Pool[i_src]) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
for i_src in range(Pool_size, len(Assembled_entropy)):
    for i_dst in range(Pool_size):
        Assembled_entropy[i_src] ^= hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        Assembled_entropy[i_src] *= hash_const
        Assembled_entropy[i_src] &= 0xffffffff
        Assembled_entropy[i_src] ^= Assembled_entropy[i_src] >> 16
        x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
        y = (0x4973f715 * Assembled_entropy[i_src]) & 0xffffffff
        Pool[i_dst] = x - y
        Pool[i_dst] &= 0xffffffff
        Pool[i_dst] ^= Pool[i_dst] >> 16
print(Pool)
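As a side note on the masking used in the code above: Python integers never overflow, so each multiplication or subtraction has to be reduced modulo 2**32 by hand to imitate uint32 arithmetic. The following is a small sketch of that idea in isolation. The helper names (MASK32, mul_u32, sub_u32, xorshift_u32) are illustrative assumptions and do not appear in NumPy.

```python
# Emulating uint32 arithmetic with plain Python integers.
MASK32 = 0xFFFFFFFF   # keeps only the low 32 bits
XSHIFT = 4 * 8 // 2   # uint32 itemsize is 4 bytes, so the shift is 16


def mul_u32(a, b):
    """Multiply and wrap around as a uint32 would."""
    return (a * b) & MASK32


def sub_u32(a, b):
    """Subtract and wrap around; a negative result maps into [0, 2**32)."""
    return (a - b) & MASK32


def xorshift_u32(v):
    """Fold the high half of a 32-bit value into its low half."""
    return v ^ (v >> XSHIFT)


print(XSHIFT)                                  # 16
print(sub_u32(0, 1))                           # 4294967295
print(hex(mul_u32(0xFFFFFFFF, 2)))             # 0xfffffffe
print(hex(xorshift_u32(0x12340000)))           # 0x12341234
```

Masking after every step, as the code above does, keeps each intermediate value inside the uint32 range.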
I have copied the output of some test runs below.
Please enter a seed: 0
[595626433, 3558985979, 200295889, 3864401631, 3155212474, 198111058, 4047350828, 373757291]
Please enter a seed: 1
[2396653877, 491222160, 2441066534, 3196981647, 1764919720, 3210735412, 1132315803, 1197535761]
Please enter a seed: 123456789
[2161290507, 266876805, 2694113549, 3306969538, 3218948428, 3543586554, 886289367, 3129292100]
Please enter a seed: 123456789123456789
[2628723507, 610487362, 209721652, 1960674985, 3519121735, 1259052354, 2097159984, 3934338599]
Please enter a seed: 123456789123456789123456789123456789
[2988668238, 798946769, 2484899198, 1005350017, 2633831484, 343737596, 1402961265, 3184558744]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[431881030, 3789410928, 218849910, 879851040, 1423068736, 85390627, 3721593143, 198649564]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[702225118, 2293461530, 514808704, 2115883586, 3179647446, 3197133803, 3807436730, 1822195906]
from numpy.random import SeedSequence
seed = int(input('Please enter a seed: '))
seedseq = SeedSequence(entropy=seed, spawn_key=[], pool_size=8, n_children_spawned=0)
print([int(value) for value in seedseq.pool])
However, feeding the same values to the version of the program above, which calls NumPy's SeedSequence directly, gives very different results:
Please enter a seed: 0
[2043904064, 467759482, 3940449851, 2747621207, 4006820188, 4161973813, 800317807, 2622167125]
Please enter a seed: 1
[476219752, 3923368624, 2653737542, 2876255837, 1861759290, 3300511046, 3253139541, 2224879358]
Please enter a seed: 123456789
[480462800, 1421661229, 2686834002, 3365909768, 3295673516, 1830753151, 1249963727, 3680881655]
Please enter a seed: 123456789123456789
[3112345096, 1618497203, 2864025213, 3262672577, 379697145, 163816190, 1265228116, 2568065655]
Please enter a seed: 123456789123456789123456789123456789
[2197723902, 2868273012, 1547285866, 2772382071, 2016971656, 1130152919, 897020445, 135618137]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789
[3230290517, 251217303, 1180998335, 454107561, 4150025399, 1840013050, 1216833737, 89665521]
Please enter a seed: 123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789
[902839167, 3446715647, 2106916613, 1578536987, 595141342, 3126308643, 400300642, 3659109886]
What is going on here?
Update: Following @OskarHoffman's answer, I have corrected my code. I am including it here in case anyone is interested.
seed = int(input('Please enter a seed: '))
Entropy = seed
Spawn_key = ()
Pool_size = 8
N_children_spawned = 0
Pool = [0 for _ in range(Pool_size)]
Assembled_entropy = []
Ent = Entropy + 0
while Ent > 0:
    Assembled_entropy.append(Ent & 0xffffffff)
    Ent >>= 32
if not Assembled_entropy:
    Assembled_entropy = [0]
hash_const = 0x43b0d7e5
for i in range(Pool_size):
    if i < len(Assembled_entropy):
        temp = Assembled_entropy[i] ^ hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        temp *= hash_const
        temp &= 0xffffffff
        temp ^= temp >> 16
        Pool[i] = temp
    else:
        value = hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        value *= hash_const
        value &= 0xffffffff
        value ^= value >> 16
        Pool[i] = value
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            temp = Pool[i_src] ^ hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            temp *= hash_const
            temp &= 0xffffffff
            temp ^= temp >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * temp) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
for i_src in range(Pool_size, len(Assembled_entropy)):
    for i_dst in range(Pool_size):
        temp = Assembled_entropy[i_src] ^ hash_const
        hash_const *= 0x931e8875
        hash_const &= 0xffffffff
        temp *= hash_const
        temp &= 0xffffffff
        temp ^= temp >> 16
        x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
        y = (0x4973f715 * temp) & 0xffffffff
        Pool[i_dst] = x - y
        Pool[i_dst] &= 0xffffffff
        Pool[i_dst] ^= Pool[i_dst] >> 16
print(Pool)
1 Answer
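Your second for loop differs from NumPy in how it implements the hashmix() function. You modify the value stored in the Pool list at position i_src in order to compute y. NumPy's implementation does not do this. It only copies the value of Pool[i_src] (by passing it as an argument to hashmix) and then modifies that copy, which is discarded afterwards.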
So change that for loop to:
for i_src in range(Pool_size):
    for i_dst in range(Pool_size):
        if i_src != i_dst:
            # work with new variable instead of modifying Pool[i_src]
            temp = Pool[i_src] ^ hash_const
            hash_const *= 0x931e8875
            hash_const &= 0xffffffff
            temp *= hash_const
            temp &= 0xffffffff
            temp ^= temp >> 16
            x = (0xca01f9dd * Pool[i_dst]) & 0xffffffff
            y = (0x4973f715 * temp) & 0xffffffff
            Pool[i_dst] = x - y
            Pool[i_dst] &= 0xffffffff
            Pool[i_dst] ^= Pool[i_dst] >> 16
With this change, I get the same results as the NumPy implementation.
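One way to make this kind of aliasing mistake harder to write is to factor the repeated steps into helper functions, so that a pool entry is passed by value and never assigned back. The following is a minimal sketch under that assumption, not NumPy's actual source. The names MASK32 and seed_pool, and the one-element list `state` that carries hash_const between calls, are illustrative choices. Spawn keys are not handled.

```python
MASK32 = 0xFFFFFFFF
XSHIFT = 16
INIT_A = 0x43B0D7E5
MULT_A = 0x931E8875
MIX_MULT_L = 0xCA01F9DD
MIX_MULT_R = 0x4973F715


def hashmix(value, state):
    """Hash `value` and advance the constant held in state[0].

    `value` is a plain int, so the caller's list entry is left untouched.
    """
    value ^= state[0]
    state[0] = (state[0] * MULT_A) & MASK32
    value = (value * state[0]) & MASK32
    return value ^ (value >> XSHIFT)


def mix(x, y):
    """Combine two 32-bit words with wrap-around subtraction."""
    result = (MIX_MULT_L * x - MIX_MULT_R * y) & MASK32
    return result ^ (result >> XSHIFT)


def seed_pool(entropy, pool_size=8):
    """Build the pool from a non-negative integer entropy (sketch only)."""
    # Split the entropy into 32-bit words, least significant word first.
    words = []
    while entropy > 0:
        words.append(entropy & MASK32)
        entropy >>= 32
    if not words:
        words = [0]

    state = [INIT_A]
    # Fill the pool, hashing 0 once the entropy words run out.
    pool = [hashmix(words[i] if i < len(words) else 0, state)
            for i in range(pool_size)]
    # Mix every entry into every other entry without modifying the source.
    for i_src in range(pool_size):
        for i_dst in range(pool_size):
            if i_src != i_dst:
                pool[i_dst] = mix(pool[i_dst], hashmix(pool[i_src], state))
    # Fold in any entropy words that did not fit into the pool.
    for i_src in range(pool_size, len(words)):
        for i_dst in range(pool_size):
            pool[i_dst] = mix(pool[i_dst], hashmix(words[i_src], state))
    return pool


print(seed_pool(123456789))
```

To check the sketch on your own machine, compare seed_pool(seed) with [int(v) for v in SeedSequence(entropy=seed, pool_size=8).pool] for a few seeds.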