I've been studying this code, which generates random text:

from collections import defaultdict, Counter
from random import choice, randrange

def pairwise(iterable):
    # Yield consecutive overlapping pairs: (s0, s1), (s1, s2), ...
    it = iter(iterable)
    try:
        last = next(it)
    except StopIteration:
        return  # empty input: nothing to pair up
    for curr in it:
        yield last, curr
        last = curr

valid = set('abcdefghijklmnopqrstuvwxyz ')

def valid_pair(pair):
    last, curr = pair
    return last in valid and curr in valid

def make_markov(text):
    # Map each character to a Counter of the characters that follow it.
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in filter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov

def genrandom(model, n):
    curr = choice(list(model))
    for i in range(n):
        yield curr
        if curr not in model:  # handle case where there is no known successor
            curr = choice(list(model))
        d = model[curr]
        # Pick the next character with probability proportional to its count.
        target = randrange(sum(d.values()))
        cumulative = 0
        for curr, cnt in d.items():
            cumulative += cnt
            if cumulative > target:
                break

model = make_markov('The qui_.ck brown fox')
print(''.join(genrandom(model, 20)))

However, I'm having trouble understanding the last part, starting from target = randrange(sum(d.values())). An explanation would be much appreciated. Thanks!

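target = randrange(sum(d.values()))

Since model is a dictionary that maps each letter to a Counter object, and a Counter is itself a dictionary, d.values() gives the counts for every key in d (without the keys themselves). That means sum(d.values()) is the total of all the counts. randrange(result) then picks an integer in the half-open range [0, result), where result is the value of sum(d.values()).

d.items() returns the (key, value) pairs of the given count dictionary. The code is effectively assigning each letter a probability and then choosing a letter. If the counts are ('a', 5), ('b', 7) and ('c', 2), the total count is 14, so the code picks a random number between 0 and 13 inclusive. The loop keeps a running total of the counts and stops at the first letter whose running total exceeds target: if target is in [0, 5) it selects 'a', if it is in [5, 12) it selects 'b', and if it is in [12, 14) it selects 'c'. The relative probabilities are determined by the widths of these ranges, and the widths are determined by the counts built up in make_markov.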
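
To make the ranges concrete, here is a small sketch that runs the same cumulative loop for every possible target value instead of a random one. The helper name pick is hypothetical (it is not part of the original code), and the example assumes Python 3.7+, where dictionaries keep their insertion order.

```python
from collections import Counter


def pick(counts, target):
    """Hypothetical helper: return the first key whose running total exceeds target."""
    cumulative = 0
    for key, cnt in counts.items():
        cumulative += cnt
        if cumulative > target:
            return key
    raise ValueError("target must be less than the sum of the counts")


# The counts from the example: 'a' -> 5, 'b' -> 7, 'c' -> 2 (total 14).
d = Counter({'a': 5, 'b': 7, 'c': 2})

total = sum(d.values())
# Try every value that randrange(total) could return: 0, 1, ..., 13.
picks = [pick(d, t) for t in range(total)]
print(''.join(picks))  # aaaaabbbbbbbcc
```

Each letter appears exactly as many times as its count, so when target is drawn uniformly from those 14 values, 'a' is selected with probability 5/14, 'b' with 7/14 and 'c' with 2/14.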