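<p>I want to modify the script below so that it builds paragraphs from a random number of the sentences it generates. In other words, it should concatenate a random number of sentences (say 1-5) before adding a newline.</p>
<p>The script works fine, but its output is short sentences separated by newlines. I would like to collect some of those sentences into paragraphs.</p>
<p>Any thoughts on best practice? Thanks.</p>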
<pre><code>"""
from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
"""
import random
import sys

stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",) # Cause a "new sentence" if found at the end of a word
sentencesep = "\n" # String used to separate sentences

# GENERATE TABLE
# Maps each pair of consecutive words to the list of words seen after that pair
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            # Store the word and its trailing punctuation as separate entries
            table.setdefault( (w1, w2), [] ).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault( (w1, w2), [] ).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)

# GENERATE SENTENCE OUTPUT
maxsentences = 20

w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []

while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)])
    if newword == stopword: sys.exit()
    if newword in stopsentence:
        print ("%s%s%s" % (" ".join(sentence), newword, sentencesep))
        sentence = []
        sentencecount += 1
    else:
        sentence.append(newword)
    w1, w2 = w2, newword
</code></pre>
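<p>For reference, here is a minimal sketch of what the table-building loop produces when run on a short hard-coded string instead of stdin. The sample text and the function name <code>build_table</code> are made up for illustration; the logic is meant to mirror the loop above.</p>

```python
STOPWORD = "\n"
STOPSENTENCE = (".", "!", "?")

def build_table(text):
    """Map each pair of consecutive words to the words seen after that pair."""
    w1 = w2 = STOPWORD
    table = {}
    for word in text.split():
        if word[-1] in STOPSENTENCE:
            # Store the word and its trailing punctuation as two separate tokens.
            table.setdefault((w1, w2), []).append(word[:-1])
            w1, w2 = w2, word[:-1]
            word = word[-1]
        table.setdefault((w1, w2), []).append(word)
        w1, w2 = w2, word
    # Mark the end of the input.
    table.setdefault((w1, w2), []).append(STOPWORD)
    return table

# Hypothetical sample input, chosen so that one pair has two possible followers.
table = build_table("The cat sat. The cat ran.")
for key, followers in table.items():
    print(key, followers)
```

<p>The pair <code>("The", "cat")</code> ends up with two followers, <code>"sat"</code> and <code>"ran"</code>, which is where <code>random.choice</code> gets its variety.</p>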
<hr/>
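<p><strong>Edit 01:</strong></p>
<p>OK, I have cobbled together a simple "paragraph wrapper" that does a good job of collecting sentences into paragraphs, but it interferes with the output of the sentence generator: among other problems, the first word gets repeated excessively.</p>
<p>The premise is sound, though; I just need to figure out why the sentence loop is affected by the paragraph loop. If you can see the problem, please let me know:</p>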
^{pr2}$
<hr/>
<p><strong>Edit 02:</strong></p>
<p>Following the answer below, I added <code>sentence = []</code> to the <code>elif</code> branch. That is:</p>
<pre><code>    elif newword in stopsentence:
        print ("%s%s" % (" ".join(sentence), newword), end=" ")
        sentence = [] # I have to be here to make the new sentence start as an empty list!!!
        sentencecount += 1 # increment the sentence counter
</code></pre>
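<p>To see why the reset matters, here is a toy sketch that uses a hard-coded token stream rather than the real generator (the function name <code>join_sentences</code> and the tokens are made up for illustration). Without the reset, every printed sentence still carries all the words of the sentences before it.</p>

```python
def join_sentences(tokens, reset):
    """Assemble sentences from tokens; optionally forget to reset the word list."""
    stopsentence = (".", "!", "?")
    sentence, out = [], []
    for tok in tokens:
        if tok in stopsentence:
            out.append(" ".join(sentence) + tok)
            if reset:
                sentence = []  # start the next sentence from an empty list
        else:
            sentence.append(tok)
    return out

tokens = ["Hello", "world", ".", "Bye", "now", "!"]
print(join_sentences(tokens, reset=True))   # ['Hello world.', 'Bye now!']
print(join_sentences(tokens, reset=False))  # ['Hello world.', 'Hello world Bye now!']
```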
<hr/>
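<p><strong>Edit 03:</strong></p>
<p>This is the final iteration of the script. Thanks to 格里夫 for helping me work through this. I hope others get some fun out of it; I know I will. ;)</p>
<p>FYI: there is one small artifact. Each paragraph ends with an extra trailing space, which you may want to clean up if you use this script. Other than that, it is a perfect implementation of Markov chain text generation.</p>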
<pre><code>###
# usage: python markov_sentences.py < input.txt > output.txt
# from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
# note: print(..., end=" ") requires Python 3
###
import random
import sys

stopword = "\n" # Since we split on whitespace, this can never be a word
stopsentence = (".", "!", "?",) # Cause a "new sentence" if found at the end of a word
sentencesep = "\n" # String used to separate sentences (unused in this version)

# GENERATE TABLE
# Maps each pair of consecutive words to the list of words seen after that pair
w1 = stopword
w2 = stopword
table = {}

for line in sys.stdin:
    for word in line.split():
        if word[-1] in stopsentence:
            # Store the word and its trailing punctuation as separate entries
            table.setdefault( (w1, w2), [] ).append(word[0:-1])
            w1, w2 = w2, word[0:-1]
            word = word[-1]
        table.setdefault( (w1, w2), [] ).append(word)
        w1, w2 = w2, word
# Mark the end of the file
table.setdefault( (w1, w2), [] ).append(stopword)

# GENERATE SENTENCE OUTPUT
maxsentences = 20

w1 = stopword
w2 = stopword
sentencecount = 0
sentence = []
paragraphsep = "\n"
# Sentences left in the current paragraph: 1-4, because randrange excludes
# the upper bound; use randrange(1, 6) for 1-5
count = random.randrange(1,5)

while sentencecount < maxsentences:
    newword = random.choice(table[(w1, w2)]) # random word from word table
    if newword == stopword: sys.exit()
    if newword in stopsentence:
        print ("%s%s" % (" ".join(sentence), newword), end=" ")
        sentence = []
        sentencecount += 1 # increment the sentence counter
        count -= 1
        if count == 0:
            # Paragraph is full: pick a new length and emit a blank line
            count = random.randrange(1,5)
            print (paragraphsep) # newline space
    else:
        sentence.append(newword)
    w1, w2 = w2, newword
# EOF
</code></pre>
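<p>One possible way to avoid the trailing space is to collect finished sentences in a list and join each paragraph with single spaces, rather than printing every sentence with <code>end=" "</code>. The sketch below is only an illustration under that assumption: <code>group_paragraphs</code> and its parameters are hypothetical names, and it works on a list of already generated sentences rather than on the Markov table.</p>

```python
import random

def group_paragraphs(sentences, min_len=1, max_len=5, rng=random):
    """Split a list of sentences into paragraphs of random length.

    Each paragraph holds between min_len and max_len sentences (inclusive),
    joined by single spaces, so no paragraph ends with a stray space.
    """
    paragraphs = []
    i = 0
    while i < len(sentences):
        n = rng.randint(min_len, max_len)  # randint includes both bounds
        paragraphs.append(" ".join(sentences[i:i + n]))
        i += n
    return paragraphs

# Hypothetical sentences standing in for the generator's output.
sentences = ["One.", "Two!", "Three?", "Four.", "Five.", "Six."]
print("\n\n".join(group_paragraphs(sentences)))
```

<p>Passing a seeded <code>random.Random</code> instance as <code>rng</code> makes the grouping reproducible, which helps when comparing runs.</p>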