<p>From what you are doing at the moment, I think the following would suit you well:</p>
<pre><code>from collections import defaultdict

text = ("cat", "dog", "rat", "bat", "rat", "dog",
        "man", "woman", "child", "child")

# Map each word to the list of positions at which it occurs.
d = defaultdict(list)
for lineno, word in enumerate(text):
    d[word].append(lineno)
print(d)
</code></pre>
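<p>This prints a dictionary that maps each word to the list of positions where it occurs, for example <code>'dog': [1, 5]</code> and <code>'child': [8, 9]</code>.</p>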
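<p>This sets up an empty default dictionary that creates a new list for every key you access for the first time, so you don't have to worry about creating the entries yourself. It then enumerates its way over the list of words, so you don't need to keep track of the line number either.</p>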
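<p>To make the saving concrete, here is a minimal sketch that builds the same mapping twice: once with a plain <code>dict</code>, where the missing-key case is handled by hand, and once with <code>defaultdict(list)</code>. The name <code>plain</code> is only for illustration.</p>

```python
from collections import defaultdict

text = ("cat", "dog", "rat", "bat", "rat", "dog",
        "man", "woman", "child", "child")

# Plain dict: the entry has to be created before it can be appended to.
plain = {}
for lineno, word in enumerate(text):
    if word not in plain:
        plain[word] = []
    plain[word].append(lineno)

# defaultdict(list): a missing key gets an empty list on first access.
d = defaultdict(list)
for lineno, word in enumerate(text):
    d[word].append(lineno)

print(plain == dict(d))  # True
print(d["dog"])          # [1, 5]
```

<p>Both loops produce the same mapping; the <code>defaultdict</code> version just drops the membership test.</p>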
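<p>Since you don't have a list of correct spellings, this doesn't actually check whether the words are spelled correctly. It just builds a dictionary of all the words in the text file.</p>
<p>To convert the dictionary into a set of words, try:</p>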
<pre><code>all_words = set(d)  # iterating over a dict yields its keys
print(all_words)
</code></pre>
<p>Under Python 2 this yields the following (the order of the elements may vary):</p>
<pre><code>set(['bat', 'woman', 'dog', 'cat', 'rat', 'child', 'man'])
</code></pre>
<p>Or, to just print the words:</p>
<pre><code>for word in d:
    print(word)
</code></pre>
<p><strong>Edit 3:</strong></p>
<p>I think this may be the final version.
It is a (deliberately) very crude, but almost complete, spell checker.</p>
<pre><code>from collections import defaultdict

# Build a set of all the words we know, assuming they're one word per line.
good_words = set()  # Use a set, as this will have the fastest look-up time.
with open("words.txt", "rt") as f:
    for word in f:
        good_words.add(word.strip())

bad_words = defaultdict(list)
with open("text_to_check.txt", "rt") as f:
    # For every line of text, get the line number (counting from 0) and the text.
    for line_no, line in enumerate(f):
        # Split into separate words - note there is an issue with punctuation,
        # case sensitivity, etc.
        for word in line.split():
            # If the word is not recognised, record the line where it occurred.
            if word not in good_words:
                bad_words[word].append(line_no)
</code></pre>
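<p>At the end, <code>bad_words</code> will be a dictionary with the unrecognised words as keys and, as the matching values, the lists of line numbers on which those words occurred.</p>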
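<p>The code comments above mention that punctuation and case are not handled. One possible way to deal with that, as a rough sketch rather than a complete solution, is to normalise each token before the look-up. The helper names <code>normalize</code> and <code>find_unknown_words</code> and the sample data are made up for illustration, and this version counts lines from 1 instead of 0. It also assumes the known-word list is stored in lower case.</p>

```python
import string
from collections import defaultdict


def normalize(token):
    """Strip punctuation from both ends of a token and lower-case it."""
    return token.strip(string.punctuation).lower()


def find_unknown_words(lines, good_words):
    """Map each unrecognised word to the 1-based line numbers where it occurs."""
    bad_words = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            word = normalize(token)
            # Skip tokens that were pure punctuation and are now empty.
            if word and word not in good_words:
                bad_words[word].append(line_no)
    return bad_words


# Hypothetical sample data standing in for words.txt and text_to_check.txt.
good_words = {"the", "cat", "sat", "on", "mat"}
lines = ["The cat sat on the mat.", "The catt sat, on the matt!"]

result = find_unknown_words(lines, good_words)
for word, line_numbers in sorted(result.items()):
    print(word, line_numbers)
```

<p>Because an open file is itself an iterable of lines, it can be passed in place of <code>lines</code>. Note that <code>str.strip</code> only removes punctuation at the ends of a token, so words such as "don't" keep their apostrophe.</p>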