Using a dictionary to map misspelled words to their line numbers

from collections import defaultdict

goodwords = set()
with open("soccer.txt", "rt") as f:
    for word in f.readlines():
        goodwords.add(word.strip())

badwords = defaultdict(list)
with open("soccer.txt", "rt") as f:
    for line_no, line in enumerate(f):
        for word in line.split():
            if word not in goodwords:
                badwords[word].append(line_no)

print(badwords)

2 answers

User

Answer 1 · edited 2024-05-13 19:17:09

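When you insert a new counter into d, you first check whether word is contained in words. You probably meant to check whether word is already contained in d: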

if word not in d:
    d[word] = [counter]
else:
    d[word].append(counter)

Checking whether word is contained in words or {} should be a separate if.

You can also simplify this logic with the dict's setdefault() method:

d.setdefault(word, []).append(counter)

Or make d a defaultdict, which simplifies the assignment even further:

from collections import defaultdict
d = defaultdict(list)
...
d[word].append(counter)
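
As a quick sanity check, the explicit if/else, setdefault() and defaultdict approaches should all build the same mapping. The sketch below assumes a made-up list of (line number, word) pairs; the names pairs, plain, with_setdefault and with_defaultdict are illustrative only and do not come from the question.

```python
from collections import defaultdict

# Hypothetical sample data: (line number, misspelled word) pairs.
pairs = [(0, "teh"), (2, "recieve"), (5, "teh")]

# Variant 1: explicit membership test before inserting.
plain = {}
for counter, word in pairs:
    if word not in plain:
        plain[word] = [counter]
    else:
        plain[word].append(counter)

# Variant 2: dict.setdefault() creates the empty list on first access.
with_setdefault = {}
for counter, word in pairs:
    with_setdefault.setdefault(word, []).append(counter)

# Variant 3: defaultdict(list) creates missing entries automatically.
with_defaultdict = defaultdict(list)
for counter, word in pairs:
    with_defaultdict[word].append(counter)

print(plain)  # {'teh': [0, 5], 'recieve': [2]}
```

The defaultdict version is the shortest, but it also creates an entry whenever a missing key is merely read with d[key], which is worth keeping in mind if the dictionary is inspected later.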

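Regarding the algorithm in general, note that right now you first iterate over all the lines to increment the counter, and only then, when the counter has already reached its maximum value, do you start checking for misspelled words. You should probably check each line inside the loop that increments the counter.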
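
One way to do the check inside the counting loop is sketched below. It assumes words is a set of correctly spelled words and lines is any iterable of text lines; the function name find_misspelled and the sample data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def find_misspelled(lines, words):
    """Map each word not found in `words` to the line numbers where it occurs."""
    d = defaultdict(list)
    # enumerate() supplies the counter, so each line is checked
    # at the moment its number is known.
    for counter, line in enumerate(lines):
        for word in line.split():
            if word not in words:
                d[word].append(counter)
    return d


# Hypothetical sample input.
words = {"the", "cat", "sat"}
lines = ["the cat sat", "teh cat", "the catt sat teh"]

result = find_misspelled(lines, words)
print(dict(result))  # {'teh': [1, 2], 'catt': [2]}
```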

User

Answer 2 · edited 2024-05-13 19:17:09

Judging by what you are doing now, I think the following would suit you well:

from collections import defaultdict

text = ("cat", "dog", "rat", "bat", "rat", "dog",
        "man", "woman", "child", "child")

d = defaultdict(list)

for lineno, word in enumerate(text):
    d[word].append(lineno)

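print(d)

This gives you the following output (shown for Python 3.7+, where dictionaries keep insertion order):

defaultdict(<class 'list'>, {'cat': [0], 'dog': [1, 5], 'rat': [2, 4], 'bat': [3], 'man': [6], 'woman': [7], 'child': [8, 9]})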

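This simply sets up an empty defaultdict that holds a list for every item you access, so you don't have to worry about creating the entries. It then enumerates over the list of words, so you don't need to keep track of the line number yourself.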

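Since you don't have a list of correctly spelled words, this doesn't actually check whether the words are spelled correctly; it just builds a dictionary containing all the words in the text file.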

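To turn the dictionary into a set of words, try:

all_words = set(d.keys())
print(all_words)

which produces something like the following (the order of a set is arbitrary):

{'bat', 'woman', 'dog', 'cat', 'rat', 'child', 'man'}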

Or, to just print the words:

for word in d:
    print(word)

Edit 3:

I think this may be the final version: it is a (deliberately) very crude, but almost complete, spell checker.

from collections import defaultdict

# Build a set of all the words we know, assuming they're one word per line
good_words = set() # Use a set, as this will have the fastest look-up time.
with open("words.txt", "rt") as f:
    for word in f.readlines():
        good_words.add(word.strip())

bad_words = defaultdict(list)

with open("text_to_check.txt", "rt") as f:
    # For every line of text, get the line number, and the text.
    for line_no, line in enumerate(f):
        # Split into separate words - note there is an issue with punctuation,
        # case sensitivity, etc.
        for word in line.split():
            # If the word is not recognised, record the line where it occurred.
            if word not in good_words:
                bad_words[word].append(line_no)

At the end, bad_words is a dictionary with the unrecognized words as keys and the line numbers where each word occurs as the matching values.
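
The punctuation and case issue mentioned in the comments could be handled roughly as sketched below. This is only one possible approach, not part of the answer above: it assumes the word list is stored in lower case, and the helpers normalize and check_lines as well as the sample data are hypothetical. It also reports 1-based line numbers via enumerate(..., start=1), which is a deliberate difference from the 0-based numbers used above.

```python
import string
from collections import defaultdict


def normalize(token):
    """Lower-case a token and strip punctuation from both ends."""
    return token.strip(string.punctuation).lower()


def check_lines(lines, good_words):
    """Map each unrecognised word to the 1-based lines where it occurs."""
    bad_words = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            word = normalize(token)
            # Skip tokens that were pure punctuation, e.g. "--".
            if word and word not in good_words:
                bad_words[word].append(line_no)
    return bad_words


# Hypothetical sample data; the word list is assumed to be lower case.
good_words = {"the", "ball", "is", "round"}
sample = ["The ball is round.", "The bal is rownd!", "-- the ball"]

print(dict(check_lines(sample, good_words)))  # {'bal': [2], 'rownd': [2]}
```

Because strip() only touches the ends of a token, apostrophes and hyphens inside a word (such as "don't") are left in place, so the word list would need to contain those forms as well.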
