Strange output when sorting list values with .itemgetter in Python

Question

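I'm working through Google's Python class and am currently on an exercise called Word_Count.py. The goal is to build a dictionary whose keys are words and whose values are the number of times each word occurs, and to return the entries as tuples so they can be printed.

I have written a helper function to create this dictionary: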

def dict_creator(filename):  # helper function to create a dictionary; each 'word' is a key and the 'wordcount' is the value
    input_file = open(filename, 'r')  # open file as read
    for line in input_file:  # for each line of text in the input file
        words = line.split()  # split each line into individual words
        for word in words:  # for each word in the words list(?)
            word = word.lower()  # make each word lower case.
            if word not in word_count:  # if the word hasn't been seen before
                word_count[word] = 1  # create a dictionary key with the 'word' and assign a value of 1
            else: word_count[word] += 1  # if 'word' seen before, increase value by 1
    return word_count  # return word_count dictionary
    word_count.close()

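Now I'm using the .itemgetter approach mentioned in this post (link) to get the dictionary's entries sorted by value (number of occurrences), from largest to smallest. Here is my code: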

def print_words(filename):
    word_count = dict_creator(filename)  # run dict_creator on input file (creating dictionary)
    print sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True)
    # print dictionary in total sorted descending by value. Values have been doubled compared to original dictionary?
    for word in sorted(word_count.iteritems(), key=operator.itemgetter(1), reverse=True):
        # create sorted list of tuples using operator module functions sorted in an inverse manner
        a = word
        b = word_count[word]
        print a, b  # print key and value

However, when I run the code on the test file and on a smaller file, I get a KeyError (shown below).

Traceback (most recent call last):
  File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 74, in <module>
    print_words(lorem_ipsum) #run input file through print_words
  File "F:\Misc\google-python-exercises\basic\wordcount_edited.py", line 70, in print_words
    b = word_count[word]
KeyError: ('in', 3)
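
The KeyError in the traceback can be reproduced in isolation. This is a minimal sketch, assuming a small hand-made dictionary (the `counts` data and the `pairs` and `lookup_failed` names are hypothetical, not taken from the exercise). Sorting the dictionary's items produces `(word, count)` tuples rather than bare keys, so using one of those tuples as a dictionary key fails. The sketch uses `items()`, which exists in both Python 2 and Python 3, in place of `iteritems()`.

```python
import operator

# Hypothetical sample data standing in for the real word counts.
counts = {'in': 3, 'the': 5, 'a': 1}

# sorted() over items() returns a list of (key, value) tuples,
# ordered here by the value (index 1), largest first.
pairs = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)

# Each element is a tuple, so looking it up as a key raises KeyError.
try:
    counts[pairs[0]]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Unpacking each tuple gives the word and its count directly.
for word, count in pairs:
    print('%s %d' % (word, count))
```

Unpacking in the `for` statement avoids the second dictionary lookup altogether, since the count is already part of each tuple.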

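I printed the original dictionary and the sorted dictionary, and found that every value in the sorted dictionary had doubled. I have looked through several related threads and checked the .itemgetter documentation, but I can't find anyone else with a similar problem.

Can anyone point out what is causing my code to go through the dictionary a second time in the word_count function, so that the values increase?

Thanks!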

SB
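
One possible explanation for the doubled values, offered as an assumption because the snippets above do not show where `word_count` is created: if it is a module-level dictionary, every call to `dict_creator` adds to the same object, so a second call over the same file doubles each count. The sketch below illustrates that behaviour with hypothetical names (`shared_counts`, `count_shared`, `count_fresh`) and a list of words in place of a file.

```python
# Hypothetical module-level dictionary, shared by every call.
shared_counts = {}

def count_shared(words):
    """Count words into the shared module-level dictionary."""
    for word in words:
        word = word.lower()
        shared_counts[word] = shared_counts.get(word, 0) + 1
    return shared_counts

def count_fresh(words):
    """Count words into a new dictionary created on each call."""
    counts = {}
    for word in words:
        word = word.lower()
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = ['In', 'the', 'in']

count_shared(sample)
second_shared = dict(count_shared(sample))  # counts accumulate across calls

count_fresh(sample)
second_fresh = count_fresh(sample)  # counts start from zero on each call
```

If this assumption matches the real script, creating the dictionary inside the helper function would keep repeated calls from affecting each other.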

