Python word frequency count and ranking
I'm developing a word-occurrence counting application in Python 3.2 on Windows.
Could anyone help me figure out why the code below doesn't work?
from string import punctuation
from operator import itemgetter
N = 100
words = {}
words_gen = (word.strip(punctuation).lower() for line in open("poi_run.txt")
                                             for word in line.split())
for word in words_gen:
    words[word] = words.get(word, 0) + 1
top_words = sorted(words.iteritems(), key=itemgetter(1), reverse=True)[:N]
for word, frequency in top_words:
    print ("%s %d") % (word, frequency)
The error message is:
Message File Name Line Position
Traceback
<module> C:\Users\will\Desktop\word_count.py 13
AttributeError: 'dict' object has no attribute 'iteritems'
Thanks, everyone.
Addendum:
The complete working code:
from string import punctuation
from operator import itemgetter
N = 100
words = {}
words_gen = (word.strip(punctuation).lower() for line in open("poi_run.txt")
                                             for word in line.split())
for word in words_gen:
    words[word] = words.get(word, 0) + 1
top_words = sorted(words.items(), key=itemgetter(1), reverse=True)[:N]
for word, frequency in top_words:
    print("%s %d" % (word, frequency))
Thanks again, everyone.
3 Answers
2
From the Python 3.x documentation:
"Also, the dict.iterkeys(), dict.iteritems() and dict.itervalues() methods are no longer supported."
For the correct Python 3.x usage, see the documentation linked above.
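The simplest fix is to call dict.items() instead of dict.iteritems(); in Python 3 it returns a view that sorted() accepts directly.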
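A minimal sketch of the Python 3 replacement for iteritems(), assuming a small hand-written dictionary (the contents of `words` below are made up for illustration and stand in for the counts built from the file):

```python
from operator import itemgetter

# Hypothetical sample counts, standing in for the dictionary built from the file.
words = {"the": 5, "cat": 2, "sat": 1, "mat": 3}

# dict.items() takes the place of the Python 2 dict.iteritems();
# sorted() consumes the returned view and produces a list of (word, count) pairs.
top_words = sorted(words.items(), key=itemgetter(1), reverse=True)[:2]

for word, frequency in top_words:
    print("%s %d" % (word, frequency))
```

The same one-word change (iteritems to items) is what the working code in the question's addendum uses.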
4
Consider Counter, a class from the collections module. It does the work of the first for loop for you:
from collections import Counter
N = 100
words_gen = ...
top_words = Counter(words_gen).most_common(N)
for word, frequency in top_words:
    print("%s %d" % (word, frequency))
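For a version that can be run as is, here is a rough sketch with the elided generator filled in from the question's code. It assumes an in-memory list of lines (the `lines` sample below is hypothetical) in place of poi_run.txt, and a small N so the result is easy to check:

```python
from collections import Counter
from string import punctuation

# Hypothetical sample text standing in for the lines of poi_run.txt.
lines = ["The cat sat on the mat.", "The mat was flat!"]

N = 2
# Same generator as in the question: strip surrounding punctuation, lowercase.
words_gen = (word.strip(punctuation).lower()
             for line in lines
             for word in line.split())

# Counter tallies the words; most_common(N) returns the N highest counts,
# already sorted in descending order of frequency.
top_words = Counter(words_gen).most_common(N)
for word, frequency in top_words:
    print("%s %d" % (word, frequency))
```

To read from the real file, replacing `lines` with an open file object (for example inside a `with open("poi_run.txt") as f:` block) should work the same way, since iterating over a file also yields lines.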