Sorting word frequency counts in Python

46 votes

12 answers

97822 views

数据工程师

Asked 2025-04-16 06:29

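I need to use Python to count how often each word occurs in a piece of text. I plan to keep the words in a dictionary, with a count for each word.

If I then want to sort the words by number of occurrences, can I do that with this same dictionary, rather than building a second dictionary that uses the count as the key and a list of words as the value?

Tags: data-structures, dictionary, sorting, text-analysis, word-frequency

12 Answers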

I just wrote a similar program, with help from the folks on Stack Overflow:

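from string import punctuation
from operator import itemgetter

N = 100  # number of top words to report
words = {}

# Read the file line by line; the with-block closes it when done
with open("poi_run.txt") as f:
    for line in f:
        for token in line.split():
            # Strip surrounding punctuation and normalize case
            word = token.strip(punctuation).lower()
            if word:  # skip tokens that were only punctuation
                words[word] = words.get(word, 0) + 1

# Sort (word, count) pairs by count, highest first, and keep the top N
top_words = sorted(words.items(), key=itemgetter(1), reverse=True)[:N]

for word, frequency in top_words:
    print("%s %d" % (word, frequency))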
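To try the same dictionary-based approach without a file on disk, here is a minimal sketch that works on an in-memory string. The helper name `top_words_from_text` and the sample sentence are assumptions made for illustration; they are not part of the answer above.

```python
from operator import itemgetter
from string import punctuation


def top_words_from_text(text, n):
    """Return the n most frequent words in text as (word, count) pairs.

    Hypothetical helper for illustration only.
    """
    counts = {}
    for token in text.split():
        word = token.strip(punctuation).lower()
        if word:  # ignore tokens made only of punctuation
            counts[word] = counts.get(word, 0) + 1
    # Sort the (word, count) pairs by count, highest first
    return sorted(counts.items(), key=itemgetter(1), reverse=True)[:n]


sample = "The cat saw the dog. The dog ran!"
result = top_words_from_text(sample, 2)
print(result)  # [('the', 3), ('dog', 2)]
```

Taking the text as an argument keeps the counting logic separate from file handling, which also makes it easier to check on small inputs.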

Answered 2025-04-16 by Python大师

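Warning: this example requires Python 2.7 or later (collections.Counter was added in Python 2.7 and 3.1).

Python's built-in Counter object is exactly what you need. Counting words is even the first example in the documentation: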

>>> # Tally occurrences of words in a list
>>> from collections import Counter
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

As pointed out in the comments, Counter accepts an iterable, so the loop above is only for illustration. In practice it can be written as:

>>> mywords = ['red', 'blue', 'red', 'green', 'blue', 'blue']
>>> cnt = Counter(mywords)
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
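Since the question is about ordering words by count, it may help to see Counter's `most_common` method, which returns the counts already sorted from highest to lowest. This is a short sketch that reuses the `mywords` list from above; the variable names `ranked` and `top_two` are chosen here for illustration.

```python
from collections import Counter

mywords = ['red', 'blue', 'red', 'green', 'blue', 'blue']
cnt = Counter(mywords)

# most_common() returns (element, count) pairs, highest count first
ranked = cnt.most_common()
print(ranked)   # [('blue', 3), ('red', 2), ('green', 1)]

# Passing n limits the result to the n most common elements
top_two = cnt.most_common(2)
print(top_two)  # [('blue', 3), ('red', 2)]
```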

Answered 2025-04-16 by Python大师

You can use the same dictionary:

>>> d = { "foo": 4, "bar": 2, "quux": 3 }
>>> sorted(d.items(), key=lambda item: item[1])

The second line prints the pairs in ascending order of count:

[('bar', 2), ('quux', 3), ('foo', 4)]

If you only want a sorted list of the words, you can do this:

>>> [pair[0] for pair in sorted(d.items(), key=lambda item: item[1])]

That line prints:

['bar', 'quux', 'foo']
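The examples above sort from the lowest count to the highest. For a frequency ranking you would usually want the opposite order, so here is a small sketch of the same idea with `reverse=True`, plus a shorter way to get just the words. The variable names `by_count_desc` and `words_desc` are illustrative.

```python
d = {"foo": 4, "bar": 2, "quux": 3}

# Highest count first: the same sort, with reverse=True
by_count_desc = sorted(d.items(), key=lambda item: item[1], reverse=True)
print(by_count_desc)  # [('foo', 4), ('quux', 3), ('bar', 2)]

# Iterating a dict yields its keys, so d.get can serve as the sort key
words_desc = sorted(d, key=d.get, reverse=True)
print(words_desc)     # ['foo', 'quux', 'bar']
```

Neither call modifies `d`; `sorted` always builds a new list, so the original dictionary stays available for lookups.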

Answered 2025-04-16 by Python大师
