How do I get the "n" most common words from a list?

Posted 2024-04-27 05:15:57


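I have two lists, each containing words. Some words appear in both lists and some do not. I only want to output the 20 most common words, but my code prints every word the two lists share. I want to limit the output to 20. I am not allowed to use Counter.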

def countwords(lst):
    dct = {}
    for word in lst:
        dct[word] = dct.get(word, 0) + 1
    return dct


count1 = countwords(finallist1)
count2 = countwords(finallist2)

words1 = set(count1.keys())
words2 = set(count2.keys())

common_words = words1.intersection(words2)
for i,w in enumerate (common_words,1):
    print(f"{i}\t{w}\t{count1[w]}\t{count2[w]}\t{count1[w] + count2[w]}")

Expected output:

common   f1 f2 sum 
1 program 5 10 15 
2 python  2  4  6 
.
.
until 20

1 answer
User
#1 · Posted 2024-04-27 05:15:57

You can use most_common() from collections.Counter to do this:

>>> from collections import Counter
>>> word_list = ["one", "two", "three", "four", "two", "three", "four", "three", "four", "four"]

>>> Counter(word_list).most_common(2)
[('four', 4), ('three', 3)]

From the Counter.most_common() documentation:

Return a list of the "n" most common elements and their counts from the most common to the least. If "n" is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered in the order first encountered.
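The two edge cases in that description (omitting "n", and ties) can be illustrated with a small sketch. The letters list is made-up sample data, and the tie ordering shown assumes Python 3.7 or later:

```python
from collections import Counter

# Made-up sample data: "b" and "a" both occur twice, and "b" is seen first.
letters = ["b", "a", "b", "a", "c"]
counts = Counter(letters)

# Equal counts keep the order in which the elements were first encountered.
top_two = counts.most_common(2)
print(top_two)      # [('b', 2), ('a', 2)]

# With no argument, every element is returned, most common first.
everything = counts.most_common()
print(everything)   # [('b', 2), ('a', 2), ('c', 1)]
```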


Here is an alternative that achieves the same result without importing any modules:

# Step 1: Create Counter dictionary holding frequency. 
#         Similar to: `collections.Counter()` 
my_counter = {}
for word in word_list:
    my_counter[word] = my_counter.get(word, 0) + 1

# where `my_counter` will hold:
# {'four': 4, 'three': 3, 'two': 2, 'one': 1}

# Step 2: Get sorted list holding word & frequency in descending order.
#         Similar to: `Counter.most_common()`
sorted_frequency = sorted(my_counter.items(), key=lambda x: x[1], reverse=True)

# where `sorted_frequency` will hold:
# [('four', 4), ('three', 3), ('two', 2), ('one', 1)]

# Step 3: Get top two words by slicing the ordered list from Step 2.
#         Similar to: `.most_common(2)`
top_two = sorted_frequency[:2]

# where `top_two` will hold:
# [('four', 4), ('three', 3)]

See the comments in the code snippet above for a step-by-step explanation.
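Applied to the two-list case in the question, the same count, sort, and slice steps might look like the sketch below. It is only one possible approach, not necessarily what the asker needs. The function names count_words and top_common_words, the alphabetical tie-break, and the sample lists (standing in for finallist1 and finallist2) are assumptions made for illustration.

```python
def count_words(words):
    """Return a dict mapping each word to its number of occurrences."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def top_common_words(list1, list2, n=20):
    """Return up to n rows (word, count1, count2, total) for words in both lists."""
    count1 = count_words(list1)
    count2 = count_words(list2)
    # Keep only the words that occur in both lists.
    shared = set(count1) & set(count2)
    rows = [(w, count1[w], count2[w], count1[w] + count2[w]) for w in shared]
    # Highest combined count first; ties are broken alphabetically (an
    # assumption) so the result does not depend on set iteration order.
    rows.sort(key=lambda row: (-row[3], row[0]))
    # Slicing limits the output to the n most frequent shared words.
    return rows[:n]


# Hypothetical sample data standing in for finallist1 and finallist2.
sample1 = ["program"] * 5 + ["python"] * 2 + ["only_in_first"]
sample2 = ["program"] * 10 + ["python"] * 4 + ["only_in_second"]

for i, (word, f1, f2, total) in enumerate(top_common_words(sample1, sample2), 1):
    print(f"{i}\t{word}\t{f1}\t{f2}\t{total}")
```

The slice rows[:n] is what caps the output at 20. If fewer than 20 words are shared, it simply returns all of them.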
