Function that returns the most frequent words

from collections import Counter

def words_counter(sample_txt):
    return Counter(sample_txt.lower().split()).most_common(8)

>>> words_counter(txt)
[('a', 9), ('to', 8), ('python', 4), ('is', 4), ('the', 4), ('it', 3), ('code', 3), ('they', 3), ('if', 3), ('there’s', 3)]

2 answers

User

#1 · Edited 2024-04-19 07:10:47

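One approach is to first build a new list of words, dropping every word that fails either of the two conditions, and then build the Counter from that list.

By the way, your code ignores the requirement to skip periods and commas.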

import re
from collections import Counter

def common_words(text, stop_words, min_length):
    # requirement 1: get rid of dot and comma
    clean_text = re.sub(r'[.,]', '', text)

    # requirement 2: create list of words that are longer than min_length characters and not in stop_words
    words = [ word for word in clean_text.lower().split() if len(word) > min_length and word not in stop_words ]

    # now count like you did before:
    return Counter(words).most_common(8)
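
To see the filtering in action, here is a minimal usage sketch. The function is repeated so the snippet runs on its own; the sample sentence, the stop-word set, and the `min_length` value are made up for illustration and are not from the question.

```python
import re
from collections import Counter

def common_words(text, stop_words, min_length):
    # Remove periods and commas, then keep only long-enough non-stop words.
    clean_text = re.sub(r'[.,]', '', text)
    words = [word for word in clean_text.lower().split()
             if len(word) > min_length and word not in stop_words]
    return Counter(words).most_common(8)

# Hypothetical inputs, chosen only to illustrate the call.
sample = "Python is fun. Python is readable, and readable Python is good."
stop_words = {"is", "and"}

result = common_words(sample, stop_words, min_length=3)
print(result)  # [('python', 3), ('readable', 2), ('good', 1)]
```

Note that "fun" is dropped because the check is `len(word) > min_length`, so a word must be strictly longer than `min_length` to be counted.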

User

#2 · Edited 2024-04-19 07:10:47

You can try a generator expression together with the walrus operator, as shown below (requires Python 3.8+):

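from collections import Counter

# stop_words is assumed to be defined at module level, as in the question
def words_counter(sample_txt):
    # strip dots/commas and lowercase once per word, reusing the result via :=
    filtered_words = (cleaned for word in sample_txt.split()
        if (cleaned := word.strip('.,').lower()) not in stop_words
        and len(cleaned) > 4)
    return Counter(filtered_words).most_common(8)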

>>> words_counter(txt)
[('python', 4),
 ('there’s', 3),
 ('library', 3),
 ('programming', 2),
 ('programmer', 2),
 ('started', 2),
 ('scratch', 2),
 ('first', 1)]

Or with a few chained generators, a.k.a. lazy looping:

def words_counter(sample_txt):
    without_dot_comma = (word.strip('.,') for word in sample_txt.split())
    longer_than_4 = (word.lower() for word in without_dot_comma if len(word) > 4)
    without_stop_words = (word for word in longer_than_4 if word not in stop_words)
    return Counter(without_stop_words).most_common(8)

>>> words_counter(txt)
[('python', 4),
 ('there’s', 3),
 ('library', 3),
 ('programming', 2),
 ('programmer', 2),
 ('started', 2),
 ('scratch', 2),
 ('first', 1)]

Edit: to get each pair as (count, word) instead of (word, count), change the last line of the function to:

return [(v, k) for k, v in Counter(filtered_words).most_common(8)]
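
Put together, the walrus-operator version with the swapped output might look like the sketch below. The `stop_words` set and the sample text are placeholders invented for this example, not the data from the question.

```python
from collections import Counter

# Hypothetical stop words, for illustration only.
stop_words = {"the", "there"}

def words_counter(sample_txt):
    # Strip dots/commas and lowercase each word once, reusing it via :=
    filtered_words = (cleaned for word in sample_txt.split()
        if (cleaned := word.strip('.,').lower()) not in stop_words
        and len(cleaned) > 4)
    # most_common yields (word, count); swap each pair to (count, word)
    return [(count, word) for word, count in Counter(filtered_words).most_common(8)]

# Hypothetical sample text.
result = words_counter("Python rocks. Python scales, Python rocks.")
print(result)  # [(3, 'python'), (2, 'rocks'), (1, 'scales')]
```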
