Python: my frequency function is inefficient

Posted 2024-04-24 06:23:10


I am writing a function that returns the number of occurrences of the most frequent word in a list of words.

def max_frequency(words):
    """Returns the number of times appeared of the word that
    appeared the most in a list of words."""

    words_set = set(words)
    words_list = words
    word_dict = {}

    for i in words_set:
        count = []
        for j in words_list:
            if i == j:
                count.append(1)
        word_dict[i] = len(count)

    result_num = 0
    for _, value in word_dict.items():
        if value > result_num:
            result_num = value
    return result_num

For example:

words = ["Happy", "Happy", "Happy", "Duck", "Duck"]
answer = max_frequency(words)
print(answer)

3

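However, this function is slow when the list contains a large number of words: for example, a list of 250,000 words takes 4 minutes to produce its output. So I am looking for help to improve it.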

I don't want to import anything.


3 answers

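While I fully agree with the comments about your "I don't want to import anything" statement, I found your question interesting, so let's give it a try.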

You don't need to build a set. Just use words directly.

words = ["Happy", "Happy", "Happy", "Duck", "Duck"]
words_dict = {}

for w in words:
    if w in words_dict:
        words_dict[w] += 1
    else:
        words_dict[w] = 1

result_num = max(words_dict.values())

print(result_num)
# 3

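To avoid passing over the whole list once for every unique word, you can simply iterate over it a single time and update the count stored in the dictionary for each word as you go.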

counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

Output

>>> print(max(counts.values()))
3

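That said, this could be done better by using defaultdict instead of get, or by using Counter. If you have the choice, restricting yourself to no imports in Python is never a good idea.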

For example, using collections.Counter:

from collections import Counter

counter = Counter(words)
# most_common(1) returns a list holding one (word, count) pair
most_common = counter.most_common(1)
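The defaultdict variant mentioned above is not spelled out in the answer. A minimal sketch, assuming the same kind of words list as in the question, might look like this (the function name max_frequency_defaultdict is made up for illustration, and returning 0 for an empty list is an assumption, not something the question specifies):

```python
from collections import defaultdict


def max_frequency_defaultdict(words):
    """Return the highest occurrence count in words (0 for an empty list)."""
    counts = defaultdict(int)  # missing keys start at 0
    for word in words:
        counts[word] += 1
    # default=0 keeps max() from raising ValueError on an empty list
    return max(counts.values(), default=0)


words = ["Happy", "Happy", "Happy", "Duck", "Duck"]
print(max_frequency_defaultdict(words))  # 3
```

Compared with counts.get(word, 0) + 1, the only difference is that the default value is supplied by the dictionary itself, so the loop body is a plain increment.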

You can try this code, which is 760% faster.

def max_frequency(words):
    """Returns the number of times appeared of the word that
    appeared the most in a list of words."""

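    count_dict = {}
    max_count = 0  # named max_count to avoid shadowing the built-in max()

    for word in words:
        if word in count_dict:
            current_count = count_dict[word] = count_dict[word] + 1
        else:
            current_count = count_dict[word] = 1

        # Track the running maximum so no second pass is needed.
        if current_count > max_count:
            max_count = current_count

    return max_count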
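How large the speedup is will depend on the input and the machine. A rough way to check it yourself is sketched below with the standard-library timeit module (this does import something, but only for measuring). The function names and the sample data are made up for illustration, and max_frequency_nested is a condensed stand-in for the nested-loop approach from the question, not the original code.

```python
import timeit


def max_frequency_nested(words):
    """Nested-loop approach: rescans the whole list once per unique word."""
    best = 0
    for unique in set(words):
        count = sum(1 for w in words if w == unique)
        if count > best:
            best = count
    return best


def max_frequency_single_pass(words):
    """Single pass over the list, counting in a dict."""
    counts = {}
    best = 0
    for w in words:
        counts[w] = counts.get(w, 0) + 1
        if counts[w] > best:
            best = counts[w]
    return best


# Hypothetical test data: 2,000 words drawn from 500 distinct values.
sample = ["word%d" % (i % 500) for i in range(2000)]

# Both approaches must agree before their timings are worth comparing.
assert max_frequency_nested(sample) == max_frequency_single_pass(sample)

for func in (max_frequency_nested, max_frequency_single_pass):
    seconds = timeit.timeit(lambda: func(sample), number=3)
    print(func.__name__, round(seconds, 4))
```

The nested version does work proportional to the number of unique words times the list length, while the single-pass version touches each word once, so the gap should widen as the list grows.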
