Python: return the 5 most frequent words

Posted 2024-05-21 07:45:07


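As the title says, I need to write code that returns a list of the 5 words (from an input string) that have the highest frequency. This is what I have so far: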

from collections import defaultdict

def top5_words(text):
  tally = defaultdict(int)
  words = text.split()

  for word in words:
    if word in tally:
      tally[word] += 1
    else:
      tally[word] = 1

  answer = sorted(tally, key=tally.get, reverse = True)

  return(answer)

For example, if you call top5_words("one one was a racehorse two two is one too"), it should return ["one", "two", "was", "a", "racehorse"], but instead it returns ['one', 'was', 'two', 'racehorse', 'too', 'a']. Does anyone know why?

Edit:

This is what I have now, thanks to Anand Kumar (the code block was not preserved in this copy of the post).

1 answer

#1 · Posted 2024-05-21 07:45:07

You should use collections.Counter, and then you can use its most_common() method. Example:

import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return counts.most_common(5)

Note that the above returns a list of 5 tuples; in each tuple, the first element is the word itself and the second element is that word's count.
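To make the tuple shape concrete, here is a minimal sketch that applies the same Counter approach to the example string from the question. The function name top5_with_counts is only an illustrative choice, not something from the original post. The order among words with equal counts assumes Python 3.7 or later, where most_common() returns ties in the order they were first encountered; on older versions the tie order may differ.

```python
import collections

def top5_with_counts(text):
    # Hypothetical helper name, used here only for illustration.
    # Count whitespace-separated words and return the 5 most common
    # as (word, count) tuples.
    counts = collections.Counter(text.split())
    return counts.most_common(5)

result = top5_with_counts("one one was a racehorse two two is one too")
print(result)
# On Python 3.7+:
# [('one', 3), ('two', 2), ('was', 1), ('a', 1), ('racehorse', 1)]
```

Each entry can be unpacked as word, count = entry if both values are needed.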


If you only need the words and not the counts, you can also use a list comprehension to extract them. Example:

import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return [elem for elem, _ in counts.most_common(5)]

Demo:

>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.split())
...     return [elem for elem, _ in counts.most_common(5)]
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
['that', 'a', 'I', 'the', 'have']

For the new requirement from the comments:

it seems there's still an issue when it comes to words with the same frequency, how would I get it to sort same frequency words alphabetically?

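You can first get the list of all words with their counts, then use sorted with a key that orders by count first (descending) and then by the word itself, so that words with the same count come out in lexicographic order. The example also lowercases the text, so "I" is counted as "i". Example: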

import collections
def top5_words(text):
    counts = collections.Counter(text.lower().split())
    return [elem for elem, _ in sorted(counts.most_common(),key=lambda x:(-x[1], x[0]))[:5]]

Demo:

>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.lower().split())
...     return [elem for elem, _ in sorted(counts.most_common(),key=lambda x:(-x[1], x[0]))[:5]]
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
['a', 'have', 'i', 'that', 'the']
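As a further check, the same tie-breaking idea can be tried on the example string from the question. This is only a sketch: the name top5_words_sorted is an illustrative choice, and it iterates over counts.items() instead of counts.most_common(), which should give the same result because sorted re-orders everything anyway.

```python
import collections

def top5_words_sorted(text):
    # Hypothetical helper name, used here only for illustration.
    counts = collections.Counter(text.lower().split())
    # Sort by descending count, then alphabetically for equal counts.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:5]]

print(top5_words_sorted("one one was a racehorse two two is one too"))
# ['one', 'two', 'a', 'is', 'racehorse']
```

Note that this differs from the output the question originally asked for, where tied words appear in the order they occur in the text; the alphabetical order here follows the later requirement from the comments.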
