The n most common words in a text

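2 answers

User

#1 · edited 2024-05-16 13:03:28

This was tricky, but I worked it out for you. I use spaces to detect whether elem contains 3 or more words :-) because an elem with 3 words must contain 2 spaces. The code below therefore only prints the elems that contain exactly 2 words, i.e. exactly 1 space.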

l = ["hello world", "good night world", "good morning sunshine", "wassap babe"]
for elem in l:
    # Exactly one space means exactly two words
    if elem.count(" ") == 1:
        print(elem)

Output:

hello world
wassap babe
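
Counting spaces assumes the words are separated by exactly one space, with no leading or trailing whitespace. A minimal sketch of the same check based on `str.split()` instead, which tolerates extra whitespace; the helper name `has_n_words` and the extra padded list entry are made up for illustration:

```python
def has_n_words(elem, n=2):
    """Return True if elem contains exactly n whitespace-separated words."""
    # str.split() with no arguments collapses runs of whitespace and
    # ignores leading/trailing whitespace.
    return len(elem.split()) == n


l = ["hello world", "good night world", "good morning sunshine",
     "wassap babe", "  hello   world "]  # last entry added for illustration

two_word_elems = [elem for elem in l if has_n_words(elem, 2)]
print(two_word_elems)
```

With `elem.count(" ") == 1` the padded last entry would be skipped, since it contains more than one space.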

User

#2 · edited 2024-05-16 13:03:28

I suggest using Counter and combinations, as follows:

from collections import Counter
from itertools import combinations, chain

text = ['Lion Monkey Elephant Weed', 'Tiger Elephant Lion Water Grass', 'Lion Weed Markov Elephant Monkey Fine', 'Guard Elephant Weed Fortune Wolf']


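def count_combinations(text, n_words, n_most_common=None):
    """Count how often each combination of n_words words shares a line."""
    count = []
    for t in text:
        words = t.split()
        combos = combinations(words, n_words)
        # Sort the words in each combination so their order in the line does not matter
        count.append([" & ".join(sorted(c)) for c in combos])
    # Flatten, count, and keep the n_most_common most frequent (all of them if None)
    return dict(Counter(sorted(chain(*count))).most_common(n_most_common))

print(count_combinations(text, 2))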
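
If only the n most common individual words are needed, rather than combinations of words, `Counter` alone may be enough. A rough sketch, reusing the same `text` list; the function name `most_common_words` is hypothetical, and lowercasing the words is an assumption about how they should be compared:

```python
from collections import Counter

text = ['Lion Monkey Elephant Weed',
        'Tiger Elephant Lion Water Grass',
        'Lion Weed Markov Elephant Monkey Fine',
        'Guard Elephant Weed Fortune Wolf']


def most_common_words(lines, n):
    """Return the n most frequent words as (word, count) pairs."""
    # Assumption: case is ignored, so 'Lion' and 'lion' are the same word.
    words = (word.lower() for line in lines for word in line.split())
    return Counter(words).most_common(n)


top = most_common_words(text, 3)
print(top)
```

For this input the three most frequent words are elephant (4 times), lion (3) and weed (3).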
