Counting different words in a sentence

Asked 2025-04-14 17:23

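I'd like to know whether there is a "nice-looking" way to count the different expressions in a sentence.
For example, I want to know how many politicians are mentioned in the same sentence.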


>>> import re
>>> countc=0
>>> reg01 = re.compile('Trump')
>>> reg02 = re.compile('Biden')
>>> frase = "Trump won the republican primaries"
>>> if re.match(reg01, frase):
...     countc+=1

I'm doing something like this, and it works. In some cases, though, I need to handle a lot of words, and I'd like a more elegant approach.

data processing, natural language processing, text analysis, word frequency, text mining

4 Answers

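I don't know whether this is the best approach, but I would use a list. If you aren't processing much data, this should be enough. You should also clean the data further, because real-world text can contain variants such as "Trump" and "Trump's".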

def check_for_word(sentence: str) -> dict:
    # Strip basic punctuation, then split on any whitespace.
    words = sentence.replace(".", "").replace(",", "").split()
    # Case-insensitive exact matches for each name.
    trump_count = len([x for x in words if x.lower() == 'trump'])
    biden_count = len([x for x in words if x.lower() == 'biden'])

    return {'trump': trump_count, 'biden': biden_count}
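
The same idea can be generalized so that names are not hard-coded and possessives such as "Trump's" are folded into the base name. This is only a sketch: the helper name count_names and the tokenizing pattern are assumptions made for illustration, and it handles only the ASCII apostrophe and plain a-z words.

```python
import re
from collections import Counter

def count_names(sentence: str, names: list[str]) -> dict[str, int]:
    # Lowercase, then extract word tokens; an inner apostrophe is kept
    # so that "trump's" stays a single token.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    # Drop a trailing possessive 's so "trump's" counts as "trump".
    normalized = [t[:-2] if t.endswith("'s") else t for t in tokens]
    counts = Counter(normalized)
    # Counter returns 0 for names that never appear.
    return {name.lower(): counts[name.lower()] for name in names}

print(count_names("Trump's rival Biden met Trump.", ["Trump", "Biden"]))
# {'trump': 2, 'biden': 1}
```

Adding another politician then only means extending the list passed as names.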

Answered 2025-04-14 by Python大师

I would use collections.Counter() as a starting point.

Let's start with this:

import collections
import re

interesting_words = ["Trump", "Biden"]
phrase = "Trump won the republican primaries but not and Biden."
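# Count word tokens; re.findall(r"\w+") avoids the empty strings that re.split(r"\W") produces.
word_counts = collections.Counter(re.findall(r"\w+", phrase))
# Counter returns 0 for missing keys, whereas .get(word) would return None and break sum().
total = sum(word_counts[word] for word in interesting_words)
print(total)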
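
If the goal is how many different politicians are mentioned, rather than the total number of mentions, a set intersection may be enough. The sketch below assumes exact, case-sensitive matches, and the example phrase is made up for illustration.

```python
import re

interesting_words = {"Trump", "Biden"}
phrase = "Trump won the republican primaries but not Biden. Trump celebrated."

# Distinct word tokens that occur in the phrase.
words_in_phrase = set(re.findall(r"\w+", phrase))
# Names that are both interesting and present; repeats do not matter.
mentioned = interesting_words & words_in_phrase

print(len(mentioned), sorted(mentioned))
# 2 ['Biden', 'Trump']
```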

Answered 2025-04-14 by Python大师

First, prepare a list of the words you want to find. Then loop over the list and look for each word in turn.

import re
frase = "Trump won the republican primaries"

words = ['Trump','word1','word2','word3','word4']
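for x in words:
    # re.search scans the whole sentence; re.match only matches at the start.
    # re.escape keeps any regex metacharacters in the word literal.
    reg01 = re.compile(re.escape(x))
    if reg01.search(frase):
        print(f'{x} is in the sentence')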
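
With many words, one option is to compile a single alternation pattern instead of one pattern per word. This is a sketch under assumptions: the word list and sentence are placeholders, and the \b word boundaries assume the words start and end with word characters.

```python
import re

words = ['Trump', 'Biden', 'word1']
frase = "Biden and Trump debated; Trump spoke first"

# Builds \b(?:Trump|Biden|word1)\b; re.escape guards against regex metacharacters.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")

# Every match in order of appearance, repeats included.
found = pattern.findall(frase)
print(found)
# ['Biden', 'Trump', 'Trump']
print(len(set(found)))
# 2
```

The sentence is scanned once regardless of how many words are in the list, and len(set(found)) gives the number of different words that were mentioned.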

Answered 2025-04-14 by Python大师
