How to count occurrences of a word without being limited to exact matches

#!/usr/bin/env python
filename = "/path/to/file.txt"
number_of_words = 0
search_string = "Hello"

with open(filename, 'r') as file:
    for line in file:
        words = line.split()
        for i in words:
            if (i == search_string):
                number_of_words += 1

print("Number of words in " + filename + " is: " + str(number_of_words))

3 Answers

User

Answer 1 · edited 2024-06-02 09:09:43

You can use a regex together with Counter from the collections module:

txt = '''Someone says; Hello; Someone responded Hello back
Someone again said; Hello; No response
Someone again said; Hello waiting for response'''

import re
from collections import Counter
from pprint import pprint

# Count every run of word characters; punctuation such as ';' is skipped.
c = Counter(re.findall(r'\b\w+\b', txt))
pprint(c)

Output:

Counter({'Someone': 4,
         'Hello': 4,
         'again': 2,
         'said': 2,
         'response': 2,
         'says': 1,
         'responded': 1,
         'back': 1,
         'No': 1,
         'waiting': 1,
         'for': 1})
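
Once the Counter is built, the tally for a single word is a plain lookup, and a word that never appears counts as zero. A minimal sketch, assuming the same sample text as above; the names `words` and `folded` are illustrative and not part of the answer, and lowercasing every token is just one possible way to get a case-insensitive tally:

```python
import re
from collections import Counter

txt = '''Someone says; Hello; Someone responded Hello back
Someone again said; Hello; No response
Someone again said; Hello waiting for response'''

# Tokenize on word characters so trailing punctuation such as ';' is dropped.
words = re.findall(r'\b\w+\b', txt)

c = Counter(words)
hello_count = c['Hello']      # exact-case lookup
missing_count = c['Goodbye']  # a Counter returns 0 for unseen keys

# Hypothetical variant: lowercase every token first for a case-insensitive tally.
folded = Counter(w.lower() for w in words)
someone_count = folded['someone']

print(hello_count, missing_count, someone_count)
```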

User

Answer 2 · edited 2024-06-02 09:09:43

You can use a regular expression for this.

import re
filename = "/path/to/file.txt"

number_of_words = 0

# \b marks a word boundary, so "Hello" and "Hello;" match but "Helloing" does not.
pattern = re.compile(r'\bHello;?\b')

with open(filename, 'r') as file:
    for line in file:
        # Split on whitespace and test each token on its own.
        words = line.split()
        for i in words:
            if pattern.search(i):
                number_of_words += 1

print("Number of words in " + filename + " is: " + str(number_of_words))

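This checks the file for "Hello" or "Hello;". You can extend the regex to cover any other needs (lowercase, for example).

It will not count words such as "Helloing", a case that other examples here may overlook.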
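
One way to extend the regex, sketched under assumptions: the helper name `count_word` and its `ignore_case` parameter are hypothetical and not part of the answer. It builds the pattern from the search string with `re.escape`, wraps it in `\b` word boundaries, and optionally passes `re.IGNORECASE`:

```python
import re

def count_word(text, word, ignore_case=False):
    """Count whole-word occurrences of `word` in `text` (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps any regex metacharacters in `word` literal;
    # \b on both sides rejects longer words such as "Helloing".
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', flags)
    return len(pattern.findall(text))

sample = "Hello; hello there, Helloing is not a word. Hello"
print(count_word(sample, "Hello"))
print(count_word(sample, "Hello", ignore_case=True))
```

Note that `\b` only behaves as expected when the search word starts and ends with a letter, digit, or underscore.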

If you would rather not use a regex, you can check whether the word matches once its last character is dropped, like this:

filename = "/path/to/file.txt"

number_of_words = 0
search_string = "Hello"

with open(filename, 'r') as file:
    for line in file:
        words = line.split()
        for i in words:
            if (i == search_string) or (i[:-1] == search_string and i[-1] == ';'):
                number_of_words += 1

print("Number of words in " + filename + " is: " + str(number_of_words))
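
The comparison above only handles a single trailing semicolon. A slightly more general sketch without regex, assuming that any leading or trailing ASCII punctuation should be ignored; the helper name `count_plain` is made up for illustration:

```python
import string

def count_plain(lines, search_string):
    """Count tokens equal to `search_string` after trimming punctuation (hypothetical helper)."""
    total = 0
    for line in lines:
        for token in line.split():
            # strip() removes punctuation only from the ends of the token,
            # so "Hello;" and "(Hello)" match but "Hel;lo" does not.
            if token.strip(string.punctuation) == search_string:
                total += 1
    return total

sample_lines = [
    "Someone says; Hello; Someone responded Hello back",
    "(Hello) Helloing Hel;lo",
]
print(count_plain(sample_lines, "Hello"))
```

Because an open file object yields one line per iteration, it can be passed as `lines` directly.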

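User

Answer 3 · edited 2024-06-02 09:09:43

A regular expression is the better tool here, since you want to ignore punctuation. It could be done with some clever filtering and the .count() method, but this is simpler: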

import re
...
search_string = "Hello"
with open(filename, 'r') as file:
    filetext = file.read()
occurrences = len(re.findall(search_string, filetext))

print("Number of words in " + filename + " is: " + str(occurrences))

If you want the match to accept either "Hello" or "hello", change search_string accordingly:

search_string = r"[Hh]ello"

Or, if you explicitly want the word Hello and not aHello or Hellon, you can put a \b word-boundary anchor before and after it (see the documentation for more fun tricks):

search_string = r"\bHello\b"
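
To see what the word boundaries change, the patterns can be compared on a short string. This is only an illustration: the sample text is made up, and `re.IGNORECASE` is shown as an alternative to `[Hh]ello` that ignores case in every letter rather than only the first:

```python
import re

sample = "Hello aHello Hellon Hello; hello"

# Plain substring pattern: also matches inside "aHello" and "Hellon".
substring_hits = len(re.findall(r"Hello", sample))

# Word-boundary pattern: only the standalone "Hello" and "Hello;" match.
whole_word_hits = len(re.findall(r"\bHello\b", sample))

# Whole words again, this time ignoring case, so the final "hello" counts too.
any_case_hits = len(re.findall(r"\bhello\b", sample, flags=re.IGNORECASE))

print(substring_hits, whole_word_hits, any_case_hits)
```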
