Making Python's re.finditer skip a match when the searched word is part of a defined expression

import re

find_list = ['name', 'small']
scape_list = ['small software', 'company name']
text = "My name is Klaus and my middle name is Smith. I work for a small company. The company name is Small Software. Small Software sells Software Name."
final_list = []

for word in find_list:
    s = r'\W{}\W'.format(word)
    matches = re.finditer(s, text, (re.MULTILINE | re.IGNORECASE))
    for word_ in matches:
        final_list.append(word_.group(0))

1 answer

User

#1 · Posted on 2024-04-20 10:35:45

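You can use a regular expression that captures the words before and after each find_list word, then check that neither two-word combination (previous word + match, match + next word) appears in scape_list. I added comments where I changed the code. (If scape_list may grow large in the future, it is better to turn it into a set.)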

import re

find_list = ['name', 'small']
scape_list = ['small software', 'company name']

text = "My name is Klaus and my middle name is Smith. I work for a small company. The company name is Small Software. Small Software sells Software Name."

final_list = []

for word in find_list:
    
    s = r'(\w*\W)({})(\W\w*)'.format(word) # change the regex to capture adjacent words
    matches = re.finditer(s, text, (re.MULTILINE | re.IGNORECASE))

    for word_ in matches:
        if ((word_.group(1) + word_.group(2)).strip().lower() not in scape_list
            and (word_.group(2) + word_.group(3)).strip().lower() not in scape_list): # added this condition
            final_list.append(word_.group(2)) # changed here

print(final_list)
# ['name', 'name', 'Name', 'small']
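
The pattern above requires a non-word character on both sides of the word and consumes the neighbouring words, so a word at the very start or end of the text, or two hits that share a neighbour, can be missed. A different technique, not the one used in this answer, is to put the excluded phrases first in an alternation so they are consumed without being captured, and to capture only standalone words. The sketch below is a minimal illustration of that idea. The function name `find_outside_phrases` is hypothetical, and it assumes both lists are non-empty and contain plain words or phrases.

```python
import re

find_list = ['name', 'small']
scape_list = ['small software', 'company name']
text = ("My name is Klaus and my middle name is Smith. I work for a small company. "
        "The company name is Small Software. Small Software sells Software Name.")


def find_outside_phrases(words, skip_phrases, text):
    """Return occurrences of `words` that are not inside any of `skip_phrases`.

    Hypothetical helper; assumes both `words` and `skip_phrases` are non-empty.
    """
    # Longest phrases first, so a longer phrase wins over a shorter one it contains.
    skip = '|'.join(re.escape(p) for p in sorted(skip_phrases, key=len, reverse=True))
    keep = '|'.join(re.escape(w) for w in words)
    # First alternative: consume an excluded phrase without capturing it.
    # Second alternative: capture a standalone word in group 1.
    pattern = re.compile(r'\b(?:{})\b|\b({})\b'.format(skip, keep), re.IGNORECASE)
    # Matches of the first alternative leave group 1 as None, so they are dropped.
    return [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]


result = find_outside_phrases(find_list, scape_list, text)
print(result)
```

With the sample text this prints `['name', 'name', 'small', 'Name']`. The same four occurrences are kept, but they come back in the order they appear in the text rather than grouped by search word, because all words are searched in a single pass.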
