Python regex to match words with repeated consonants

# Maybe the \b word markers don't work how I think?
r'.*[b-z&&[^eiou]]{2}'  # -> still nothing
# Okay, let's just try to match something in between anything
r'.*[b-z&&[^eiou]].*'  # -> nope
# Since it's words, maybe I should be more explicit.
r'[a-z]*[b-z&&[^eiou]][a-z]*'  # -> still nope
# Decided to go back to grouping.
r'([b-z&&[^eiou]])(\1)'
# I realize set difference may be the issue
# I saw someone (on SO) use set difference claiming it works
# but I gave up on it...
# OKAY getting close
r'(([b-df-hj-np-tv-xz])(\2))'  # -> [('ll', 'l', 'l'), ...]
# Trying the previous ones without set difference
r'\b(.*(?:[b-df-hj-np-tv-xz]{3}).*)\b'  # -> returned everything (all words)
# Here I realize I need a non-greedy leading pattern (.* -> .*?)
r'\b(.*?(?:[b-df-hj-np-tv-xz]{3}).*)\b'  # -> still everything
# Maybe I need the comma in {3,} to get anything 3 or more
r'\b(.*?(?:[b-df-hj-np-tv-xz]{3,}).*)\b'  # -> still everything
# Okay, I'll try a 1 line test just in case
r'\b(.*?([b-df-hj-np-tv-xz])(\2{3,}).*)\b'
# Using 'asdfdffff' -> [('asdfdffff', 'f', 'fff')]
# Using dictionary -> []
# WAIT WHAT?!

1 answer

User

#1 · Posted on 2024-06-07 15:09:43

Here are a few things to consider:

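Use re.findall to get all results rather than re.match (which looks for only one match, and only at the start of the string).
[b-z&&[^eiou]] is Java/ICU regex syntax, which Python's re module does not support. In Python, you can either redefine the ranges to skip the vowels or use (?![eiou])[b-z].
To avoid "extra" values in the tuples returned by re.findall, don't use capturing groups. If you need a backreference, use re.finditer instead of re.findall and access each match's .group().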
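The three points above can be sketched on a small sample. The sample string and variable names here are made up for illustration. Note that the lookahead form keeps 'y' as a consonant, while the explicit ranges used in the question skip it, so the two classes are close but not identical.

```python
import re

text = "apple hello tree"  # illustrative sample, not from the question

# One lowercase consonant without Java-style set difference:
# take b-z, but refuse e, i, o, u via a negative lookahead (this keeps 'y').
consonant = r"(?:(?![eiou])[b-z])"

# re.match only tries position 0; "ap" is not two consonants, so no match.
first_only = re.match(consonant + r"{2}", text)

# re.findall scans the whole string and returns every non-overlapping match.
all_pairs = re.findall(consonant + r"{2}", text)

# With capturing groups, re.findall returns tuples of the groups instead.
with_groups = re.findall(r"(([b-df-hj-np-tv-xz])\2)", text)

# re.finditer yields match objects, so .group() is the whole match.
whole = [m.group() for m in re.finditer(r"([b-df-hj-np-tv-xz])\1", text)]

print(first_only)   # None
print(all_pairs)    # ['pp', 'll', 'tr']
print(with_groups)  # [('pp', 'p'), ('ll', 'l')]
print(whole)        # ['pp', 'll']
```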

Back to the question of how to use a backreference and still get the whole match, here is a working demo:

import re
s = """someword
sommmmmeword
someworddddd
sooooomeword"""
# \1 repeats the captured consonant; \w* on both sides extends the match to the whole word
res = [x.group() for x in re.finditer(r"\w*([b-df-hj-np-tv-xz])\1\w*", s)]
print(res)
# => ['sommmmmeword', 'someworddddd']
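On the final attempt in the question: ([b-df-hj-np-tv-xz])(\2{3,}) asks for the captured consonant plus at least three more copies, i.e. four in a row. That fits 'asdfdffff' matching, and is probably why the dictionary returned nothing. Below is a minimal sketch that makes the run length explicit; the helper name words_with_run and its min_run parameter are hypothetical and not part of the original answer.

```python
import re

CONSONANT = r"[b-df-hj-np-tv-xz]"  # same ranges as in the question (skips 'y')

def words_with_run(text, min_run=2):
    """Return whole words containing one consonant min_run or more times in a row.

    Hypothetical helper for illustration only.
    """
    # The group matches the first copy, so the backreference
    # has to supply at least min_run - 1 further copies.
    pattern = rf"\w*({CONSONANT})\1{{{min_run - 1},}}\w*"
    return [m.group() for m in re.finditer(pattern, text)]

sample = "hello shhh asdfdffff tree"
print(words_with_run(sample))             # ['hello', 'shhh', 'asdfdffff']
print(words_with_run(sample, min_run=3))  # ['shhh', 'asdfdffff']
print(words_with_run(sample, min_run=4))  # ['asdfdffff']
```

With min_run=4 the pattern is equivalent in spirit to the question's \2{3,} version, which shows why only the artificial test word survives.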
