Handling the '++' symbol in Python regex

User

Reply #1 · edited 2024-04-26 00:33:09

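In addition to re.escape(), you also need to drop the \b word boundary on any side where the word begins or ends with a non-alphanumeric character, otherwise the match will fail. \b only matches between a word character and a non-word character (or the start/end of the string), so unless a word character follows, there is no boundary after the final '+' of 'spy++'.

Something like this (not very elegant, but I hope it gets the idea across):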

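import re

words = 'This is word of spy++'
wl = ['spy++', 'cry', 'fpp']
regobjs = []

for word in wl:
    eword = re.escape(word.lower())
    # Add \b only on a side where the word starts/ends with a word character
    if word[0].isalnum() or word[0] == "_":
        eword = r"\b" + eword
    if word[-1].isalnum() or word[-1] == "_":
        eword = eword + r"\b"
    regobjs.append(re.compile(eword))

for regobj in regobjs:
    match = regobj.search(words)
    # search() returns None when nothing matches, so check before .group()
    print(match.group() if match else None)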
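To see why the boundary has to be dropped next to non-word characters, here is a minimal check on the same sample sentence, comparing a pattern that keeps \b on both sides of 'spy++' with one that keeps it only before the leading letter. The variable names are illustrative only.

```python
import re

text = 'This is word of spy++'
escaped = re.escape('spy++')  # each '+' is escaped: 'spy\\+\\+'

# \b after the final '+' needs a word character on one side of that
# position; '+' is not a word character and neither is end-of-string,
# so this pattern cannot match here.
both_sides = re.compile(r'\b' + escaped + r'\b')

# Keeping \b only before the leading 's' works.
leading_only = re.compile(r'\b' + escaped)

print(both_sides.search(text))            # None
print(leading_only.search(text).group())  # spy++
```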

User

Reply #2 · edited 2024-04-26 00:33:09

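Where your word begins or ends with a letter, digit, or underscore you want \b rather than {}, which means you won't pick up {} but will pick up {} or even {}. If you want to avoid the last one, things get a lot more complicated.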

>>> import re
>>> def match_word(word):
...     return re.compile("%s%s%s" % (
...         "\\b" if word[0].isalnum() or word[0] == '_' else "\\B",
...         re.escape(word.lower()),
...         "\\b" if word[-1].isalnum() or word[-1] == '_' else "\\B"))
...
>>> text = 'This is word of spy++'
>>> wl = ['spy++', 'cry', 'fpp', 'word']
>>> for word in wl:
...     match = re.search(match_word(word), text)
...     if match:
...         print(repr(match.group()))
...     else:
...         print("{} did not match".format(word))
...
'spy++'
cry did not match
fpp did not match
'word'
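A quick way to see where match_word() draws the line is to try it on a few hypothetical strings (they are not from the question). This sketch redefines the function so it runs on its own; note that the trailing \B still lets 'spy++' match inside 'spy+++', while 'xspy++' and 'spy++x' are rejected.

```python
import re

def match_word(word):
    # Same construction as above: \b next to word characters, \B otherwise.
    start = r"\b" if word[0].isalnum() or word[0] == "_" else r"\B"
    end = r"\b" if word[-1].isalnum() or word[-1] == "_" else r"\B"
    return re.compile(start + re.escape(word.lower()) + end)

pattern = match_word("spy++")  # compiles to \bspy\+\+\B

# Hypothetical inputs chosen to probe each edge of the pattern.
samples = ["spy++", "spy++ rocks", "xspy++", "spy++x", "spy+++"]
results = {s: bool(pattern.search(s)) for s in samples}
for s, found in results.items():
    print(f"{s!r:15} {'match' if found else 'no match'}")
```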

User

Reply #3 · edited 2024-04-26 00:33:09

萨西

Your question is badly put: it doesn't say what you actually want. People then try to deduce what you want from your code, which leads to confusion.

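I assume you want to find occurrences of the words in the list only where they stand completely isolated in the string, that is, with no non-whitespace character immediately before or after each occurrence.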

If so, I suggest the regex pattern used in the following code:

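import re

ss = 'spy++ This !spy++ is spy++! word of spy++'
print(ss)
print([mat.start() for mat in re.finditer('spy', ss)])
print()

# A word counts only if it is preceded by whitespace or the start of the
# string, and followed by whitespace or the end of the string.
base = (r'(?:(?<=[ \f\n\r\t\v])|(?<=\A))'
        r'%s'
        r'(?=[ \f\n\r\t\v]|\Z)')

for x in ['spy++', 'cry', 'fpp']:
    print(x, [mat.start() for mat in re.finditer(base % re.escape(x), ss)])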

Result:

spy++ This !spy++ is spy++! word of spy++
[0, 12, 21, 36]

spy++ [0, 36]
cry []
fpp []
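The lookbehind/lookahead pair in base can probably be written more compactly with negative lookarounds: (?<!\S) means "not preceded by a non-whitespace character" (so preceded by whitespace or at the start), and (?!\S) is the mirror image at the end. With re.ASCII, \s covers exactly the characters listed in base (space, \f, \n, \r, \t, \v). This is an alternative sketch, not the answer's own pattern, and the helper name isolated_positions is hypothetical.

```python
import re

def isolated_positions(word, text):
    # (?<!\S): start of string, or preceded by whitespace.
    # (?!\S):  end of string, or followed by whitespace.
    pattern = r'(?<!\S)' + re.escape(word) + r'(?!\S)'
    # re.ASCII restricts \s/\S to the ASCII whitespace set used in `base`.
    return [m.start() for m in re.finditer(pattern, text, re.ASCII)]

ss = 'spy++ This !spy++ is spy++! word of spy++'
for x in ['spy++', 'cry', 'fpp']:
    print(x, isolated_positions(x, ss))
```

Unlike the \b approach, this does not depend on whether the word starts or ends with a word character, so the same pattern works unchanged for 'spy++', 'cry', and 'fpp'.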
