Regex replace (in Python): a simpler way?

Published 2024-04-26 10:16:24


Whenever I want to replace one part of a larger piece of text, I always end up writing something like this:

"(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)"

Then I concatenate the start group with the new data for replace, followed by the end group.
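For reference, here is a minimal sketch of what that approach might look like. The sample text and the replacement string "new data" are placeholders invented for illustration; only the pattern comes from the question. The last line shows the same result obtained with named backreferences in a single `re.sub` call.

```python
import re

# Pattern from the question; the sample text below is a made-up placeholder.
pattern = re.compile(r"(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)")
text = "some_patternfooend"

# Manual approach: match, then glue start + new data + end back together.
m = pattern.search(text)
manual = (
    text[:m.start()]
    + m.group("start")
    + "new data"
    + m.group("end")
    + text[m.end():]
)

# Same result using \g<name> backreferences in the replacement template.
with_backrefs = pattern.sub(r"\g<start>new data\g<end>", text)

print(manual)
print(with_backrefs)
```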

Is there a better way?


3 Answers

The short version is that you cannot use variable-width patterns in lookbehinds with Python's re module. There is no way to change this:

>>> import re
>>> re.sub("(?<=foo)bar(?=baz)", "quux", "foobarbaz")
'fooquuxbaz'
>>> re.sub("(?<=fo+)bar(?=baz)", "quux", "foobarbaz")

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    re.sub("(?<=fo+)bar(?=baz)", "quux", string)
  File "C:\Development\Python25\lib\re.py", line 150, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Development\Python25\lib\re.py", line 241, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern
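The traceback above comes from Python 2.5. As a rough sketch, the same restriction can be checked on Python 3 by catching `re.error`; the helper name `compiles` is made up for this illustration.

```python
import re

def compiles(pattern):
    """Return True if re.compile accepts `pattern`, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-width lookbehind: accepted.
fixed_ok = compiles(r"(?<=foo)bar(?=baz)")
# Variable-width lookbehind (fo+): rejected by the re module.
variable_ok = compiles(r"(?<=fo+)bar(?=baz)")
# Variable-width lookahead is fine; only lookbehind has the restriction.
lookahead_ok = compiles(r"bar(?=ba+z)")

print(fixed_ok, variable_ok, lookahead_ok)
```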

This means you will need to work around it, and the simplest solution is very similar to what you are doing now:

>>> re.sub("(fo+)bar(?=baz)", "\\1quux", "foobarbaz")
'fooquuxbaz'
>>>
>>> # If you need to turn this into a callable function:
>>> def replace(start, replace, end, replacement, search):
        # Escape the literal pieces; a callable repl keeps `replacement` literal too.
        return re.sub("(" + re.escape(start) + ")" + re.escape(replace) + "(?=" + re.escape(end) + ")", lambda m: m.group(1) + replacement, search)

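This doesn't have the elegance of the lookbehind solution, but it is still a very clear, straightforward one-liner. If you look at what an expert has to say on the matter (he is talking about JavaScript, which at the time had no lookbehind at all, but many of the principles are the same), you will see that his simplest solution looks a lot like this one.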
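As a hedged sketch of the same workaround for the variable-width case, the helper below takes regex fragments rather than literal strings, so the prefix can be something like `fo+`. The name `replace_between` and its parameters are invented for this example. A callable is used for the replacement so that backslashes or digits in it are not interpreted as group references.

```python
import re

def replace_between(prefix_pattern, target_pattern, suffix_pattern, replacement, text):
    """Replace `target_pattern` only where it follows `prefix_pattern`
    and precedes `suffix_pattern`. The three patterns are regex fragments."""
    pattern = (
        "(" + prefix_pattern + ")"          # captured, re-emitted unchanged
        + "(?:" + target_pattern + ")"      # the part that gets replaced
        + "(?=" + suffix_pattern + ")"      # lookahead: checked, not consumed
    )
    # The callable returns the captured prefix plus the literal replacement.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text)

result = replace_between("fo+", "bar", "baz", "quux", "foobarbaz fooobarbaz")
print(result)
```

Because the prefix is captured and written back, its width no longer matters; the trade-off is that it becomes part of the match, so two targets that share the same prefix text cannot both match.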

>>> import re
>>> s = "start foo end"
>>> s = re.sub("foo", "replaced", s)
>>> s
'start replaced end'
>>> s = re.sub("(?<= )(.+)(?= )", lambda m: "can use a callable for the %s text too" % m.group(1), s)
>>> s
'start can use a callable for the replaced text too end'
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

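Look up lookaheads (?=...) and lookbehinds (?<=...) in the Python re documentation; I'm pretty sure they are what you want. They match against the string, but do not "consume" the parts of the string they match.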
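A small illustration of what "not consuming" means, reusing the foobarbaz example from the first answer: the span of the match covers only the middle part, even though the surrounding text had to be present for the match to succeed.

```python
import re

text = "foobarbaz"

# With lookarounds, only "bar" is part of the match.
m = re.search(r"(?<=foo)bar(?=baz)", text)
matched_text = m.group(0)
matched_span = m.span()

# Without lookarounds, the whole string is consumed by the match.
plain_span = re.search(r"foobarbaz", text).span()

# So a substitution replaces just "bar" and leaves the context untouched.
replaced = re.sub(r"(?<=foo)bar(?=baz)", "quux", text)

print(matched_text, matched_span, plain_span, replaced)
```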
