Is there a way to make re.sub report each substitution it makes?

2 votes

2 answers

48 views

Asked 2025-04-14 15:21

TL;DR: How can I make re.sub print the substitutions it makes, including when the replacement uses groups?

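Is there something like a verbose mode, so that re.sub prints a message every time it makes a substitution? That would be very helpful for testing how multiline re.sub patterns interact with large texts.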

I came up with a workaround for simple replacements, using the fact that the repl argument can be a function:

import re

def replacer(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"[A-Z]+", lambda m: repl(m, "CAPS"), text)
    text = re.sub(r"\d+", lambda m: repl(m, "NUMBER"), text)
    return text

replacer("this is a 123 TEST 456", True)

# Log:
#   Replacing TEST with CAPS...
#   Replacing 123 with NUMBER...
#   Replacing 456 with NUMBER...

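However, this doesn't work with groups: when repl is a function, re.sub inserts its return value literally, without expanding group references such as \1 or \2: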

def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)
    return text

replacer2("ABC123", verbose=True) # returns r"\2\1"

# Log:
#   Replacing ABC123 with \2\1...
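The behaviour in the log above can be checked directly. A quick sketch comparing a string template with a function that returns the same text (the variable names are illustrative):

```python
import re

pattern = r"([A-Z]+)(\d+)"

# A string template: re.sub expands \2 and \1 into the captured groups.
templated = re.sub(pattern, r"\2\1", "ABC123")
# A function: its return value is inserted literally, with no expansion.
literal = re.sub(pattern, lambda m: r"\2\1", "ABC123")

print(templated)  # 123ABC
print(literal)    # \2\1
```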

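Of course, I could write a more elaborate repl function that actually parses the group references in replacement, but that seems far too complicated just to get re.sub to report its substitutions. Another possible workaround is to call re.search first, report its result, and then do the replacement with re.sub, perhaps applying it only to the span the search found (the compiled Pattern.search method accepts pos and endpos) so that the whole string doesn't have to be scanned again. Surely there is a better way than either of these?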
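The search-first idea can also be done in a single pass, without calling re.sub afterwards: iterate over re.finditer, expand the template for each match with Match.expand, and stitch the untouched text back in between matches. This is a minimal sketch, not a drop-in replacement for re.sub: logged_sub is a hypothetical helper name, and it only accepts a string template (no count or flags).

```python
import re

def logged_sub(pattern, template, string, verbose=True):
    """Hypothetical helper: single-pass substitution that reports each change."""
    pieces = []
    last_end = 0
    for m in re.finditer(pattern, string):
        new = m.expand(template)  # resolve \1, \g<name>, ... for this match
        if verbose:
            print(f"Replacing {m.group()!r} at {m.start()} with {new!r}")
        pieces.append(string[last_end:m.start()])  # unchanged text before the match
        pieces.append(new)
        last_end = m.end()
    pieces.append(string[last_end:])  # unchanged text after the last match
    return "".join(pieces)

print(logged_sub(r"([A-Z]+)(\d+)", r"\2\1", "x ABC123 y DE45"))
# Replacing 'ABC123' at 2 with '123ABC'
# Replacing 'DE45' at 11 with '45DE'
# x 123ABC y 45DE
```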

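Tags: regular expressions · text processing · debugging output · multiline text · function arguments · substitution · groups · complex functions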

2 Answers

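You can match both uppercase letters and digits with a single regex, [A-Z]+|\d+. Then, in the callback, choose the replacement based on what was actually matched, and log it.

Here is some example code: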

import re

def repl(m):
    # Decide which kind of token matched, log it, and return its replacement.
    if re.fullmatch(r'[A-Z]+', m.group()):
        print("Replacing " + m.group() + " with CAPS")
        return "CAPS"
    else:
        print("Replacing " + m.group() + " with NUMBER")
        return "NUMBER"

text = "this is a 123 TEST 456"
# repl can be passed directly; no lambda wrapper is needed.
text = re.sub(r'[A-Z]+|\d+', repl, text)
print(text)

This prints:

Replacing 123 with NUMBER
Replacing TEST with CAPS
Replacing 456 with NUMBER
this is a NUMBER CAPS NUMBER
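A variant of the same idea (an alternative, not part of this answer) avoids re-testing the matched text: give each alternative a named group and read Match.lastgroup, which holds the name of the group that matched. Here the group names double as the replacement strings, which is an assumption that only works for this example:

```python
import re

# Each alternative is a named group; only one of them can match at a time.
PATTERN = re.compile(r"(?P<CAPS>[A-Z]+)|(?P<NUMBER>\d+)")

def repl(m):
    label = m.lastgroup  # name of the alternative that matched
    print(f"Replacing {m.group()} with {label}")
    return label

result = PATTERN.sub(repl, "this is a 123 TEST 456")
print(result)  # this is a NUMBER CAPS NUMBER
```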

Answered 2025-04-14 by Python大师

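Use matchobj.expand(replacement). It applies the replacement template to the match, expanding group references such as \2 and \1, and returns the resulting string: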

import re

def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        result = matchobj.expand(replacement)
        if verbose:
            print(f"Replacing {matchobj.group()} with {result}...")
        return result
    text = re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)
    return text

print(replacer2("ABC123", verbose=True))

Output:

Replacing ABC123 with 123ABC...
123ABC
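Since the question mentions testing, it may be more convenient to collect the substitutions than to print them. A sketch along the lines of the answer above, where sub_with_log is a hypothetical name:

```python
import re

def sub_with_log(pattern, template, text):
    """Hypothetical helper: like re.sub, but also returns (old, new) pairs."""
    log = []

    def repl(m):
        new = m.expand(template)  # expand \1, \2, \g<name> for this match
        log.append((m.group(), new))
        return new

    return re.sub(pattern, repl, text), log

result, log = sub_with_log(r"([A-Z]+)(\d+)", r"\2\1", "ABC123 XY9")
print(result)  # 123ABC 9XY
print(log)     # [('ABC123', '123ABC'), ('XY9', '9XY')]
```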

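Here is a more general version that wraps re.sub with a verbose option and also lets a replacement function return a template containing group references: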

import re

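def sub2(pattern, repl, string, count=0, flags=0, verbose=False):
    def helper(match, repl):
        # A callable repl returns a template; a string repl already is one.
        # Either way, expand() resolves group references such as \1 or \g<name>.
        result = match.expand(repl(match) if callable(repl) else repl)
        if verbose:
            print(f'offset {match.start()}: {match.group()!r} -> {result!r}')
        return result
    # count and flags are passed by keyword; passing them positionally is deprecated since Python 3.13.
    return re.sub(pattern, lambda m: helper(m, repl), string, count=count, flags=flags)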

# replace three digits with their reverse
print(sub2(r'(\d)(\d)(\d)', r'\3\2\1', 'abc123def45ghi789', verbose=True))
# replace three digits with their reverse, and wrap two digits in parentheses
print(sub2(r'(\d)(\d)(\d)?',
           lambda m: r'(\1\2)' if m.group(3) is None else r'\3\2\1', 
           'abc123def45ghi789', verbose=True))

Output:

offset 3: '123' -> '321'
offset 14: '789' -> '987'
abc321def45ghi987
offset 3: '123' -> '321'
offset 9: '45' -> '(45)'
offset 14: '789' -> '987'
abc321def(45)ghi987
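One consequence of sub2 passing a callable's return value through match.expand is that backslashes in that text are treated as template escapes, unlike plain re.sub. For example, r"C:\temp" comes out with a tab in it. If literal output is wanted, one option (an assumption, not part of the answer above) is to double the backslashes before the text reaches expand:

```python
import re

m = re.search(r"\w+", "abc")
# expand() interprets escapes: the "\t" in this template becomes a real tab.
expanded = m.expand(r"C:\temp")
# Doubling every backslash makes expand() emit each one literally.
escaped = m.expand(r"C:\temp".replace("\\", "\\\\"))
print(repr(expanded))  # 'C:\temp'  (the \t is a tab character)
print(repr(escaped))   # 'C:\\temp' (a literal backslash)
```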

Answered 2025-04-14 by Python大师
