Is there a way to make re.sub report each substitution it makes?

Asked 2025-04-14 15:21

In short: how can I make re.sub print the substitutions it makes, including when groups are used?

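Is there a way to make re.sub print a message every time it makes a substitution, like a verbose mode? This would be very helpful for testing how a multiline re.sub interacts with a large text.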

I came up with a workaround for simple replacements by taking advantage of the fact that the repl argument can be a function:

import re

def replacer(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"[A-Z]+", lambda m: repl(m, "CAPS"), text)
    text = re.sub(r"\d+", lambda m: repl(m, "NUMBER"), text)
    return text

replacer("this is a 123 TEST 456", True)

# Log:
#   Replacing TEST with CAPS...
#   Replacing 123 with NUMBER...
#   Replacing 456 with NUMBER...

However, this doesn't work with groups; it seems re.sub does not process the return value of repl, and inserts it as-is:

def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)
    return text

replacer2("ABC123", verbose=True) # returns r"\2\1"

# Log:
#   Replacing ABC123 with \2\1...
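For comparison, a minimal sketch of the behaviour described above: when repl is a string, re.sub expands group references in it; when repl is a function, whatever the function returns is inserted literally.

```python
import re

pattern = r"([A-Z]+)(\d+)"

# A string template: \2 and \1 are expanded to the captured groups.
templated = re.sub(pattern, r"\2\1", "ABC123")

# A function: its return value is used verbatim, backslashes and all.
literal = re.sub(pattern, lambda m: r"\2\1", "ABC123")

print(templated)      # 123ABC
print(repr(literal))  # '\\2\\1'
```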

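Of course, I could write a more complex repl function that actually inspects the groups referenced in replacement, but that seems like an overly complicated solution just to get re.sub to report its substitutions. Another possible solution would be to use re.search directly, report that result first, and then do the substitution with re.sub, possibly using the Pattern.sub variant to specify pos and endpos so that sub doesn't have to search the whole string again. Surely there is a better way than either of these?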
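The search-first idea can be sketched without a second pass over the string by iterating with re.finditer and rebuilding the result from the matches directly. This is a sketch under stated assumptions: preview_sub is a hypothetical name, and repl is assumed to be a template string (not a function).

```python
import re

def preview_sub(pattern, template, text):
    """Hypothetical helper: report each substitution while applying it.

    Walks the matches once with re.finditer and rebuilds the string,
    so there is no separate re.sub pass over the text.
    """
    pieces = []
    last_end = 0
    for m in re.finditer(pattern, text):
        replacement = m.expand(template)  # expands \1, \g<name>, ...
        print(f"offset {m.start()}: {m.group()!r} -> {replacement!r}")
        pieces.append(text[last_end:m.start()])  # text before this match
        pieces.append(replacement)
        last_end = m.end()
    pieces.append(text[last_end:])  # tail after the last match
    return "".join(pieces)

print(preview_sub(r"([A-Z]+)(\d+)", r"\2\1", "ABC123 and XY9"))
```

For this input it logs the two matches at offsets 0 and 11 and returns the same string that re.sub(r"([A-Z]+)(\d+)", r"\2\1", "ABC123 and XY9") would.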

2 Answers

0

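You can use a single regex that matches either uppercase letters or digits: [A-Z]+|\d+. Then, in the callback, choose the replacement based on what was actually matched, and log it.

Here is an example: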

import re

def repl(m):
    # Pick the replacement based on which alternative actually matched
    if re.fullmatch(r'[A-Z]+', m.group()):
        print("Replacing " + m.group() + " with CAPS")
        return "CAPS"
    else:
        print("Replacing " + m.group() + " with NUMBER")
        return "NUMBER"

text = "this is a 123 TEST 456"
# repl can be passed directly; no lambda wrapper is needed
text = re.sub(r'[A-Z]+|\d+', repl, text)
print(text)

This code prints:

Replacing 123 with NUMBER
Replacing TEST with CAPS
Replacing 456 with NUMBER
this is a NUMBER CAPS NUMBER
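A variant of the same idea, assuming you would rather not re-test the matched text inside the callback: give each alternative its own named group and look up the replacement via Match.lastgroup. The REPLACEMENTS table and the group names are illustrative.

```python
import re

# One alternative per named group; the dict maps group name -> replacement.
REPLACEMENTS = {"caps": "CAPS", "number": "NUMBER"}
PATTERN = re.compile(r"(?P<caps>[A-Z]+)|(?P<number>\d+)")

def repl(m):
    # lastgroup is the name of the last group that matched; since each
    # alternative is a single named group, it tells us which one fired.
    replacement = REPLACEMENTS[m.lastgroup]
    print(f"Replacing {m.group()} with {replacement}")
    return replacement

text = PATTERN.sub(repl, "this is a 123 TEST 456")
print(text)
```

This prints the same three log lines as the code above, followed by this is a NUMBER CAPS NUMBER. Adding a new kind of replacement then only means adding one named alternative and one dict entry.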

3

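Use the matchobj.expand(replacement) method, which processes the replacement template (expanding group references such as \2 and \1) and returns the resulting string: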

import re

def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        result = matchobj.expand(replacement)
        if verbose:
            print(f"Replacing {matchobj.group()} with {result}...")
        return result
    text = re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)
    return text

print(replacer2("ABC123", verbose=True))

Output:

Replacing ABC123 with 123ABC...
123ABC
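Match.expand accepts the same template syntax as a string repl passed to re.sub, including named references such as \g<name> and \g<0> for the whole match. A small sketch (the group names are illustrative):

```python
import re

m = re.search(r"(?P<letters>[A-Z]+)(?P<digits>\d+)", "id: ABC123")

print(m.expand(r"\2\1"))                    # 123ABC
print(m.expand(r"\g<digits>-\g<letters>"))  # 123-ABC
print(m.expand(r"[\g<0>]"))                 # [ABC123]
```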

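Here is a general example that extends re.sub with a verbose option and also allows group references in the strings returned by a replacement function: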

import re

def sub2(pattern, repl, string, count=0, flags=0, verbose=False):
    def helper(match, repl):
        result = match.expand(repl(match) if callable(repl) else repl)
        if verbose:
            print(f'offset {match.start()}: {match.group()!r} -> {result!r}')
        return result
    # pass count and flags by keyword; positional use is deprecated since Python 3.13
    return re.sub(pattern, lambda m: helper(m, repl), string, count=count, flags=flags)

# replace three digits with their reverse
print(sub2(r'(\d)(\d)(\d)', r'\3\2\1', 'abc123def45ghi789', verbose=True))
# replace three digits with their reverse, and two digits wrap with parentheses
print(sub2(r'(\d)(\d)(\d)?',
           lambda m: r'(\1\2)' if m.group(3) is None else r'\3\2\1', 
           'abc123def45ghi789', verbose=True))

Output:

offset 3: '123' -> '321'
offset 14: '789' -> '987'
abc321def45ghi987
offset 3: '123' -> '321'
offset 9: '45' -> '(45)'
offset 14: '789' -> '987'
abc321def(45)ghi987
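For the testing use case in the question, collecting the substitutions may be more convenient than printing them, because the log can then be asserted on. A minimal sketch along the lines of sub2, assuming a template-string repl; sub_with_log is a hypothetical name.

```python
import re

def sub_with_log(pattern, template, string, count=0, flags=0):
    """Hypothetical helper: like re.sub with a template string, but also
    returns a list of (offset, matched_text, replacement) tuples."""
    log = []

    def helper(match):
        result = match.expand(template)  # expand \1, \g<name>, ...
        log.append((match.start(), match.group(), result))
        return result

    new_string = re.sub(pattern, helper, string, count=count, flags=flags)
    return new_string, log

text, log = sub_with_log(r"(\d)(\d)(\d)", r"\3\2\1", "abc123def45ghi789")
print(text)  # abc321def45ghi987
for entry in log:
    print(entry)
```

The length of log is also the number of substitutions made, the same count re.subn would report.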
