Is there a way to make re.sub report every replacement it makes?
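In short: how can I make re.sub print out the substitutions it makes, including when groups are used?
Much like a verbose mode, is it possible to have re.sub print a message each time it makes a replacement? This would be very helpful for testing how several re.sub calls interact with large texts.
I came up with a workaround for simple replacements, relying on the fact that the repl argument can be a function: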
import re

def replacer(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"[A-Z]+", lambda m: repl(m, "CAPS"), text)
    text = re.sub(r"\d+", lambda m: repl(m, "NUMBER"), text)
    return text

replacer("this is a 123 TEST 456", True)
# Log:
# Replacing TEST with CAPS...
# Replacing 123 with NUMBER...
# Replacing 456 with NUMBER...
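This doesn't work with groups, however: re.sub appears to use the return value of repl literally, without expanding backreferences such as \1: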
def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        if verbose:
            print(f"Replacing {matchobj.group()} with {replacement}...")
        return replacement
    text = re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)
    return text

replacer2("ABC123", verbose=True)  # returns r"\2\1"
# Log:
# Replacing ABC123 with \2\1...
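Of course, one could write a more elaborate repl function that actually checks for groups in replacement, but that seems too complicated a solution for merely getting re.sub to report its substitutions. Another possibility would be to call re.search first, report its result, and then use re.sub to perform the replacement, perhaps using the Pattern.sub variant with pos and endpos so that sub does not have to search the whole string again. Surely there is a better way than either of these?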
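For reference, the "report first, then substitute" idea could look roughly like the sketch below. It swaps re.search for finditer so that every match is listed, and it does scan the text twice. The name report_then_sub and the sample string are made up for illustration.

```python
import re

def report_then_sub(pattern, replacement, text):
    """Two-pass sketch: list every match first, then let sub() do the work.

    report_then_sub is a hypothetical helper, not part of the re module.
    """
    compiled = re.compile(pattern)
    # First pass: report what is about to be replaced.
    for m in compiled.finditer(text):
        print(f"offset {m.start()}: will replace {m.group()!r}")
    # Second pass: the actual substitution, with normal group handling.
    return compiled.sub(replacement, text)

print(report_then_sub(r"([A-Z]+)(\d+)", r"\2\1", "ABC123 and XY9"))
```

This reports only what was matched, not what each match turns into, which is the part the question is really after.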
2 Answers
0
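You can use a single regular expression that matches both uppercase letters and digits: [A-Z]+|\d+. Then, in the callback, choose the replacement based on what was actually matched, and log it.
Here is an example: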
import re

def repl(m):
    # Decide which replacement applies by inspecting the matched text.
    if re.search(r'^[A-Z]+$', m.group()):
        print("Replacing " + m.group() + " with CAPS")
        return "CAPS"
    else:
        print("Replacing " + m.group() + " with NUMBER")
        return "NUMBER"

text = "this is a 123 TEST 456"
text = re.sub(r'[A-Z]+|\d+', lambda m: repl(m), text)
print(text)
This code prints:
Replacing 123 with NUMBER
Replacing TEST with CAPS
Replacing 456 with NUMBER
this is a NUMBER CAPS NUMBER
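A possible variation on the same idea, sketched here with named groups and Match.lastgroup instead of a second regex test inside the callback. The group names caps and number and the REPLACEMENTS table are assumptions chosen for this example.

```python
import re

# Hypothetical mapping from group name to replacement text.
REPLACEMENTS = {"caps": "CAPS", "number": "NUMBER"}
PATTERN = re.compile(r"(?P<caps>[A-Z]+)|(?P<number>\d+)")

def repl(m):
    # m.lastgroup is the name of the alternative that actually matched.
    replacement = REPLACEMENTS[m.lastgroup]
    print(f"Replacing {m.group()} with {replacement}")
    return replacement

result = PATTERN.sub(repl, "this is a 123 TEST 456")
print(result)
```

Adding another kind of token then only needs one more named alternative and one more dictionary entry.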
3
Use matchobj.expand(replacement), which processes the replacement string and substitutes the group references in it:
import re

def replacer2(text, verbose=False):
    def repl(matchobj, replacement):
        # expand() resolves backreferences such as \1 against this match.
        result = matchobj.expand(replacement)
        if verbose:
            print(f"Replacing {matchobj.group()} with {result}...")
        return result
    text = re.sub(r"([A-Z]+)(\d+)", lambda m: repl(m, r"\2\1"), text)
    return text

print(replacer2("ABC123", verbose=True))
Output:
Replacing ABC123 with 123ABC...
123ABC
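To see what expand does in isolation, here is a small sketch that applies it to a single match. The named groups word and num and the sample string are made up for this example; the template syntax is the same one re.sub accepts for a string repl.

```python
import re

m = re.search(r"(?P<word>[A-Z]+)(?P<num>\d+)", "see ABC123 here")

# Numeric backreferences, as in the answer above.
swapped = m.expand(r"\2\1")
# Named backreferences use the \g<name> form.
named = m.expand(r"\g<num>-\g<word>")
# \g<0> stands for the whole match.
whole = m.expand(r"[\g<0>]")

print(swapped, named, whole)
```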
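Here is a more general example that extends re.sub with a verbose option and also allows group references in the string returned by a replacement function: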
import re

def sub2(pattern, repl, string, count=0, flags=0, verbose=False):
    def helper(match, repl):
        # A callable repl returns a template, which is then expanded too.
        result = match.expand(repl(match) if callable(repl) else repl)
        if verbose:
            print(f'offset {match.start()}: {match.group()!r} -> {result!r}')
        return result
    # count and flags are passed by keyword; newer Python versions
    # deprecate passing them positionally.
    return re.sub(pattern, lambda m: helper(m, repl), string,
                  count=count, flags=flags)

# reverse every run of three digits
print(sub2(r'(\d)(\d)(\d)', r'\3\2\1', 'abc123def45ghi789', verbose=True))

# reverse runs of three digits, and wrap runs of two digits in parentheses
print(sub2(r'(\d)(\d)(\d)?',
           lambda m: r'(\1\2)' if m.group(3) is None else r'\3\2\1',
           'abc123def45ghi789', verbose=True))
Output:
offset 3: '123' -> '321'
offset 14: '789' -> '987'
abc321def45ghi987
offset 3: '123' -> '321'
offset 9: '45' -> '(45)'
offset 14: '789' -> '987'
abc321def(45)ghi987
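If printing is inconvenient, for example when testing against large texts, the same approach can collect the replacements instead. The following is only a sketch along those lines; sub_with_log and the shape of its return value are assumptions, not an existing API.

```python
import re

def sub_with_log(pattern, repl, string, count=0, flags=0):
    """Like re.sub, but also return a list of (offset, old, new) tuples.

    Hypothetical helper: as in sub2, a callable repl is expected to return
    a template, which is expanded against the match.
    """
    log = []

    def helper(match):
        template = repl(match) if callable(repl) else repl
        result = match.expand(template)
        log.append((match.start(), match.group(), result))
        return result

    new_string = re.sub(pattern, helper, string, count=count, flags=flags)
    return new_string, log

new_text, changes = sub_with_log(r"(\d)(\d)(\d)", r"\3\2\1", "abc123def45ghi789")
print(new_text)
for offset, old, new in changes:
    print(f"offset {offset}: {old!r} -> {new!r}")
```

One difference from plain re.sub is worth keeping in mind: re.sub uses a function's return value literally, whereas this helper (like sub2) treats it as a template, so backslashes in that return value are interpreted.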