Regular expressions in Python: is it possible to get the match, the replacement, and the final string?

30 votes
2 answers
6875 views
Asked 2025-04-17 12:17

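To perform a regular expression substitution, you provide three things:

  • the match pattern
  • the replacement pattern
  • the original string

The regex engine, in turn, works out three things that I'm interested in:

  • the matched string
  • the replacement string
  • the final processed string

With re.sub, the final string is the return value. But is there any way to get at the other two, the matched string and the replacement string?

Here's an example: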

import re

orig = "This is the original string."
matchpat = "(orig.*?l)"
replacepat = "not the \\1"

final = re.sub(matchpat, replacepat, orig)
print(final)
# This is the not the original string.

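The matched string is "original" and the replacement string is "not the original". Is there a way to get these two? I'm writing a script that searches and replaces across many files, and I'd like it to print what it found and what it replaced it with, rather than printing the whole line.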

2 Answers

10

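I looked at the documentation, and it turns out you can pass a function as the replacement argument to re.sub. The function is called once per match with the match object and must return the replacement string: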

import re

def re_sub_verbose(pattern, replace, string):
  def substitute(match):
    # Called by re.sub once for every match
    print('Matched:', match.group(0))
    print('Replacing with:', match.expand(replace))

    # expand() fills in backreferences such as \1 in the template
    return match.expand(replace)

  result = re.sub(pattern, substitute, string)
  print('Final string:', result)

  return result

When I run re_sub_verbose("(orig.*?l)", "not the \\1", "This is the original string."), I get this output:

Matched: original
Replacing with: not the original
Final string: This is the not the original string.
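
If you only want to report what would change, without performing the substitution, something along these lines might work. This is a minimal sketch: preview_substitutions is a hypothetical helper name, and it assumes the replacement is a template string (not a function), since match.expand only accepts templates.

```python
import re

def preview_substitutions(pattern, replacement, text):
    """Return a list of (matched, replaced) pairs; text is not modified."""
    # re.finditer yields one match object per non-overlapping match,
    # and match.expand() fills in backreferences such as \1 in the template.
    return [(m.group(0), m.expand(replacement))
            for m in re.finditer(pattern, text)]

pairs = preview_substitutions(r"(orig.*?l)", r"not the \1",
                              "This is the original string.")
print(pairs)
# [('original', 'not the original')]
```

This keeps the reporting separate from the rewriting, which can be handy for a dry-run mode in a search-and-replace script.
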
29

class Replacement(object):

    def __init__(self, replacement):
        self.replacement = replacement
        self.matched = None
        self.replaced = None

    def __call__(self, match):
        self.matched = match.group(0)
        self.replaced = match.expand(self.replacement)
        return self.replaced

>>> repl = Replacement('not the \\1')
>>> re.sub('(orig.*?l)', repl, 'This is the original string.')
    'This is the not the original string.'
>>> repl.matched
    'original'
>>> repl.replaced
    'not the original'

Edit: As @F.J pointed out, the version above only remembers the last match and replacement. This version handles multiple occurrences:

class Replacement(object):

    def __init__(self, replacement):
        self.replacement = replacement
        self.occurrences = []

    def __call__(self, match):
        matched = match.group(0)
        replaced = match.expand(self.replacement)
        self.occurrences.append((matched, replaced))
        return replaced

>>> repl = Replacement('[\\1]')
>>> re.sub(r'\s(\d)', repl, '1 2 3')
    '1[2][3]'

>>> for matched, replaced in repl.occurrences:
...     print(matched, '=>', replaced)
...
 2 => [2]
 3 => [3]
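
For the use case in the question (reporting what was found and replaced instead of printing whole lines), the same idea could be wrapped in a function that returns both the new text and the list of occurrences. This is only a sketch: sub_with_report and the sample lines list are made-up names for illustration, and reading the actual files is left out.

```python
import re

def sub_with_report(pattern, replacement, text):
    """Return (new_text, occurrences); each occurrence is a
    (start_offset, matched, replaced) tuple. Hypothetical helper."""
    occurrences = []

    def record(match):
        # Expand backreferences in the template, remember the pair, then
        # hand the replacement back to re.sub.
        replaced = match.expand(replacement)
        occurrences.append((match.start(), match.group(0), replaced))
        return replaced

    new_text = re.sub(pattern, record, text)
    return new_text, occurrences

# Stand-in for the lines of a file (assumed sample data)
lines = ["This is the original string.", "Nothing to see here."]
for lineno, line in enumerate(lines, start=1):
    new_line, found = sub_with_report(r"(orig.*?l)", r"not the \1", line)
    for start, matched, replaced in found:
        print(f"line {lineno}, col {start}: {matched!r} => {replaced!r}")
# line 1, col 12: 'original' => 'not the original'
```

A closure is used here instead of a class so that each call starts with a fresh list, which avoids accidentally mixing occurrences from different lines.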
