Regex: matching exactly three lines


Asked 2025-04-17 19:32

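I want to match the following input. How can I match a group a specific number of times without using a multi-line string? Something like (^(\d+) (.+)$){3}, but that does not work.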

sample_string = """Breakpoint 12 reached 
         90  good morning
     91  this is cool
     92  this is bananas
     """
pattern_for_continue = re.compile("""Breakpoint \s (\d+) \s reached \s (.+)$
                                 ^(\d+)\s+  (.+)\n
                                 ^(\d+)\s+  (.+)\n
                                 ^(\d+)\s+  (.+)\n
                                  """, re.M|re.VERBOSE)
matchobj = pattern_for_continue.match(sample_string)
print(matchobj.group(0))


2 Answers

Code

You need something more like this:

import re

sample_string = """Breakpoint 12 reached 
90  hey this is a great line
91  this is cool too
92  this is bananas
"""
pattern_for_continue = re.compile(r"""
    Breakpoint\s+(\d+)\s+reached\s+\n
    (\d+)  ([^\n]+?)\n
    (\d+)  ([^\n]+?)\n
    (\d+)  ([^\n]+?)\n
""", re.MULTILINE|re.VERBOSE)
matchobj = pattern_for_continue.match(sample_string)

# Print each of the seven capture groups, then the whole match.
for i in range(1, 8):
    print(i, matchobj.group(i))
print("Entire match:")
print(matchobj.group(0))

Result

1 12
2 90
3   hey this is a great line
4 91
5   this is cool too
6 92
7   this is bananas
Entire match:
Breakpoint 12 reached
90  hey this is a great line
91  this is cool too
92  this is bananas

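Why

re.VERBOSE forces you to make whitespace in the regular expression explicit. I partly worked around this by left-aligning your data in the multi-line string. I think that is justified because your real code probably does not have that indentation; it is most likely an artifact of editing inside a multi-line string.
You need to replace $ with \n, because $ matches just before the newline without consuming it.
You need to use non-greedy matches.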
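
The first two points can be checked in isolation. The snippet below is only a minimal sketch and not part of the answer above; the two-line `text` sample and the variable names are made up for illustration. It shows that under re.VERBOSE an unescaped space in the pattern matches nothing, and that under re.MULTILINE `$` stops before the newline without consuming it.

```python
import re

text = "90 first\n91 second\n"

# Under re.VERBOSE the literal space in the pattern is dropped, so this
# is effectively r"(\d+)first" and cannot match "90 first".
verbose_plain = re.match(r"(\d+) first", text, re.VERBOSE)

# Writing the space as [ ] (or \s) keeps it significant.
verbose_explicit = re.match(r"(\d+) [ ] first", text, re.VERBOSE)

# With re.MULTILINE, $ matches before the newline but does not consume it,
# so the second line's digits cannot follow immediately.
with_dollar = re.match(r"\d+[ ](.+)$\d+[ ](.+)$", text, re.MULTILINE)

# Consuming the newline explicitly lets the pattern continue on the next line.
with_newline = re.match(r"\d+[ ](.+)\n\d+[ ](.+)\n", text, re.MULTILINE)

print(verbose_plain)              # None
print(verbose_explicit.group(1))  # 90
print(with_dollar)                # None
print(with_newline.groups())      # ('first', 'second')
```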

Answered 2025-04-17 by Python大师


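There are a number of problems with your expression and your sample:

Your use of VERBOSE mode makes every space in the pattern non-matching, so the spaces around the digits on the first line are ignored as well. Replace the spaces with \s or [ ] (the latter matches only a literal space; the former also matches newlines and tabs).
Your input sample has whitespace before the digits on each line, but your pattern requires the digits to be at the start of the line. Either allow that whitespace or correct your sample input.
The biggest problem is that a capturing group inside a repeated group (such as (\d+) inside a larger group followed by {3}) captures only the last iteration. You would get 92 and this is bananas, not the matches from the previous two lines.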
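
The last point is easy to see in a short experiment. This is a hedged sketch rather than code from the thread; the `lines` string is the question's three numbered lines with the indentation removed, and the non-capturing outer group (?:...) is my choice for the illustration.

```python
import re

lines = "90  good morning\n91  this is cool\n92  this is bananas\n"

# One group repeated three times: the overall match covers all three lines...
repeated = re.match(r"(?:(\d+)[ ]+([^\n]+)\n){3}", lines)
print(repeated.group(0) == lines)  # True

# ...but each capturing group only remembers its last iteration.
print(repeated.groups())           # ('92', 'this is bananas')
```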

To overcome these problems you have to repeat the pattern explicitly for the three lines. You can let Python do that repetition:

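import re

# One numbered line: optional indent, the number, spaces, then the rest of the line.
linepattern = r'[ ]* (\d+) [ ]+ ([^\n]+)\n'

# Repeat the line pattern three times so each line gets its own capture groups.
pattern_for_continue = re.compile(r"""
    Breakpoint [ ]+ (\d+) [ ]+ reached [ ]+ ([^\n]*?)\n
    {}
""".format(linepattern * 3), re.MULTILINE|re.VERBOSE)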

For your sample input, this returns:

>>> pattern_for_continue.match(sample_string).groups()
('12', '', '90', 'hey this is a great line', '91', 'this is cool too', '92', 'this is bananas')

If you really do not want to match whitespace before the digits on the three extra lines, remove the first [ ]* from linepattern.
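
To compare the two variants side by side, here is a rough, self-contained sketch run against the indented sample from the question. `build_pattern` and its `allow_indent` parameter are hypothetical names introduced only for this illustration, and the sample is written with explicit \n so that the trailing space after "reached" stays visible.

```python
import re

# The question's sample, spelled out line by line.
sample_string = (
    "Breakpoint 12 reached \n"
    "         90  good morning\n"
    "     91  this is cool\n"
    "     92  this is bananas\n"
)

def build_pattern(allow_indent=True):
    # Hypothetical helper: builds the three-line pattern with or without
    # the leading [ ]* that tolerates indentation before each number.
    indent = "[ ]*" if allow_indent else ""
    linepattern = indent + r" (\d+) [ ]+ ([^\n]+)\n"
    return re.compile(r"""
        Breakpoint [ ]+ (\d+) [ ]+ reached [ ]+ ([^\n]*?)\n
        {}
    """.format(linepattern * 3), re.MULTILINE | re.VERBOSE)

# With [ ]* the indented lines match and every line keeps its own groups.
print(build_pattern().match(sample_string).groups())

# Without it, the indented sample no longer matches, so this prints None.
print(build_pattern(allow_indent=False).match(sample_string))
```

Dropping the leading [ ]* is therefore only appropriate when the numbered lines really do start in column zero.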

Answered 2025-04-17 by Python大师

