Python - Single-line vs. multiline regular expressions

6 votes

1 answer

9223 views

Asked 2025-04-17 02:51

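Consider the following text pattern,

# Goal: extract each report's timestamp, e.g. 2011-09-21 15:45:00, and the first two numbers on the succ. statistics line, e.g. 1438 1439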

input_text = '''
# Process_Name     ( 23387) Report at 2011-09-21 15:45:00.001    Type:  Periodic    #\n
some line 1\n
some line 2\n
some other lines\n
succ. statistics |     1438     1439  99 |   3782245    3797376  99 |\n
some lines\n
Process_Name     ( 23387) Report at 2011-09-21 15:50:00.001    Type:  Periodic    #\n
some line 1\n
some line 2\n
some other lines\n
succ. statistics |     1436     1440  99 |   3782459    3797523  99 |\n
repeat the pattern several hundred times...
'''

I got it working when processing the file line by line,

import re

def parse_file(file_handler, patterns):
    """Return the captured groups of every line that matches one of the patterns."""
    results = []
    for line in file_handler:
        for pattern in patterns.values():
            result = pattern.match(line)
            if result:
                # Store the captured groups as a tuple, as in the output shown below
                results.append(result.groups())
    return results

patterns = {
    'report_date_time': re.compile(r'^# Process_Name\s*\(\s*\d+\) Report at (.*)\.[0-9]{3}\s+Type:\s*Periodic\s*#\s*.*$'),
    'serv_term_stats': re.compile(r'^succ\. statistics \|\s+(\d+)\s+(\d+)\s+\d+\s+\|\s+\d+\s+\d+\s+\d+\s+\|\s*$'),
    }
# fh is an open file handle for the report file
results = parse_file(fh, patterns)

[('2011-09-21 15:40:00',),
('1425', '1428'),
('2011-09-21 15:45:00',),
('1438', '1439')]

But my goal is to output a list of tuples,

[('2011-09-21 15:40:00','1425', '1428'),
('2011-09-21 15:45:00', '1438', '1439')]

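I tried several combinations of the initial patterns with a lazy quantifier between them, but I can't figure out how to capture these patterns with a multiline regular expression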

# .*?   Lazy quantifier: "match as few characters as possible (all characters allowed) until reaching the next expression"
pattern = r'# Process_Name\s*\(\s*\d+\) Report at (.*)\.[0-9]{3}\s+Type:\s*Periodic.*?succ. statistics \|\s+(\d+)\s+(\d+)+\s+\d+\s+\|\s+\d+\s+\d+\s+\d+\s+\|\s'
regex = re.compile(pattern, flags=re.MULTILINE)

data = file_handler.read()
for match in regex.finditer(data):
    results = match.groups()

How should I do this?


1 Answer

Use the re.DOTALL flag so that . matches any character, including newlines:

import re

data = '''
# Process_Name     ( 23387) Report at 2011-09-21 15:45:00.001    Type:  Periodic    #\n
some line 1\n
some line 2\n
some other lines\n
succ. statistics |     1438     1439  99 |   3782245    3797376  99 |\n
some lines\n
repeat the pattern several hundred times...
'''

pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?succ. statistics\s+\|\s+(\d+)\s+(\d+)'
regex = re.compile(pattern, flags=re.MULTILINE|re.DOTALL)

for match in regex.finditer(data):
    results = match.groups()
    print(results)

    # ('2011-09-21 15:45:00', '1438', '1439')
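
To see why the flag matters, here is a minimal sketch that runs the same pattern with and without re.DOTALL and collects every match into a list of tuples with re.findall. The string `sample` and the variable names are illustrative assumptions (a shortened version of the question's data), not part of the original code. Note that re.MULTILINE only changes where ^ and $ anchor; it does not let . cross a line break.

```python
import re

# Illustrative sample: two report blocks, shortened from the question's data.
sample = (
    "# Process_Name     ( 23387) Report at 2011-09-21 15:45:00.001    Type:  Periodic    #\n"
    "some line 1\n"
    "succ. statistics |     1438     1439  99 |   3782245    3797376  99 |\n"
    "some lines\n"
    "# Process_Name     ( 23387) Report at 2011-09-21 15:50:00.001    Type:  Periodic    #\n"
    "some line 2\n"
    "succ. statistics |     1436     1440  99 |   3782459    3797523  99 |\n"
)

pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?succ\. statistics\s+\|\s+(\d+)\s+(\d+)'

# Without DOTALL, "." stops at a newline, so the lazy ".*?" can never
# reach the statistics line that sits several lines below the timestamp.
without_dotall = re.findall(pattern, sample, flags=re.MULTILINE)

# With DOTALL, "." also matches "\n"; the lazy quantifier then stops at
# the first statistics line after each timestamp.
with_dotall = re.findall(pattern, sample, flags=re.DOTALL)

print(without_dotall)
print(with_dotall)
```

Because the pattern has three capture groups, re.findall returns one 3-tuple per report block, which is the list-of-tuples shape the question asks for.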

Answered 2025-04-17 by Python大师
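
If reading the whole file into memory is undesirable, the line-by-line approach from the question can also produce combined tuples by remembering the most recent timestamp. The following is only a sketch under assumptions: the names `REPORT_RE`, `STATS_RE`, `parse_lines`, and `sample_lines` are hypothetical, and it assumes each report header is followed by exactly one statistics line before the next header.

```python
import re

# Hypothetical helper patterns; both are simplified versions of the question's patterns.
REPORT_RE = re.compile(r'Report at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')
STATS_RE = re.compile(r'^succ\. statistics\s*\|\s+(\d+)\s+(\d+)')

def parse_lines(lines):
    """Pair each report timestamp with the next statistics line that follows it."""
    results = []
    timestamp = None
    for line in lines:
        report = REPORT_RE.search(line)
        if report:
            # Remember the timestamp until its statistics line shows up
            timestamp = report.group(1)
            continue
        stats = STATS_RE.match(line)
        if stats and timestamp is not None:
            results.append((timestamp,) + stats.groups())
            timestamp = None  # wait for the next report header
    return results

# Illustrative input; any iterable of lines (such as an open file) works.
sample_lines = [
    "# Process_Name     ( 23387) Report at 2011-09-21 15:45:00.001    Type:  Periodic    #\n",
    "some line 1\n",
    "succ. statistics |     1438     1439  99 |   3782245    3797376  99 |\n",
    "some lines\n",
    "# Process_Name     ( 23387) Report at 2011-09-21 15:50:00.001    Type:  Periodic    #\n",
    "succ. statistics |     1436     1440  99 |   3782459    3797523  99 |\n",
]

rows = parse_lines(sample_lines)
print(rows)
```

This keeps memory use constant regardless of file size, at the cost of a small amount of state between lines.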
