Regex in Python: extracting a multi-line section from text that contains several similar sections

(A lot of information)
time: 150
C-FXY
-- information ---
E-END
(A lot of information)
time: 5000
C-FXY
**--- INFORMATION I WANT TO EXTRACT ---**
E-END
(A lot of information)
time: 13000
C-FXY
-- information ---
E-END
(A lot of information)

2 answers

User

#1 · edited 2024-04-25 07:13:15

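The error occurs because your regex contains a greedy .* between the time part and C-FXY. A greedy .* matches as much as it can, so it swallows everything up to the last C-FXY, and the capture group ends up holding the last block instead of the one you want.

Using the non-greedy version here is sufficient: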

text_part = re.search(r'time.*'+time_step+'.*?C-FXY(.*?)E-END', text, re.DOTALL).group(1)
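To make the difference concrete, here is a minimal sketch that runs the greedy and the non-greedy pattern on the same input. The sample text and the names greedy, lazy and strict are made up for illustration. The last variant is an assumption on my part, not something from the question: it escapes time_step and anchors it directly after "time:", so that a value such as 3000 cannot match inside 13000.

```python
import re

# Hypothetical sample that mirrors the layout described in the question.
text = """header
time:    150
C-FXY
first block
E-END
filler
time:   5000
C-FXY
WANTED BLOCK
E-END
filler
time:  13000
C-FXY
last block
E-END
footer"""

time_step = "5000"

# Greedy ".*" before C-FXY runs to the end of the text and backtracks,
# so it settles on the LAST C-FXY ... E-END pair.
greedy = re.search(
    r"time.*" + time_step + r".*C-FXY(.*?)E-END", text, re.DOTALL
).group(1)

# Non-greedy ".*?" stops at the FIRST C-FXY that follows the time stamp.
lazy = re.search(
    r"time.*" + time_step + r".*?C-FXY(.*?)E-END", text, re.DOTALL
).group(1)

# Stricter variant (assumption): the number must directly follow "time:"
# and end at a word boundary; re.escape guards against regex metacharacters.
strict = re.search(
    r"time:\s*" + re.escape(time_step) + r"\b.*?C-FXY(.*?)E-END",
    text,
    re.DOTALL,
).group(1)

print(repr(greedy.strip()))
print(repr(lazy.strip()))
print(repr(strict.strip()))
```

Note that re.search returns None when nothing matches, so in real code it is safer to check the match object before calling .group(1).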

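That said, I would not use a multi-line search over the whole file here. I would read the file line by line until I reach time: 5000, continue to the next C-FXY line, store everything from there up to the E-END line, and stop processing at that point.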
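A rough sketch of that line-by-line approach could look like the following. The function name extract_block and its parameters are hypothetical, and it assumes that the time stamp, C-FXY and E-END each sit on a line of their own, as in the sample from the question.

```python
import io


def extract_block(lines, time_step, start="C-FXY", end="E-END"):
    """Return the text between `start` and `end` that follows `time: <time_step>`.

    `lines` can be any iterable of lines, for example an open file.
    Returns None if the requested block is never completed.
    """
    state = "seek_time"
    block = []
    for line in lines:
        stripped = line.strip()
        if state == "seek_time":
            # Accept "time: 5000" with any amount of padding after the colon.
            key, sep, value = stripped.partition(":")
            if sep and key.strip() == "time" and value.strip() == str(time_step):
                state = "seek_start"
        elif state == "seek_start":
            if stripped == start:
                state = "collect"
        else:  # state == "collect"
            if stripped == end:
                # Stop here; the rest of the input is never read.
                return "".join(block)
            block.append(line)
    return None


# Usage with an in-memory stand-in for a file (made-up sample data).
sample = io.StringIO(
    "header\n"
    "time:    150\nC-FXY\nfirst block\nE-END\n"
    "time:   5000\nC-FXY\nWANTED BLOCK\nE-END\n"
    "time:  13000\nC-FXY\nlast block\nE-END\n"
)
wanted = extract_block(sample, 5000)
print(repr(wanted))
```

With a real file you would pass the file object directly, for example inside a with open(...) block. Because the function returns as soon as it sees E-END, large files are never loaded into memory as a whole.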

User

#2 · edited 2024-04-25 07:13:15

You can solve it with the following code:

import re

text = """(A lot of information)

time:    150

C-FXY

  information  -

E-END

(A lot of information)

time:   5000

C-FXY

** - INFORMATION I WANT TO EXTRACT  -**

E-END

(A lot of information)

time:  13000

C-FXY

  information  -

E-END

(A lot of information)"""

# re.DOTALL lets "." match newlines, so each block may span several lines.
pattern = re.compile(r"C-FXY(.*?)E-END", re.DOTALL)

results = pattern.findall(text)

Now, if you print each item in results together with its index, the output will be:

Resultado 0:
'

  information  -

'
Resultado 1:
'

** - INFORMATION I WANT TO EXTRACT  -**

'
Resultado 2:
'

  information  -

'
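The list above contains every block, but the question asks for the block at one particular time. One possible extension, sketched here under the assumption that every time entry is followed by exactly one C-FXY ... E-END pair, is to capture the time stamp as well and build a lookup table. The sample text and the names pairs, blocks and wanted are illustrative only.

```python
import re

# Hypothetical sample in the same layout as the text above.
text = """(A lot of information)
time:    150
C-FXY
first block
E-END
(A lot of information)
time:   5000
C-FXY
WANTED BLOCK
E-END
(A lot of information)
time:  13000
C-FXY
last block
E-END
(A lot of information)"""

# Two capture groups: the number after "time:" and the block body.
pairs = re.findall(r"time:\s*(\d+).*?C-FXY(.*?)E-END", text, re.DOTALL)

# Map each time step to its block, with surrounding whitespace removed.
blocks = {int(time): body.strip() for time, body in pairs}

wanted = blocks[5000]
print(wanted)
```

If a time entry could appear without its own C-FXY block, the non-greedy .*? would run on into the next entry, so this sketch only fits input where the pairing is guaranteed.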
