Regular expression to search text across multiple lines

import fitz
import re

doc = fitz.open(r'file.pdf')
text_list = []
for page in doc:
    text_list.append(page.getText())
    # print(text_list[-1])
text_string = ' '.join(text_list)

test_string = "Observations of Client Behavior: THIS IS THE DESIRED TEXT. Observations of Client's response to skill acquisition"  # works for this test

pat = r".*?Observations of Client Behavior: (.*) Observations of Client's response to skill acquisition*"
match = re.search(pat, text_string)
print(match.group(1).strip())

Observations of Client Behavior: Overall interfering behavior data trends are as followed: Aggression frequency has been low and stable at 0 occurrences for the past two consecutive sessions. Elopement frequency is on an overall decreasing trend. Property destruction frequency is on an overall decreasing trend. Non-compliance frequency has been stagnant at 2 occurrences for the past two consecutive sessions, but overall on a decreasing trend. Tantrum duration data are variable; data were at 89 minutes on 9/27/21, but have starkly decreased to 0 minutes for the past two consecutive sessions. Observations of Client's response to skill acquisition: Overall skill acquisition data trends are as followed: Frequency of excessive mands

1 answer

User

#1 · Posted 2024-05-15 10:28:44

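Note that . matches any character except a newline, so you can use (.|\n) to capture everything, newlines included. A line break may also fall inside your fixed phrases, so join their words with \s+ rather than a literal space. First define the prefix and suffix of the pattern: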

prefix = r"Observations\s+of\s+Client\s+Behavior:"
suffix = r"Observations\s+of\s+Client's\s+response\s+to\s+skill\s+acquisition:"
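To see why the words are joined with \s+ instead of plain spaces, here is a small check. It is only a sketch: the sample string is made up for illustration and simply places a line break inside the fixed phrase.

```python
import re

# Hypothetical sample in which the line break falls inside the fixed phrase.
sample = "Observations of Client\nBehavior: some text"

literal = r"Observations of Client Behavior:"
flexible = r"Observations\s+of\s+Client\s+Behavior:"

literal_hit = re.search(literal, sample)    # None: a literal space does not match "\n"
flexible_hit = re.search(flexible, sample)  # matches: \s+ accepts the newline

print(literal_hit is None, flexible_hit is not None)
```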

Then build the pattern and find all matches:

pattern=prefix+r"((?:.|\n)*?)"+suffix
f=re.findall(pattern,text_string)

The *? at the end of r"((?:.|\n)*?)" makes the group lazy, so it matches as few characters as possible.
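The difference between the greedy and the lazy quantifier is easiest to see side by side. The markers "start" and "end" and the sample text below are placeholders chosen for this sketch, not part of the original question.

```python
import re

# Hypothetical text containing two marked spans.
sample = "start A end start B end"

# Greedy: * grabs as much as possible, so the capture runs to the LAST "end".
greedy = re.findall(r"start((?:.|\n)*)end", sample)

# Lazy: *? stops at the FIRST "end" after each "start".
lazy = re.findall(r"start((?:.|\n)*?)end", sample)

print(greedy)  # [' A end start B ']
print(lazy)    # [' A ', ' B ']
```

With the greedy form, several sections in one document would be merged into a single capture, which is why the lazy form is used here.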

Example with multiple lines and multiple matches:

text_string = '''any thing Observations of Client Behavior: patern1 Observations of Client's 
response to skill acquisition: any thing
any thing Observations of Client Behavior: patern2 Observations of 
Client's response to skill acquisition: any thing Observations of Client
Behavior: patern3 Observations of Client's response to skill acquisition: any thing any thing'''

result=re.findall(pattern,text_string)

result=[' patern1 ', ' patern2 ', ' patern3 ']
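As an alternative to (?:.|\n), the re.DOTALL flag makes . match newlines as well. The sketch below wraps that in a helper and collapses whitespace inside each capture; the function name extract_sections and the sample text are assumptions made for illustration.

```python
import re

PREFIX = r"Observations\s+of\s+Client\s+Behavior:"
SUFFIX = r"Observations\s+of\s+Client's\s+response\s+to\s+skill\s+acquisition:"

def extract_sections(text):
    """Return every span between PREFIX and SUFFIX, with whitespace collapsed."""
    # re.DOTALL lets "." match newlines, so "(.*?)" behaves like "((?:.|\n)*?)".
    pattern = re.compile(PREFIX + r"(.*?)" + SUFFIX, re.DOTALL)
    # split() + join() trims the ends and turns newlines/runs of spaces into one space.
    return [" ".join(m.split()) for m in pattern.findall(text)]

# Hypothetical sample with line breaks inside the captured text and the fixed phrases.
sample = (
    "x Observations of Client Behavior: first\nnote Observations of Client's\n"
    "response to skill acquisition: y Observations of Client\nBehavior: second "
    "Observations of Client's response to skill acquisition: z"
)
sections = extract_sections(sample)
print(sections)  # ['first note', 'second']
```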

