Find all matches in a text where the closing word is also the start of the next match, using regex in Python - Q&A - Python中文网


Published 2024-05-16 00:04:14


I can't find a solution to a regex problem. This is a follow-up to this post: Find string between two substrings AND between string and the end of file

I created the following sample text (in my application the text is much longer, there are multiple files, etc.):

Course 22/09/2010 1. Early duty Josephine, Jansen 22-09-2010 10:37:08 Date 22/09/2010 Duty 1. Early duty 1.3 Here there can be some other related stuff Nursegoals Interventions Record This is now the fourth note. 6.2.1.3 Confusion: Observing. Nursegoals Interventions Record This is a new, note (again), i call it note 3. Course 22/09/2010 1. Early duty Record This is again a note, i call it note 2. Apple: 0/less Course 22/09/2010 3. Nightduty Josephine, Jansen 22-09-2010 06:22:25 Date 22/09/2010 Course 3. Nightduty 1.3 Something else here Nursegoals Interventions Record 6.2.1.3 Confusion: Observing. Nursegoals Interventions Record Course 22/09/2010 3. Nightduty Record This is a new note, i call it note 1.

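Now I want to parse specific information out of this text. I am interested in the "Record" entries, that is, the text that follows each Record, together with the date of that record. By date I mean 22/09/2010 plus the notion of early, late or night duty (so a date should look like '22/09/2010 1. Early duty'). My problem is that the files are not really consistent: sometimes a date has two notes, sometimes only one. Sometimes the note part contains text, sometimes it does not.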

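I know how to parse the Record part, but I don't know how to parse the date first and then the notes. So I want to split the problem in two. Step one: split the whole file into separate date sections. Step two: loop over all date sections and get the notes of each section (using a regex). Then I would build a list that holds each date (if I only want the date itself, to put it in a column cell for example, I can simply take the first 13 characters of that date section) and the notes that belong to it. For example: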

list = [02-08-2010 1. Early duty, [note1, note2], 02-08-2010 2. Late duty, [note1], etc.]

Let's focus on the date parsing so that my question is clear. I use the following code:

import re

# text holds the sample text shown above
date = r'Course\s+(.*?)(?:Course|$)'
date_list = re.findall(date, text, re.DOTALL)
for i in date_list:
    print(i)
    print('XXX')

The output is:

22/09/2010 1. Early duty Josephine, Jansen 22-09-2010 10:37:08 Date 22/09/2010 Duty 1. Early duty 1.3 Here there can be some other related stuff Nursegoals Interventions Record This is now the fourth note. 6.2.1.3 Confusion: Observing. Nursegoals Interventions Record This is a new, note (again), i call it note 3.
XXX
22/09/2010 3. Nightduty Josephine, Jansen 22-09-2010 06:22:25 Date 22/09/2010
XXX
22/09/2010 3. Nightduty Record This is a new note, i call it note 1.
XXX

This output is missing the following elements:

['Course 22/09/2010 1. Early duty Record This is again a note, i call it note 2. Apple: 0/less']

and

['3. Nightduty 1.3 Something else here Nursegoals Interventions Record 6.2.1.3 Confusion: Observing. Nursegoals Interventions']

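I think the regex does not treat the "Course" that ends one match as also being the start of a new match, so to speak.

It would be great if someone could help me :) Maybe I am missing something.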


1 answer

Answer 1 · posted 2024-05-16 00:04:14

Change the non-capturing group into a positive lookahead:

r'Course\s+(.*?)(?=Course|$)'
                 ^^

See the regex demo. A faster, unrolled variant is r'Course\s+([^C]*(?:C(?!ourse)[^C]*)*)' (see demo).
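As a rough check that the unrolled variant behaves like the lazy one, the sketch below runs both on a shortened, made-up sample (the section names `A1 Care`, `B2 Check` and `C3` are placeholders, not from the question). The unrolled pattern needs no `re.DOTALL` because a negated character class such as `[^C]` already matches newlines.

```python
import re

# Hypothetical shortened sample; "Care" and "Check" contain a capital C
# that does not start the word "Course".
sample = "Course A1 Care Course B2 Check Course C3"

lazy = r"Course\s+(.*?)(?=Course|$)"
unrolled = r"Course\s+([^C]*(?:C(?!ourse)[^C]*)*)"

lazy_sections = re.findall(lazy, sample, re.DOTALL)
unrolled_sections = re.findall(unrolled, sample)

print(lazy_sections)                       # ['A1 Care ', 'B2 Check ', 'C3']
print(lazy_sections == unrolled_sections)  # True
```

Whether the unrolled form is actually faster on your files is worth measuring, since that depends on the input.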

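With the non-capturing group, the closing Course is consumed as part of the match, so the next search starts after it and the section that begins with that Course is skipped. Overlapping substrings are not matched.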
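A minimal sketch of the difference, using a made-up three-section string instead of the real sample text:

```python
import re

# Hypothetical miniature input with three "Course" sections.
text = "Course A Course B Course C"

# The non-capturing group consumes the closing "Course", so section B is lost.
consuming = re.findall(r"Course\s+(.*?)(?:Course|$)", text, re.DOTALL)

# The lookahead only checks for "Course" without consuming it.
lookahead = re.findall(r"Course\s+(.*?)(?=Course|$)", text, re.DOTALL)

print(consuming)  # ['A ', 'C']
print(lookahead)  # ['A ', 'B ', 'C']
```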

import re
rx = r"Course\s+(.*?)(?=Course|$)"
s = "Course 22/09/2010 1. Early duty Josephine, Jansen 22-09-2010 10:37:08 Date 22/09/2010 Duty 1. Early duty 1.3 Here there can be some other related stuff Nursegoals Interventions Record This is now the fourth note. 6.2.1.3 Confusion: Observing. Nursegoals Interventions Record This is a new, note (again), i call it note 3. Course 22/09/2010 1. Early duty Record This is again a note, i call it note 2. Apple: 0/less Course 22/09/2010 3. Nightduty Josephine, Jansen 22-09-2010 06:22:25 Date 22/09/2010 Course 3. Nightduty 1.3 Something else here Nursegoals Interventions Record 6.2.1.3 Confusion: Observing. Nursegoals Interventions Record Course 22/09/2010 3. Nightduty Record This is a new note, i call it note 1."
results = re.findall(rx, s, re.DOTALL)
for x in results:
    print(x)

Output:

22/09/2010 1. Early duty Josephine, Jansen 22-09-2010 10:37:08 Date 22/09/2010 Duty 1. Early duty 1.3 Here there can be some other related stuff Nursegoals Interventions Record This is now the fourth note. 6.2.1.3 Confusion: Observing. Nursegoals Interventions Record This is a new, note (again), i call it note 3. 
22/09/2010 1. Early duty Record This is again a note, i call it note 2. Apple: 0/less 
22/09/2010 3. Nightduty Josephine, Jansen 22-09-2010 06:22:25 Date 22/09/2010 
3. Nightduty 1.3 Something else here Nursegoals Interventions Record 6.2.1.3 Confusion: Observing. Nursegoals Interventions Record 
22/09/2010 3. Nightduty Record This is a new note, i call it note 1.
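For the second step described in the question (collecting the notes per date section), one possible sketch follows. It rests on several assumptions that the question does not confirm: a section header is a dd/mm/yyyy date followed by a numbered duty label, the label spellings are "Early duty", "Late duty" and "Nightduty", and a note runs from "Record" to the next "Record" or to the end of the section. The `sections` list and the `parse_section` helper are hypothetical and shortened for illustration.

```python
import re

# Hypothetical, shortened sections shaped like the findall output above.
sections = [
    "22/09/2010 1. Early duty Record first note. Record second note. ",
    "22/09/2010 3. Nightduty Record ",
    "22/09/2010 3. Nightduty Record third note.",
]

# Assumed header layout: a dd/mm/yyyy date, then a numbered duty label.
header_rx = re.compile(
    r"(\d{2}/\d{2}/\d{4})\s+(\d\.\s*(?:Early duty|Late duty|Nightduty))"
)

def parse_section(section):
    """Return (header, notes) for one section, or None if no header is found."""
    m = header_rx.match(section)
    if m is None:
        return None
    header = f"{m.group(1)} {m.group(2)}"
    # Assumption: each note runs from "Record" to the next "Record" or the end.
    pieces = re.split(r"\bRecord\b", section[m.end():])[1:]
    notes = [p.strip() for p in pieces if p.strip()]
    return header, notes

parsed = [r for r in map(parse_section, sections) if r is not None]
for header, notes in parsed:
    print(header, notes)
```

Sections without a leading date (such as the one starting with "3. Nightduty 1.3" in the output above) return None here and are dropped. Real notes may also end at other markers, so the split rule would need adjusting to the actual files.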
