How do I get the entire string up to the next regex pattern match?

Posted 2024-04-29 21:02:47


I have the following code:

import re

# The capture group covers only the timestamp prefix at the start of a line
pat = re.compile(r'^(\d+\/\d+\/\d+,\s\d+:\d+\s\w+\s-\s)', re.S | re.M)
with open(r'C:\Users\usamahaider\Downloads\mmm.txt', encoding="utf8") as f:
    mylist = [m.group(1) for m in pat.finditer(f.read())]
print(mylist)

The output is:

['12/30/19, 8:57 AM - ', '12/3/19, 14:57 AM - ', '9/20/19, 8:52 AM - ', '12/3/19, 8:57 AM - ', '12/3/19, 9:34 PM - ', '12/3/19, 9:34 PM - ', '12/4/19, 6:45 AM - ', '12/4/19, 6:49 AM - ', '12/4/19, 7:12 AM - ', '12/4/19, 7:19 AM - ', '12/4/19, 7:20 AM - ', '12/4/19, 7:34 AM - ', '12/4/19, 8:00 AM - ', '12/4/19, 9:45 AM - ', '12/4/19, 10:15 AM - ', '12/4/19, 10:55 AM - ']

This returns only the matched pattern, but I need all of the text that belongs to each match, up to the next match.

Something like this:

['12/30/19, 8:57 AM -Messages and calls are end-to-end encrypted. No one outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more. ', '12/3/19, 14:57 AM - You joined using this group's invite link', '9/20/19, 8:52 AM - (347) 599-6911 created group "Sunnah Marriage Group 1"']

The text file looks like this:

12/30/19, 8:57 AM - Messages and calls are end-to-end encrypted. No one 

outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more.
12/3/19, 14:57 AM - You joined using this group's invite link
9/20/19, 8:52 AM - (347) 599-6911 created group "Sunnah Marriage Group 1"
12/3/19, 8:57 AM - You joined using this group's invite link

12/3/19, 9:34 PM - +1 (516) 343-8410: Gender: Female
Height: 5’ 8”
Age: 21
Education: 1st Yr Medical School
Profession: Future Doctor
Marital status: Never married
Ethnicity: Pakistani
Religious background: Sunni
Family: Parents, Brothers, Sister
Language: English, Urdu
Hobbies: Travel, Art, Reading

LOOKING FOR: 
Age : 24-29
Height: 5’ 10” or taller
Religion: Sunni Muslim 
Education: MD/DO
Profession: Doctor/ Medical Residency/Medical Student 
Marital Status: Never married 

Contact: Mother
WhatsApp: (647) 879-1400
12/3/19, 9:34 PM - +1 (516) 343-8410: <Media omitted>
12/4/19, 6:45 AM - (347) 599-6911 changed this group's settings to allow all participants to send messages to this group
12/4/19, 6:49 AM - (347) 599-6911: As Salamualikum warahmatullah. Please Post and forward practicing muslims and your profiles in order to remain in the group. You have 1 day to post it until settings changes again. Strictly No chatting and no picture in the group. Please contact interested candidates in private. JazakAllahu Khairn. May Allah make halal easy for all the believers....Ameen

1 Answer
User
Reply #1 · Posted 2024-04-29 21:02:47

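Use re.split with a positive lookahead. The text is split at the start of every line that begins with a timestamp, and because the lookahead consumes no characters, the timestamp stays in the result: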

re.split(r'^(?=\d+/\d+/\d+,\s\d+:\d+\s+\w+\s+-\s)', string, flags=re.M)
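
To see why the lookahead matters, here is a minimal sketch on a made-up two-message sample (the `sample` string and the `messages` name are illustrative, not from the question). Splitting on a pattern that can match an empty string needs Python 3.7 or later.

```python
import re

# Hypothetical sample: two messages, the first one spans two lines.
sample = (
    "1/2/19, 8:57 AM - first message\n"
    "continued\n"
    "1/2/19, 9:00 AM - second message"
)

# (?=...) is zero-width, so the split point sits *before* each timestamp
# and the timestamp stays attached to the message that follows it.
parts = re.split(r'^(?=\d+/\d+/\d+,\s\d+:\d+\s+\w+\s+-\s)', sample, flags=re.M)

# The pattern also matches at position 0, which yields a leading empty
# string; drop it and trim trailing newlines.
messages = [p.strip() for p in parts if p]
print(messages)
```

Without the lookahead, that is, with the timestamp pattern used directly as the separator, re.split would consume the timestamps and return only the message bodies.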


Python proof:

import re
string = """12/30/19, 8:57 AM - Messages and calls are end-to-end encrypted. No one \n\noutside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more.\n12/3/19, 14:57 AM - You joined using this group's invite link\n9/20/19, 8:52 AM - (347) 599-6911 created group "Sunnah Marriage Group 1"\n12/3/19, 8:57 AM - You joined using this group's invite link\n\n12/3/19, 9:34 PM - +1 (516) 343-8410: Gender: Female\nHeight: 5’ 8”\nAge: 21\nEducation: 1st Yr Medical School\nProfession: Future Doctor\nMarital status: Never married\nEthnicity: Pakistani\nReligious background: Sunni\nFamily: Parents, Brothers, Sister\nLanguage: English, Urdu\nHobbies: Travel, Art, Reading\n\nLOOKING FOR: \nAge : 24-29\nHeight: 5’ 10” or taller\nReligion: Sunni Muslim \nEducation: MD/DO\nProfession: Doctor/ Medical Residency/Medical Student \nMarital Status: Never married \n\nContact: Mother\nWhatsApp: (647) 879-1400\n12/3/19, 9:34 PM - +1 (516) 343-8410: <Media omitted>\n12/4/19, 6:45 AM - (347) 599-6911 changed this group's settings to allow all participants to send messages to this group\n12/4/19, 6:49 AM - (347) 599-6911: As Salamualikum warahmatullah. Please Post and forward practicing muslims and your profiles in order to remain in the group. You have 1 day to post it until settings changes again. Strictly No chatting and no picture in the group. Please contact interested candidates in private. JazakAllahu Khairn. May Allah make halal easy for all the believers....Ameen"""
results = list(filter(None, re.split(r'^(?=\d+/\d+/\d+,\s\d+:\d+\s+\w+\s+-\s)', string, flags=re.M)))
for line in results:
    print('====', line.strip())

Result:

==== 12/30/19, 8:57 AM - Messages and calls are end-to-end encrypted. No one 

outside of this chat, not even WhatsApp, can read or listen to them. Tap to learn more.
==== 12/3/19, 14:57 AM - You joined using this group's invite link
==== 9/20/19, 8:52 AM - (347) 599-6911 created group "Sunnah Marriage Group 1"
==== 12/3/19, 8:57 AM - You joined using this group's invite link
==== 12/3/19, 9:34 PM - +1 (516) 343-8410: Gender: Female
Height: 5’ 8”
Age: 21
Education: 1st Yr Medical School
Profession: Future Doctor
Marital status: Never married
Ethnicity: Pakistani
Religious background: Sunni
Family: Parents, Brothers, Sister
Language: English, Urdu
Hobbies: Travel, Art, Reading

LOOKING FOR: 
Age : 24-29
Height: 5’ 10” or taller
Religion: Sunni Muslim 
Education: MD/DO
Profession: Doctor/ Medical Residency/Medical Student 
Marital Status: Never married 

Contact: Mother
WhatsApp: (647) 879-1400
==== 12/3/19, 9:34 PM - +1 (516) 343-8410: <Media omitted>
==== 12/4/19, 6:45 AM - (347) 599-6911 changed this group's settings to allow all participants to send messages to this group
==== 12/4/19, 6:49 AM - (347) 599-6911: As Salamualikum warahmatullah. Please Post and forward practicing muslims and your profiles in order to remain in the group. You have 1 day to post it until settings changes again. Strictly No chatting and no picture in the group. Please contact interested candidates in private. JazakAllahu Khairn. May Allah make halal easy for all the believers....Ameen
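
The desired output in the question shows each message as a single line, while the result above keeps the internal line breaks. If single-line strings are wanted, one possible sketch is below. The `split_messages` helper and the `BOUNDARY` name are hypothetical, and `demo` is an abbreviated excerpt of the chat file rather than the full text.

```python
import re

# Same timestamp pattern as in the answer, used as a zero-width split point.
BOUNDARY = re.compile(r'^(?=\d+/\d+/\d+,\s\d+:\d+\s+\w+\s+-\s)', re.M)


def split_messages(text):
    """Return one string per message, with internal line breaks collapsed."""
    chunks = BOUNDARY.split(text)
    # " ".join(c.split()) collapses every run of whitespace, including
    # newlines, into a single space; empty chunks are skipped.
    return [" ".join(c.split()) for c in chunks if c.strip()]


# Abbreviated excerpt of the chat file (assumption: shortened for the demo).
demo = (
    "12/30/19, 8:57 AM - Messages and calls are end-to-end encrypted. No one \n"
    "\n"
    "outside of this chat.\n"
    "12/3/19, 14:57 AM - You joined using this group's invite link\n"
)

for message in split_messages(demo):
    print(message)
```

For the real file, the text passed to `split_messages` would be what `f.read()` returns in the question's code. Note that collapsing whitespace also flattens multi-line profile posts, so skip that step if the original line structure should be preserved.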
