How to extract a substring from a string using a regular expression

Posted 2024-05-13 23:03:14


I have a string like the one below. I want to extract the highlighted part of it, using a regular expression or any other possible method.


description = 'The National Weather Service in Milwaukee/Sullivan has issued a\n\n* Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n* Until 945 PM CDT.\n\n* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\nwas located 8 miles east of Wisconsin Dells, moving northeast at 45\nmph.\n\nHAZARD...Tornado.\n\nSOURCE...Radar indicated rotation.\n\nIMPACT...Flying debris will be dangerous to those caught without\nshelter. Mobile homes will be damaged or destroyed.\nDamage to roofs, windows, and vehicles will occur.  Tree\ndamage is likely.\n\n* Locations impacted include...\nPackwaukee, Endeavor and Briggsville.'

# Now I want to match the substring between 'Tornado Warning for...' and '\n\n*'

# I tried this:

re.search('Tornado Warning for...(.*)\n\n*', description)

# But I get this result:

<re.Match object; span=(67, 90), match='Tornado Warning for...\n'>

# Expected result:

<re.Match object; span=(any, any), match='Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n*'>

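It doesn't match the full substring; it only matches Tornado Warning for...\n

I want to match Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n*

that is, the substring that starts with Tornado Warning for... and ends with \n\n*

Thanks for your help, and sorry for my poor English.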


3 answers

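The dot (.) does not match a newline (\n). Use [\W\w] instead of ., because that character class matches any character, including newlines.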

import re
description = 'The National Weather Service in Milwaukee/Sullivan has issued a\n\n* Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n* Until 945 PM CDT.\n\n* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\nwas located 8 miles east of Wisconsin Dells, moving northeast at 45\nmph.\n\nHAZARD...Tornado.\n\nSOURCE...Radar indicated rotation.\n\nIMPACT...Flying debris will be dangerous to those caught without\nshelter. Mobile homes will be damaged or destroyed.\nDamage to roofs, windows, and vehicles will occur.  Tree\ndamage is likely.\n\n* Locations impacted include...\nPackwaukee, Endeavor and Briggsville.'

print(re.search(r'Tornado Warning for\.\.\.([\W\w]*?)\n\n\*', description).group())

"""
Tornado Warning for...
Northwestern Columbia County in south central Wisconsin...
Southwestern Marquette County in south central Wisconsin...

*
"""

You can match

\bTornado Warning for\.\.\.(?:\n.*)*?\n\n

The pattern matches:

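  • \bTornado Warning for\.\.\. matches Tornado Warning for preceded by a word boundary, with the dots escaped so they match literally
  • (?:\n.*)*? matches a newline followed by the rest of that line, repeated as few times as possible
  • \n\n matches 2 newlines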


For example:

import re

description = 'The National Weather Service in Milwaukee/Sullivan has issued a\n\n* Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n* Until 945 PM CDT.\n\n* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\nwas located 8 miles east of Wisconsin Dells, moving northeast at 45\nmph.\n\nHAZARD...Tornado.\n\nSOURCE...Radar indicated rotation.\n\nIMPACT...Flying debris will be dangerous to those caught without\nshelter. Mobile homes will be damaged or destroyed.\nDamage to roofs, windows, and vehicles will occur.  Tree\ndamage is likely.\n\n* Locations impacted include...\nPackwaukee, Endeavor and Briggsville.'

m = re.search(r'\bTornado Warning for\.\.\.(?:\n.*)*?\n\n', description)
if m:
    print(m.group())

Output

Tornado Warning for...
Northwestern Columbia County in south central Wisconsin...
Southwestern Marquette County in south central Wisconsin...

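To see why the lazy *? in (?:\n.*)*? matters, here is a quick sketch comparing it with a greedy * on the question's full string. The greedy version consumes every remaining line and then backtracks to the last blank line in the whole warning, so it overshoots far past the county list.

```python
import re

# The description string from the question, unchanged
description = 'The National Weather Service in Milwaukee/Sullivan has issued a\n\n* Tornado Warning for...\nNorthwestern Columbia County in south central Wisconsin...\nSouthwestern Marquette County in south central Wisconsin...\n\n* Until 945 PM CDT.\n\n* At 911 PM CDT, a severe thunderstorm capable of producing a tornado\nwas located 8 miles east of Wisconsin Dells, moving northeast at 45\nmph.\n\nHAZARD...Tornado.\n\nSOURCE...Radar indicated rotation.\n\nIMPACT...Flying debris will be dangerous to those caught without\nshelter. Mobile homes will be damaged or destroyed.\nDamage to roofs, windows, and vehicles will occur.  Tree\ndamage is likely.\n\n* Locations impacted include...\nPackwaukee, Endeavor and Briggsville.'

# Lazy: takes as few lines as possible, so it stops at the first blank line
lazy = re.search(r'\bTornado Warning for\.\.\.(?:\n.*)*?\n\n', description)
# Greedy: takes as many lines as possible, then backs off to the last blank line
greedy = re.search(r'\bTornado Warning for\.\.\.(?:\n.*)*\n\n', description)

# Print the last non-empty line of each match
print(lazy.group().rstrip().splitlines()[-1])    # Southwestern Marquette County in south central Wisconsin...
print(greedy.group().rstrip().splitlines()[-1])  # damage is likely.
```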
The regular expression could look like this:

matched_string = re.findall("Tornado[a-zA-Z\s\.\\\*]+\\n\\n\*", description)
print(matched_string)
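The question also asks about methods other than regular expressions. A minimal sketch using plain str.find, assuming the start marker 'Tornado Warning for...' and the end marker '\n\n*' from the question; the helper name extract_between is hypothetical, and the description is a shortened excerpt of the question's string.

```python
def extract_between(text, start_marker, end_marker):
    """Hypothetical helper: return text from start_marker through end_marker
    (both included), or None if either marker is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    # Search for the end marker only after the start marker, so the earlier
    # '\n\n*' that precedes 'Tornado Warning for...' is skipped
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        return None
    return text[start:end + len(end_marker)]


# Excerpt of the question's string (assumption: the rest is not needed)
description = (
    "The National Weather Service in Milwaukee/Sullivan has issued a\n\n"
    "* Tornado Warning for...\n"
    "Northwestern Columbia County in south central Wisconsin...\n"
    "Southwestern Marquette County in south central Wisconsin...\n\n"
    "* Until 945 PM CDT.\n\n"
)

print(extract_between(description, "Tornado Warning for...", "\n\n*"))
```

Unlike the character-class regex, this does not depend on which characters appear between the markers.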
