Python regular expression to find a phrase containing an exact word

Posted 2024-06-09 09:03:35


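I have a list of strings and want to find an exact phrase.

So far my code only finds the month and year, but I need the whole phrase, including "- Recorded", for example "March 2016 - Recorded".

How can I add "- Recorded" to the regex?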

import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in  - Recorded Answering"
]

# Matches a month name followed by a four-digit year, e.g. "March 2016"
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s')

for t in texts:
    m = regex.search(t)
    if m:
        print(m.group())
    else:
        print("keyword's not found")

2 Answers

Use a list comprehension with the corrected regular expression:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

matches = [match.groups() for text in texts for match in [regex.search(text)] if match]
print(matches)
# [('March', '2016'), ('February', '2017'), ('May', '2016'), ('August', '2016')]
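
The question asks for the whole phrase rather than the separate groups. A minimal sketch of one way to get it, assuming the same pattern as above, is to read `match.group(0)`, which returns the full text matched by the pattern. The shortened `texts` list is only a subset of the question's data, used here to keep the example small.

```python
import re

# Subset of the question's data: one match, one plain name, one entry with no date
texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Jack Jagoo",
    "Vinaye Ramjuttun took annual leave in  - Recorded Answering",
]

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

# group(0) is the entire matched substring, not just the captured groups
phrases = [m.group(0) for m in map(regex.search, texts) if m]
print(phrases)  # ['March 2016 - Recorded']
```

Strings without a month and year in front of "- Recorded" are skipped, because `regex.search` returns `None` for them.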

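There are two named groups here, month and year, which extract the month and year from the string. To capture - Recorded in a named group called recorded, you can do the following:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')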
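
As a rough illustration of what the named groups give you, the sketch below runs this pattern on one string from the question and reads the result with `groupdict()` and `group('recorded')`. The variable names `text`, `match` and `fields` are chosen only for this example.

```python
import re

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')

text = "Soondren Armon took medical leave in February 2017 - Recorded It was in"
match = regex.search(text)

# groupdict() maps each group name to the text it captured
fields = match.groupdict() if match else {}
print(fields)

# The full phrase is still available as group(0)
if match:
    print(match.group(0))
```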

Or, if you only need to add - Recorded to your regular expression without a separate group:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s- Recorded')

Or you can add a named group other that matches a hyphen followed by a single capitalized word:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')
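
Note that this last pattern is broader than the others: it accepts any capitalized word after the hyphen, not only "Recorded". The sketch below shows the difference. The second sample string, with "- Pending", is hypothetical and does not appear in the question's data.

```python
import re

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')

# The first string is from the question; the second is a hypothetical variant
samples = [
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "David Padachi took annual leave in May 2016 - Pending It says",
]

others = [m.group('other') for m in map(regex.search, samples) if m]
print(others)  # ['- Recorded', '- Pending']
```

If only the exact word "Recorded" should match, the patterns with the literal - Recorded are the safer choice.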

I think the first or third option is preferable, since you already have named groups. I would also recommend the site http://pythex.org/, which really helps when building a regex :).