Python regular expression for extracting dates and times

Posted 2024-06-09 20:41:31


I am trying to use re to extract dates from a piece of text into a list.

Here is what I wrote:

import re

dateStr = "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"
regex = r'(?:\d{1,2}[/-]*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[/-]*)+(?:\d{2,4})+'
result = re.findall(regex, dateStr)

Even though I put (?:\d{1,2}[/-]*) at the start of the expression, the day number is missing from every match. This is what I get:

['Mar 2009', 'March 2009', 'Mar. 2009', 'March, 2009']

Can you help? Thanks.

Edit:
This was resolved in the comments.

Original input string: dateStr = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"
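
The comment that resolved the question is not reproduced on this page. As a rough sketch only, and not the fix that was actually accepted, one way to cover every format in the full string is an alternation that lists the more specific formats first. The names `date_str`, `any_date` and `found` are hypothetical, and the pattern assumes English month names and a four-digit year in the textual formats.

```python
import re

date_str = ("04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; "
            "March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; "
            "20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; "
            "Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; "
            "6/2008; 12/2009; 2009; 2010")

# Hypothetical pattern: alternatives are tried left to right, so the more
# specific formats come first and the bare year comes last.
any_date = re.compile(r"""
      \d{1,2}[/-]\d{1,2}[/-]\d{2,4}                    # 04-20-2009, 4/3/09
    | \d{1,2}\s[A-Za-z]+[.,]?\s\d{4}                   # 20 Mar 2009, 20 March, 2009
    | [A-Za-z]+\.?\s\d{1,2}(?:st|nd|rd|th)?,?\s\d{4}   # Mar 20, 2009, Mar 21st, 2009
    | [A-Za-z]+\s\d{4}                                 # Feb 2009
    | \d{1,2}/\d{4}                                    # 6/2008
    | \d{4}                                            # 2009
""", re.VERBOSE)

found = any_date.findall(date_str)
print(found)
```

On this particular input the pattern returns each of the 22 semicolon-separated entries as one match. It does not validate the dates, so strings such as "99/99/99" would also be accepted.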


2 Answers

One of many possible approaches:

import re
dateStr =  "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"
regex = r'[0-9]{1,2}\s[a-zA-Z]+[.,]*\s[0-9]{4}'
result = re.findall(regex, dateStr)
print(result)

Output:

['20 Mar 2009', '20 March 2009', '20 Mar. 2009', '20 March, 2009']
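
As for why the pattern in the question drops the day: a likely explanation is that nothing between the optional leading day group and (?:Mar)? can consume the space after "20", and [a-z\s,.]* cannot match the uppercase "M". The match can therefore only begin at "Mar". The sketch below illustrates this; `fixed` is a hypothetical minimal change (adding \s to the separator class of the leading group), not something proposed in the thread.

```python
import re

date_str = "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"

# Pattern from the question: no whitespace is allowed between the day and "Mar".
original = r'(?:\d{1,2}[/-]*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[/-]*)+(?:\d{2,4})+'

# Hypothetical minimal fix: let the leading day group also consume whitespace.
fixed = r'(?:\d{1,2}[/\s-]*)?(?:Mar)?[a-z\s,.]*(?:\d{1,2}[/-]*)+(?:\d{2,4})+'

original_result = re.findall(original, date_str)
fixed_result = re.findall(fixed, date_str)

print(original_result)  # day is missing
print(fixed_result)     # day is included
```

This keeps the questioner's structure, which only recognizes months starting with "Mar"; the simpler pattern in this answer is easier to read and is not tied to one month.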

I would use:

import re

dateStr = "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"
dt = re.findall(r'\d{1,2} \w+[,.]? \d{4}', dateStr)
print(dt)  # ['20 Mar 2009', '20 March 2009', '20 Mar. 2009', '20 March, 2009']

The "one size fits all" regex pattern used above says to match, in order:

\d{1,2}  a one or two digit day
[ ]      space
\w+      month name or abbreviation
[,.]?    possibly followed by comma or period
[ ]      space
\d{4}    four digit year
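
The breakdown above can also be written into the pattern itself with re.VERBOSE. The following is a sketch of one possible variant rather than the answer's original code: it swaps \w+ for [A-Za-z]+ (since \w also accepts digits and underscores) and adds \b word boundaries. The names `date_pattern`, `dates`, `loose` and `strict` are illustrative assumptions.

```python
import re

date_str = "20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009"

# In VERBOSE mode, unescaped whitespace in the pattern is ignored,
# so each literal space is written as a character class.
date_pattern = re.compile(r"""
    \b\d{1,2}     # one- or two-digit day
    [ ]           # space
    [A-Za-z]+     # month name or abbreviation (letters only)
    [,.]?         # optional comma or period
    [ ]           # space
    \d{4}\b       # four-digit year
""", re.VERBOSE)

dates = date_pattern.findall(date_str)
print(dates)

# The looser \w+ version also accepts a purely numeric "month".
loose = re.findall(r'\d{1,2} \w+[,.]? \d{4}', "12 345 6789")
strict = date_pattern.findall("12 345 6789")
print(loose, strict)
```

Neither version checks that the word is a real month name; listing the month names explicitly in an alternation would be a stricter option.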
