如何删除2019年7月1日这样的字符串

2024-06-12 17:49:36 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个大字符串,我想删除它的所有日期字符串子字符串。根据约束,日期字符串都遵循以下格式:

年月日(例如:2018年9月1日)

假设我的字符串是:

bad_s = "It was a fine day. September 1, 2018 and I had a lot of laughs August 2, 2017"

我想回来 good_s = "It was a fine day. and I had a lot of laughs"

在Python中有没有一种简单的方法可以做到这一点?你知道吗

以下是我尝试的:

reg_ex = """/[\'January\'\,\ \'February\'\,\ \'March\'\,\ \'April\'\,\ \'May\'\,\ \'June\'\,\ \'July\'\,\ \'August\'\,\ \'September\'\,\ \'October\'\,\ \'November\'\,\ \'December\'](?:\^\(\[1\-9\]\|\[12\]\\d\|3\[0\-q\]\)\$)/"""
replaced = re.sub(reg_ex, bad_s, "")

然而,这并不能取代我想要的。最后我还是bad_s。你知道吗

编辑:如果这对任何人来说都更容易,这里有一个12个月的清单,这样你就不用写了: months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']


Tags: andof字符串itregexlotbad
2条回答

也许你可以试试这个:

进口re

bad_s = 'It was a fine day. September 20, 2018 and I had a lot of laughs August 2, 2017'
regex = '([^\s]+ ([1-9]|[12]\d|3[01])\, ([12]\d{3}))'

for x in re.findall(regex, bad_s):
    bad_s = bad_s.replace(x[0], '')

print(bad_s)

结果:

"It was a fine day.  and I had a lot of laughs"

像这样?你知道吗

(january|february|march|april|may|june|july|august|september|octorber|november|december) ([1-9]|[1-2]\d|3[01]), \d{4}

只是不要忘记/i标志或任何Python等价物。你知道吗

请注意,这并不关心一个月有多少天,因此February 31, 2017将匹配,也不关心闰年。这个正则表达式是匹配程序而不是验证器。你知道吗

如果您希望它更通用,并且完全忽略日期验证,那么这将起作用:

(january|february|march|april|may|june|july|august|september|octorber|november|december) \d+, \d+

https://regex101.com/r/zVbb0v/5

相关问题 更多 >