Regex: how do I remove all non-alphanumeric characters except the colon in timestamps? (12/24-hour clock)

Posted 2024-05-19 22:11:04


I have a string like:

Today, 3:30pm - Group Meeting to discuss "big idea"

How can I construct a regular expression so that, after parsing, it returns:

Today 3:30pm Group Meeting to discuss big idea

I want it to remove all non-alphanumeric characters except those that appear in a 12- or 24-hour timestamp.


3 answers

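In Python:

import string
import time

punct = string.punctuation
s = 'Today, 3:30pm - Group Meeting:am to discuss "big idea" by our madam'
for item in s.split():
    try:
        # leave the token untouched if it parses as a time such as 3:30pm
        time.strptime(item, "%H:%M%p")
    except ValueError:
        # otherwise strip every punctuation character from the token
        item = ''.join(c for c in item if c not in punct)
    print(item, end=' ')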

Output:

$ ./python.py
Today 3:30pm  Group Meetingam to discuss big idea by our madam

# change to s='Today, 15:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 "big idea" on 03:33pm or 16:47 is also good'

$ ./python.py
Today 15:30pm  Group 123 Meeting to di4scuss3 234 big idea on 03:33pm or 1647 is also good

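Note: the method should be improved so that it checks for a valid time only when necessary (by adding extra conditions), but I have left that check out for now.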
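One way the improvement mentioned in the note might look is sketched below. It is not the answerer's code: the helper names `looks_like_time` and `clean`, and the choice of the two formats `%I:%M%p` and `%H:%M`, are assumptions made for illustration. The idea is to call `strptime` only for tokens that contain a colon, and to drop tokens that consisted purely of punctuation so no double spaces appear.

```python
import string
import time

PUNCT = set(string.punctuation)
# Assumed formats: 12-hour with am/pm (3:30pm) and plain 24-hour (16:47).
TIME_FORMATS = ("%I:%M%p", "%H:%M")

def looks_like_time(token):
    """Return True if the token parses with one of TIME_FORMATS."""
    if ":" not in token:      # cheap pre-check: most tokens never reach strptime
        return False
    for fmt in TIME_FORMATS:
        try:
            time.strptime(token, fmt)
            return True
        except ValueError:
            pass
    return False

def clean(text):
    words = []
    for token in text.split():
        if not looks_like_time(token):
            token = "".join(c for c in token if c not in PUNCT)
        if token:             # skip tokens that were pure punctuation, e.g. "-"
            words.append(token)
    return " ".join(words)

print(clean('Today, 3:30pm - Group Meeting to discuss "big idea"'))
```

Because the check works on whitespace-separated tokens, a time with punctuation attached (for example `3:30pm,`) is not recognised and loses its colon; `%p` matching also depends on the active locale.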

I am assuming you also want to keep the spaces. This implementation is in Python, but the pattern is PCRE-style, so it should be portable.

import re
x = u'Today, 3:30pm - Group Meeting to discuss "big idea"'
re.sub(r'[^a-zA-Z0-9: ]', '', x)

Output: 'Today 3:30pm  Group Meeting to discuss big idea'

A slightly cleaner version (without the double space):

import re
x = u'Today, 3:30pm - Group Meeting to discuss "big idea"'
tmp = re.sub(r'[^a-zA-Z0-9: ]', '', x)
re.sub(r'[ ]+', ' ', tmp)

Output: 'Today 3:30pm Group Meeting to discuss big idea'
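A caveat worth seeing in action: the character class keeps every colon, not only those inside timestamps. The sketch below wraps the same two substitutions in a helper (the name `strip_keep_colons` is made up for this example) and runs it on a second, invented input.

```python
import re

def strip_keep_colons(text):
    # Step 1: drop everything except letters, digits, colons and spaces.
    tmp = re.sub(r'[^a-zA-Z0-9: ]', '', text)
    # Step 2: collapse runs of spaces left behind by removed characters.
    return re.sub(r'[ ]+', ' ', tmp)

print(strip_keep_colons('Today, 3:30pm - Group Meeting to discuss "big idea"'))
# The colon after "Agenda" is not part of a time, yet it is kept.
print(strip_keep_colons('Agenda: review at 3:30pm'))
```

If stray colons can occur in the input, a pattern that inspects the digits around each colon is probably the safer choice.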

# this: D:DD, DD:DDam/pm 12/24 hr
re = r':(?=..(?<!\d:\d\d))|[^a-zA-Z0-9 ](?<!:)'

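A colon must be preceded by at least one digit and followed by at least two digits: that is a time. All other colons are treated as text colons and removed.

How it works

:              // match a colon
(?=..          // look ahead past the next two chars (nothing is consumed)
  (?<!         // start a negative look-behind (if it matches, the whole alternative fails)
    \d:\d\d    // time stamp: digit, colon, two digits
  )            // end neg. look-behind
)              // end look-ahead
|              // or
[^a-zA-Z0-9 ]  // match any char that is not a letter, digit or space
(?<!:)         // provided that char isn't a colon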

When applied to this silly text:

Today, 3:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 "big idea" on 03:33pm or 16:47 is also good

…it changes it to:

Today 3:30pm  Group 123 Meeting to di4scuss3 234 big idea on 03:33pm or 16:47 is also good
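For anyone who wants to try the pattern, here is a minimal sketch using Python's `re.sub`. The pattern is copied unchanged from the answer; the variable names `pattern`, `text` and `cleaned` are my own, chosen so that the `re` module is not shadowed.

```python
import re

# Pattern from the answer: remove a colon unless it sits inside digit:digit-digit,
# and remove any other character that is not a letter, digit or space.
pattern = re.compile(r':(?=..(?<!\d:\d\d))|[^a-zA-Z0-9 ](?<!:)')

text = ('Today, 3:30pm - Group 1,2,3 Meeting to di4sc::uss3: 2:3:4 '
        '"big idea" on 03:33pm or 16:47 is also good')
cleaned = pattern.sub('', text)
print(cleaned)

# Edge case: a colon with fewer than two characters after it is left in place,
# because the look-ahead `..` cannot match there.
print(pattern.sub('', 'done:'))
```

Note that the comma after "Today" is removed along with the other punctuation, and that the double space left by the removed hyphen is not collapsed; a second `re.sub(r' +', ' ', cleaned)` would handle that.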
