Python regular expression inconsistency

2 votes

4 answers

860 views

Asked 2025-04-15 17:16

I get different results depending on whether or not I precompile a regular expression:

>>> re.compile('mr', re.IGNORECASE).sub('', 'Mr Bean')
' Bean'
>>> re.sub('mr', '', 'Mr Bean', re.IGNORECASE)
'Mr Bean'

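The Python documentation says that some of the module-level functions are simplified versions of the full-featured methods for compiled regular expressions. However, it also says that RegexObject.sub() is identical to the sub() function.

So what is going on here?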


4 Answers

4

>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

In the signature shown above, re.sub has no parameter for setting regex flags (such as IGNORECASE, MULTILINE, or DOTALL) directly, unlike re.compile.

There are workarounds, though:

>>> re.sub("[Mm]r", "", "Mr Bean")
' Bean'
>>> re.sub("(?i)mr", "", "Mr Bean")
' Bean'

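Addendum: Python 3.1 added support for a flags argument to these functions (Python 2.7 gained the same argument); see http://docs.python.org/3.1/whatsnew/3.1.html for details. Starting with 3.1, the signature of re.sub is: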

re.sub(pattern, repl, string[, count, flags])
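As a minimal sketch of that newer signature, assuming Python 3.1 or later: passing both count and flags by keyword means neither can be mistaken for the other. The sample string is made up for illustration.

```python
import re

# Sample input invented for this illustration.
text = 'Mr Bean, mr Bean'

# count and flags are both given by keyword, so the flag cannot
# accidentally land in the count slot.
first_only = re.sub('mr', '', text, count=1, flags=re.IGNORECASE)
every_match = re.sub('mr', '', text, flags=re.IGNORECASE)

print(repr(first_only))
print(repr(every_match))
```

With count=1 only the leading 'Mr' is removed; without count, both occurrences are.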

Answered 2025-04-15 by Python大师


5

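The module-level sub() call does not accept modifiers as its last positional argument. The argument in that position is count, the maximum number of pattern occurrences to replace.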
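A short sketch of why the call in the question fails silently rather than raising an error: re.IGNORECASE is an integer-valued flag, so in the fourth position it is read as a count. The 'aaaa' input is made up for illustration, and the warnings filter is only there because some newer Python versions may emit a DeprecationWarning when count is passed positionally.

```python
import re
import warnings

# re.IGNORECASE behaves like an integer, so it is a valid count value.
flag_as_int = int(re.IGNORECASE)

with warnings.catch_warnings():
    # Assumption: newer interpreters may warn about a positional count.
    warnings.simplefilter("ignore")
    # The flag lands in the count slot; matching stays case-sensitive.
    positional = re.sub('mr', '', 'Mr Bean', re.IGNORECASE)
    # Same mistake on an input that does match: only flag_as_int
    # replacements are made.
    capped = re.sub('a', '', 'aaaa', re.IGNORECASE)

# Passed by keyword (Python 2.7 / 3.1 and later), it is treated as a flag.
keyword = re.sub('mr', '', 'Mr Bean', flags=re.IGNORECASE)

print(flag_as_int, repr(positional), repr(capped), repr(keyword))
```

Here the positional call leaves 'Mr Bean' unchanged, while the keyword call removes 'Mr'.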

Answered 2025-04-15 by Python大师


12

re.sub() does not appear to accept the re.IGNORECASE option.

The documentation states:

sub(pattern, repl, string, count=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used.

However, you can use the following instead:

re.sub("(?i)mr", "", "Mr Bean")
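For comparison, a small sketch suggesting that the inline (?i) marker has the same effect as passing re.IGNORECASE to re.compile. The variable names are only illustrative.

```python
import re

# Inline flag embedded in the pattern text itself.
inline = re.compile('(?i)mr')
# The same flag passed explicitly to re.compile.
explicit = re.compile('mr', re.IGNORECASE)

# The compiled pattern reports the inline flag in its .flags attribute.
inline_sets_flag = bool(inline.flags & re.IGNORECASE)

inline_result = inline.sub('', 'Mr Bean')
explicit_result = explicit.sub('', 'Mr Bean')

print(inline_sets_flag, repr(inline_result), repr(explicit_result))
```

Because the flag travels inside the pattern string, this form works with any function that takes a pattern, whether or not it has a flags parameter.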

Answered 2025-04-15 by Python大师
