Python regex word boundary not working as expected

7 votes

2 answers

7652 views

Asked 2025-04-18 13:26

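Why isn't the word boundary working?

From what I have read on this site, word boundaries work like this:

There are three different positions that qualify as word boundaries:

The string a below seems to satisfy at least one of the positions listed above.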

a = 'Builders Club The Ohio State'
re.sub('\bThe\b', '', a, flags=re.IGNORECASE)

Output ('The' is left unchanged):

'Builders Club The Ohio State'

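So why doesn't the word boundary match here?

When I drop \b and put a literal space on each side of the pattern instead (' The '), the substitution works: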

a = 'Builders Club The Ohio State'
re.sub(' The ', ' ', a, flags=re.IGNORECASE)

Output:

'Builders Club Ohio State'


2 Answers

Try this:

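import re

# Use a raw string so \b reaches the regex engine as a word boundary.
# (The ur'' prefix only exists in Python 2; it is a SyntaxError in Python 3.)
p = re.compile(r'\bThe\b', re.IGNORECASE)
test_str = "Builders Club The Ohio State"
subst = ""

result = p.sub(subst, test_str)
print(result)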

Output:

Builders Club  Ohio State
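
Replacing only the word with an empty string leaves the spaces on both sides of it in place, so the result has two consecutive spaces where 'The' used to be. If a single space is wanted, as in the question's ' The ' version, one option is to let the pattern also consume the whitespace that follows the word. This is only a sketch: the trailing `\s*` is my addition and is not part of the original answer.

```python
import re

text = 'Builders Club The Ohio State'

# Removing just the word keeps the space before it and the space after it.
word_only = re.sub(r'\bThe\b', '', text, flags=re.IGNORECASE)
print(repr(word_only))   # 'Builders Club  Ohio State'

# Also consuming the whitespace after the word leaves one space between
# the neighbouring words.
cleaned = re.sub(r'\bThe\b\s*', '', text, flags=re.IGNORECASE)
print(repr(cleaned))     # 'Builders Club Ohio State'
```

If the word can also appear at the very end of the string, the space before it would remain, so a final `.strip()` may still be needed.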


Answered 2025-04-18 by Python大师


You need to write the regex as a raw string, so that Python does not process the backslash escapes before the pattern reaches the regex engine:

>>> import re
>>> a = 'Builders Club The Ohio State'
>>> re.sub(r'\bThe\b', '', a, flags=re.IGNORECASE)
'Builders Club  Ohio State'
>>>

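Otherwise, Python interprets \b in the string literal as a backspace character, and the regex engine never sees a word boundary: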

>>> print('x\by')
y
>>> print(r'x\by')
x\by
>>>
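
To see the difference without relying on how a terminal renders the backspace, you can compare the strings directly. The sketch below uses the string from the question. The variable names are mine, and the doubled-backslash form is shown only as an equivalent spelling of the raw string.

```python
import re

text = 'Builders Club The Ohio State'

# In a normal string literal, '\b' is one character: backspace (U+0008).
plain = '\bThe\b'
# In a raw string the backslash is kept, so the regex engine sees \b.
raw = r'\bThe\b'
# Doubling the backslash in a normal literal yields the same pattern.
escaped = '\\bThe\\b'

print(len(plain), len(raw))          # 5 7
print(plain[0] == '\x08')            # True
print(raw == escaped)                # True

# The non-raw pattern looks for literal backspace characters, so it finds nothing.
print(re.search(plain, text))        # None
print(re.search(raw, text).span())   # (14, 17)
```

This is why the original call returned the string unchanged: the pattern was valid, but it was searching for backspace characters around 'The'.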

Answered 2025-04-18 by Python大师
