Regex pattern that handles both case-sensitive and case-insensitive matching in a single statement

Posted 2024-05-16 19:35:34


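I am working on a small regex problem. I have two different terms.

  1. "United States", which I want to match ignoring case.
  2. "US", which I want to match without ignoring case.

I want to perform the following two regex substitutions in a single regex substitution statement.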

import re

clntxt = re.sub('(?i)United States', 'USA', "united states")
# Output: USA
clntxt = re.sub('US', 'USA', "US and us")
# Output: USA and us

I need something like a single re.sub call that covers both cases.

How can I achieve this?


1 Answer
User
#1 · Posted 2024-05-16 19:35:34

In older Python versions, (?i) enables the "ignore case" flag for the entire expression. From the official documentation:

(?aiLmsux)

(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function. Flags should be used first in the expression string.
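To see why the global flag does not solve the problem, here is a minimal sketch (the variable names are illustrative, not from the question). With (?i) at the start, ignore-case applies to every alternative, so the lowercase "us" gets replaced as well:

```python
import re

text = "united states and US and us"

# A leading (?i) applies to the whole pattern, including the "US" alternative.
global_flag = re.sub(r"(?i)United States|US", "USA", text)

# Passing the flag as an argument behaves the same way.
flag_argument = re.sub(r"United States|US", "USA", text, flags=re.IGNORECASE)

print(global_flag)    # USA and USA and USA
print(flag_argument)  # USA and USA and USA
```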

However, starting with Python 3.6, you can toggle flags for just part of the expression:

(?imsx-imsx:...)

(Zero or more letters from the set 'i', 'm', 's', 'x', optionally followed by '-' followed by one or more letters from the same set.) The letters set or removes the corresponding flags: re.I (ignore case), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)

New in version 3.6.

For example, (?i:foo)bar matches foobar and FOObar, but not fooBAR. So, to answer your question:

>>> re.sub('(?i:United States)|US', 'USA', 'united states and US and us')
'USA and USA and us'
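The same syntax can also switch a flag off for part of a pattern. As a minimal sketch (the variable names are illustrative, not from the question), the pattern below turns ignore-case on globally and turns it back off only around US:

```python
import re

text = "united states and US and us"

# (?i) at the start enables ignore-case for the whole pattern;
# (?-i:...) disables it again for the group around "US" only.
pattern = re.compile(r"(?i)United States|(?-i:US)")
result = pattern.sub("USA", text)

print(result)  # USA and USA and us
```

Which form reads better probably depends on how many alternatives need each behavior: scope the flag around the minority case.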

Note that this only works in Python 3.6+.
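On interpreters older than 3.6, one possible workaround is to spell out the case-insensitive term with character classes, so no flag is needed at all. The helper ci below is hypothetical, written only for this sketch, and it assumes the term consists of ASCII letters and spaces:

```python
import re

def ci(literal):
    """Hypothetical helper: turn each letter into a [xX] character class."""
    parts = []
    for ch in literal:
        if ch.isalpha():
            parts.append("[{}{}]".format(ch.lower(), ch.upper()))
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

# "United States" matches in any case; "US" stays case-sensitive.
pattern = re.compile(ci("United States") + "|US")
result = pattern.sub("USA", "united states and US and us")

print(result)  # USA and USA and us
```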
