Extracting the names and spans of regex match groups

0 votes

2 answers

1306 views

Asked 2025-04-28 13:10

I have a regular expression that looks like this:

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'

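Getting the matched strings is no problem; m.group(name) does that. However, I need to extract the names and spans of the matched groups (or just the span, given a name), and I haven't found a way to do it. What I'd like to do is something like this: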

p = re.compile(rgx, re.IGNORECASE)
m = p.match(targetstring)
# then do something to set 'all' to the list of match objects
# (pseudocode: mo.name() is the API I'm hoping for, not a real method)
for mo in all:
    print(mo.name() + '->' + mo.span())

For example, the input string 'ABCDEFHIJK' should produce the following output:

'foo'  -> (0, 3)
'bar'  -> (3, 6)
'norf' -> (6, 10)

Thanks!


2 Answers

You can use RegexObject.groupindex (re.Pattern.groupindex in Python 3), which maps each group name to its group number:

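import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJK')

# Sort by group number so the names come out in pattern order
for name, n in sorted(m.re.groupindex.items(), key=lambda x: x[1]):
    print(name, m.group(n), m.span(n))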
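
If the pairs are needed as data rather than printed, the same idea can be wrapped in a small helper. This is only a sketch: named_spans is a hypothetical name, not part of the re module, and it assumes that groups which did not take part in the match should simply be skipped.

```python
import re

def named_spans(match):
    """Return (name, span) pairs for the named groups of a match, in pattern order.

    Hypothetical helper; groups that did not participate in the match are skipped.
    """
    pairs = []
    # groupindex maps each group name to its group number
    for name, number in sorted(match.re.groupindex.items(), key=lambda item: item[1]):
        if match.group(number) is not None:
            pairs.append((name, match.span(number)))
    return pairs

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
m = re.compile(rgx, re.IGNORECASE).match('ABCDEFHIJK')
for name, span in named_spans(m):
    print(name, '->', span)
```

Returning a list of tuples keeps the order explicit and leaves formatting to the caller.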

Answered 2025-04-28 by Python大师


You can iterate over the names of the matched groups (that is, the keys of groupdict()) and print the corresponding span for each:

import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
p = re.compile(rgx, re.IGNORECASE)
m = p.match('ABCDEFHIJKLM')

# groupdict() is keyed by group name; span() accepts a name as well as a number
for key in m.groupdict():
    print(key, m.span(key))

This prints:

foo (0, 3)
bar (3, 6)
norf (6, 10)

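Note: since the order of dictionary keys is not guaranteed to follow the pattern, you may want to choose the iteration order explicitly. In the example below, sorted(...) returns the group names ordered by their span tuples:

for key in sorted(m.groupdict(), key=m.span):
    print(key, m.span(key))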
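
One case worth checking is an optional group that does not take part in the match, such as bar here. As a small sketch (the input string 'ABCHIJK' is chosen purely for illustration), m.span() reports such a group as (-1, -1) and groupdict() maps it to None, so it comes first when ordering by span:

```python
import re

rgx = '(?P<foo>ABC)(?P<bar>DEF)?(?P<norf>HIJK)'
m = re.compile(rgx, re.IGNORECASE).match('ABCHIJK')

# Collect the span of every named group, matched or not
spans = {name: m.span(name) for name in m.groupdict()}
for name in sorted(spans, key=spans.get):
    print(name, spans[name])
# bar (-1, -1)
# foo (0, 3)
# norf (3, 7)
```

If unmatched groups should not appear at all, filter on m.group(name) is not None before printing.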

Answered 2025-04-28 by Python大师
