Rebuilding a string from a regular expression in Python based on matched keywords

Asked 2025-04-16 11:24

Here is an example of a regular expression:

import re

regex = re.compile(r'^page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$')
matches = regex.match('page/slug-name/5/')
print(matches.groupdict())
# {'slug': 'slug-name', 'page_id': '5'}

Is there a simple way to pass a dict back to the regex in order to rebuild a string?

For example, the dict {'slug': 'new-slug', 'page_id': '6'} would become the string page/new-slug/6/.

Tags: regex, dictionary, keyword matching, string reconstruction

4 Answers

Here is a solution that doesn't need a new regex:

import re
import operator

regex = re.compile(r'^page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$')
matches = regex.match('page/slug-name/5/')
groupdict = {'slug': 'new-slug', 'page_id': '6'}
prev_index = matches.start(0)
new_string = ""
# Visit the named groups in group-number order
for group, index in sorted(regex.groupindex.items(), key=operator.itemgetter(1)):
    # Copy the text before this group, then append the replacement value
    new_string += matches.string[prev_index:matches.start(index)] + groupdict[group]
    prev_index = matches.end(index)

# Copy the text after the last group, up to the end of the match
new_string += matches.string[prev_index:matches.end(0)]
print(new_string)
# page/new-slug/6/

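This works by replacing the named groups with the values supplied in groupdict, while the rest of the string is filled in by slicing the input string (matches.string). new_string will be the part of the original string that the regex matched, with the substitutions applied. If you want new_string to also include the parts of the string that were not matched, replace prev_index = matches.start(0) with prev_index = 0, and drop matches.end(0) from the final slice after the for loop (so it becomes matches.string[prev_index:]).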
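
A minimal sketch of the same slicing technique wrapped in a reusable function. `rebuild` and its `keep_unmatched` flag are hypothetical names, not part of the `re` module; `keep_unmatched=True` implements the `prev_index = 0` variant just described. It assumes non-nested named groups that all participated in the match (for a group that did not, `match.start()` returns -1 and the slicing goes wrong).

```python
import re


def rebuild(match, values, keep_unmatched=False):
    """Return the matched text (or, with keep_unmatched=True, the whole
    string) with each named group replaced by values[name].

    Hypothetical helper, not part of the re module. Assumes non-nested
    named groups that all participated in the match.
    """
    regex = match.re
    # Start at the beginning of the match, or of the whole string.
    prev = 0 if keep_unmatched else match.start(0)
    pieces = []
    # Visit named groups in group-number order; for non-nested groups this
    # is also their left-to-right order in the string.
    for name, index in sorted(regex.groupindex.items(), key=lambda kv: kv[1]):
        pieces.append(match.string[prev:match.start(index)])  # text before the group
        pieces.append(str(values[name]))                      # replacement value
        prev = match.end(index)
    # Copy whatever follows the last group, up to the end of the match or string.
    end = len(match.string) if keep_unmatched else match.end(0)
    pieces.append(match.string[prev:end])
    return ''.join(pieces)


regex = re.compile(r'page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/')
m = regex.search('see page/slug-name/5/ here')
print(rebuild(m, {'slug': 'new-slug', 'page_id': 6}))
# page/new-slug/6/
print(rebuild(m, {'slug': 'new-slug', 'page_id': 6}, keep_unmatched=True))
# see page/new-slug/6/ here
```

Using search() rather than match() makes the difference visible: only keep_unmatched=True keeps the surrounding "see " and " here".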

Answered 2025-04-16 by Python大师

Regex methods are for working on strings. Since what you have is a dict, I think the string format method is a better fit:

In [16]: d = {'slug': 'new-slug', 'page_id': '6'}

In [17]: 'page/{slug}/{page_id}/'.format(**d)
Out[17]: 'page/new-slug/6/'
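
The format approach keeps a template next to the regex but never checks the values against it. One possible safeguard, sketched here under the assumption that you maintain `PATTERN` and `TEMPLATE` (hypothetical names) side by side by hand, is to match the formatted result back against the regex and confirm it parses into the same values:

```python
import re

# Hypothetical pair kept in sync by hand: the regex and its format template.
PATTERN = re.compile(r'page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/')
TEMPLATE = 'page/{slug}/{page_id}/'


def build(values):
    """Fill TEMPLATE with str.format, then check the result against PATTERN."""
    result = TEMPLATE.format(**values)
    match = PATTERN.fullmatch(result)
    # The result must match, and must parse back into the same (stringified) values.
    expected = {key: str(value) for key, value in values.items()}
    if match is None or match.groupdict() != expected:
        raise ValueError(f'{result!r} does not round-trip through {PATTERN.pattern!r}')
    return result


print(build({'slug': 'new-slug', 'page_id': 6}))
# page/new-slug/6/
```

With {'slug': 'bad slug', 'page_id': 6} the space does not satisfy [-\w]+, so build raises ValueError instead of returning an invalid path.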

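Many more complicated regexes can't be handled by the method below, but if you always use non-nested named groups (?P<name>...) whose bodies contain no ")", and restrict pat to nothing more complicated than \A, ^, \Z, $ or \b, then you may be able to do this: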

import re

pat = r'\Apage/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/\Z'
regex = re.compile(pat)
matches = regex.match('page/slug-name/5/')
print(matches.groupdict())
# {'slug': 'slug-name', 'page_id': '5'}

# Convert '(?P<slug>...)' to '{slug}'
reverse_pat = re.sub(r'\(\?P<(.*?)>.*?\)', r'{\1}', pat)
# Strip off a leading \A or ^ together with a trailing \Z or $
reverse_pat = re.sub(r'^(?:\\A|\^)(.*)(?:\\Z|\$)$', r'\1', reverse_pat)
# Drop any \b's
reverse_pat = re.sub(r'\\b', r'', reverse_pat)
# There are many more such rules one could conceivably need...
print(reverse_pat.format(**matches.groupdict()))
# page/slug-name/5/
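
The rules can also be applied per segment rather than to the whole pattern at once. In this sketch, `pattern_to_template` is a hypothetical helper: it splits the pattern on named groups with re.split, then, in the literal text only, drops \b, unescapes characters such as \., and doubles { and } so str.format treats them literally. It keeps the same restrictions as this answer (non-nested groups whose bodies contain no ")", no anchors other than \A, ^, \Z, $ and \b) and additionally assumes every other backslash escape in the literal text is an escaped punctuation character.

```python
import re

# A named group whose body contains no ")" (same restriction as the answer).
GROUP = re.compile(r'\(\?P<(\w+)>.*?\)')


def pattern_to_template(pat):
    """Turn a restricted regex into a str.format template (hypothetical helper)."""
    # Drop one leading ^ or \A and one trailing $ or \Z.
    pat = re.sub(r'^(?:\\A|\^)', '', pat)
    pat = re.sub(r'(?:\\Z|\$)$', '', pat)
    # With one capturing group, re.split alternates literal text and group names.
    parts = GROUP.split(pat)
    out = []
    for i, part in enumerate(parts):
        if i % 2:
            out.append('{' + part + '}')  # odd positions: group names
            continue
        # Even positions: literal text. Drop \b, unescape any other \X to X.
        part = re.sub(r'\\(.)', lambda m: '' if m.group(1) == 'b' else m.group(1), part)
        # Double literal braces so str.format leaves them alone.
        out.append(part.replace('{', '{{').replace('}', '}}'))
    return ''.join(out)


template = pattern_to_template(r'^api/v1\.0/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$')
print(template)
# api/v1.0/{slug}/{page_id}/
print(template.format(slug='new-slug', page_id=6))
# api/v1.0/new-slug/6/
```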

Answered 2025-04-16 by Python大师

Here is a solution using sre_parse:

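import re
try:
    from re import _parser as sre_parse  # Python 3.11+ (private module)
except ImportError:
    import sre_parse  # Python 3.10 and earlier

pattern = r'^page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$'
regex = re.compile(pattern)
matches = regex.match('page/slug-name/5/')
params = matches.groupdict()
print(params)
# {'slug': 'slug-name', 'page_id': '5'}

# Map group numbers back to group names
lookup = {v: k for k, v in regex.groupindex.items()}
# Keep literals as characters, replace named groups with their values, skip anchors
frags = [chr(av) if op == sre_parse.LITERAL else str(params[lookup[av[0]]])
         for op, av in sre_parse.parse(pattern) if op != sre_parse.AT]
print(''.join(frags))
# page/slug-name/5/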

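This works by getting the raw opcodes from parse(), filtering out the positional opcodes (AT, which is what ^, $ and \b parse to), replacing the named groups with their values, and finally joining the fragments together. Note that sre_parse is an undocumented internal module: it has been deprecated since Python 3.11, where its code lives in the private re._parser, so this approach may break between Python versions.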
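
A sketch of a stricter variant of the same opcode walk, written as a function that takes the new values (for example the question's {'slug': 'new-slug', 'page_id': '6'}) instead of the matched ones. `reverse` is a hypothetical name; it relies on the same internal parser module, and it raises ValueError on any top-level construct other than a literal, an anchor or a named group, rather than silently producing a wrong string.

```python
import re

try:
    from re import _parser as sre_parse  # Python 3.11+: private module
except ImportError:
    import sre_parse  # Python 3.10 and earlier


def reverse(pattern, values):
    """Rebuild a string for `pattern` from `values` (hypothetical helper).

    Only top-level literals, anchors and named groups are supported;
    anything else raises ValueError instead of producing a wrong string.
    """
    names = {num: name for name, num in re.compile(pattern).groupindex.items()}
    pieces = []
    for op, av in sre_parse.parse(pattern):
        if op == sre_parse.AT:
            continue                        # ^, $, \A, \Z, \b: no text
        if op == sre_parse.LITERAL:
            pieces.append(chr(av))          # av is the character's code point
        elif op == sre_parse.SUBPATTERN and av[0] in names:
            pieces.append(str(values[names[av[0]]]))  # av[0] is the group number
        else:
            raise ValueError(f'unsupported construct: {op!r}')
    return ''.join(pieces)


print(reverse(r'^page/(?P<slug>[-\w]+)/(?P<page_id>[0-9]+)/$',
              {'slug': 'new-slug', 'page_id': 6}))
# page/new-slug/6/
```

A pattern such as ^page/\d+/$ has a repeat outside any named group, so reverse raises ValueError for it.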

Answered 2025-04-16 by Python大师
