Reversing a regular expression in Python

Asked 2025-04-18 03:54

I know this question is a bit odd... I have a regular expression like this:

rex = r"at (?P<hour>[0-2][0-9]) send email to (?P<name>\w*):? (?P<message>.+)"

If I match it like this:

match = re.match(rex, "at 10 send email to bob: hi bob!")

match.groupdict() gives me this dictionary:

{"hour": "10", "name": "bob", "message": "hi bob!"}

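My question is: given the dictionary and the regular expression above, can I write a function that returns the original text? I know that many texts can map to the same dictionary (in this example, the ':' after the name is optional), but all I want is one specific text that matches and produces the input dictionary, even though there are infinitely many such texts.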

2 Answers

This produces a text that matches the regular expression:

'at {hour} send email to {name}: {message}'.format(**match.groupdict())
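As a quick sanity check, the round trip can be verified: format the template with the dictionary, match the result against the original pattern, and compare groupdict() with the input. This is only a minimal sketch. The template is written by hand for this particular pattern, and render is a hypothetical helper name, not something from the question.

```python
import re

rex = r"at (?P<hour>[0-2][0-9]) send email to (?P<name>\w*):? (?P<message>.+)"

# Hand-written template mirroring the literal parts of rex.
# Assumption: it has to be kept in sync with the pattern manually.
template = "at {hour} send email to {name}: {message}"


def render(data):
    """Build one text that should match rex for the given group values."""
    return template.format(**data)


data = {"hour": "10", "name": "bob", "message": "hi bob!"}
text = render(data)

# Round trip: the rendered text must match and give back the same groups.
match = re.match(rex, text)
round_trip_ok = match is not None and match.groupdict() == data

print(text)
print(round_trip_ok)
```

The check is worth keeping around because a template like this silently goes stale when the pattern changes.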

Answered 2025-04-18 by Python大师

Use inverse_regex:

"""
http://www.mail-archive.com/python-list@python.org/msg125198.html
"""
import itertools as IT
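try:
    # Python 3.11+ keeps the regex parser inside the re package (private API)
    import re._constants as sc
    import re._parser as sre_parse
except ImportError:
    # Older versions expose it as top-level modules
    import sre_constants as sc
    import sre_parse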
import string

# Generate strings that match a given regex

category_chars = {
    sc.CATEGORY_DIGIT : string.digits,
    sc.CATEGORY_SPACE : string.whitespace,
    sc.CATEGORY_WORD  : string.digits + string.ascii_letters + '_'
    }

def unique_extend(res_list, items):
    # Append only the items that are not already present, keeping order.
    for item in items:
        if item not in res_list:
            res_list.append(item)

def handle_any(val):
    """
    This is different from normal regexp matching. It only matches
    printable ASCII characters.
    """
    return string.printable

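def handle_branch(val):
    # val is a (tok, alternatives) pair; Python 3 no longer allows
    # tuple parameters, so unpack it here.
    _, alternatives = val
    all_opts = []
    for toks in alternatives:
        opts = permute_toks(toks)
        unique_extend(all_opts, opts)
    return all_opts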

def handle_category(val):
    return list(category_chars[val])

def handle_in(val):
    out = []
    for tok, val in val:
        out += handle_tok(tok, val)
    return out

def handle_literal(val):
    return [chr(val)]

def handle_max_repeat(val):
    """
    Handle a repeat token such as {x,y} or ?.
    """
    min_count, max_count, sub = val
    subtok, subval = sub[0]

    if max_count > 5000:
        # max_count is the number of cartesian join operations that need
        # to be carried out. More than 5000 consumes way too much memory.
        # raise ValueError("Too many repetitions requested (%d)" % max_count)
        max_count = 5000

    # Materialize the options so they can be reused for every repeat count.
    optlist = list(handle_tok(subtok, subval))

    # Build the products lazily, one repeat count at a time.
    products = (IT.product(optlist, repeat=x)
                for x in range(min_count, max_count + 1))

    return (''.join(it) for it in IT.chain.from_iterable(products))

def handle_range(val):
    lo, hi = val
    return (chr(x) for x in range(lo, hi + 1))

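def handle_subpattern(val):
    # The parsed sub-pattern is the last element of val; newer Python
    # versions insert flag fields before it.
    return list(permute_toks(val[-1]))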

def handle_tok(tok, val):
    """
    Returns a list of strings of possible permutations for this regexp
    token.
    """
    handlers = {
        sc.ANY        : handle_any,
        sc.BRANCH     : handle_branch,
        sc.CATEGORY   : handle_category,
        sc.LITERAL    : handle_literal,
        sc.IN         : handle_in,
        sc.MAX_REPEAT : handle_max_repeat,
        sc.RANGE      : handle_range,
        sc.SUBPATTERN : handle_subpattern}
    try:
        return handlers[tok](val)
    except KeyError:
        fmt = "Unsupported regular expression construct: %s"
        raise ValueError(fmt % tok)

def permute_toks(toks):
    """
    Returns a generator of strings of possible permutations for this
    regexp token list.
    """
    lists = [handle_tok(tok, val) for tok, val in toks]
    return (''.join(it) for it in IT.product(*lists))



########## PUBLIC API ####################

def ipermute(p):
    return permute_toks(sre_parse.parse(p))
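To see what the handlers above receive, it can help to look at the token stream the parser produces for a tiny pattern. The sketch below assumes the private parser modules are importable under one of the two names shown (they are not a documented public API and may change between Python versions); top_level_tokens is a hypothetical helper name.

```python
try:  # Python 3.11+ keeps the parser inside the re package (private API)
    import re._constants as sc
    import re._parser as sre_parse
except ImportError:  # earlier versions expose top-level modules
    import sre_constants as sc
    import sre_parse


def top_level_tokens(pattern):
    """Return the parser's (opcode, value) pairs for a pattern."""
    return list(sre_parse.parse(pattern))


# "ab?" parses to a literal 'a' followed by an optional literal 'b'.
literal_a, optional_b = top_level_tokens(r"ab?")
op, (lo, hi, sub) = optional_b
print(literal_a == (sc.LITERAL, ord("a")))
print(op == sc.MAX_REPEAT, lo, hi)

# "a*" is also a MAX_REPEAT; its upper bound is the parser's MAXREPEAT
# sentinel, which is why the module caps the repeat count.
star_op, (star_lo, star_hi, _) = top_level_tokens(r"a*")[0]
print(star_op == sc.MAX_REPEAT, star_lo, star_hi == sc.MAXREPEAT)
```

Each pair corresponds to one entry in the handlers table: the opcode selects the handler and the value is passed to it.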

Given rex and data, you can substitute the values into the pattern and then use inverse_regex.ipermute to generate strings that match the original regular expression:

import re
import itertools as IT
import inverse_regex as ire

rex = r"(?:at (?P<hour>[0-2][0-9])|today) send email to (?P<name>\w*):? (?P<message>.+)"
match = re.match(rex, "at 10 send email to bob: hi bob!")
data = match.groupdict()
del match

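# Replace each named group with its value from data, escaped so that
# characters such as '.' or '?' in the data are treated literally.
new_regex = re.sub(r'[(][?]P<([^>]+)>[^)]*[)]',
                   lambda m: re.escape(data.get(m.group(1)) or ''), rex)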
for s in IT.islice(ire.ipermute(new_regex), 10):
    print(s)

The result is:

today send email to bob hi bob!
today send email to bob: hi bob!
at 10 send email to bob hi bob!
at 10 send email to bob: hi bob!
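All four strings match the pattern, but not all of them give back the input dictionary: on the "today" branch the hour group does not participate, so it comes back as None. If an exact round trip matters, the candidates can be filtered. A minimal sketch, with the candidate list copied from the output above and reproduces as a hypothetical helper name:

```python
import re

rex = (r"(?:at (?P<hour>[0-2][0-9])|today) send email to "
       r"(?P<name>\w*):? (?P<message>.+)")
data = {"hour": "10", "name": "bob", "message": "hi bob!"}

# Candidates copied from the generated output.
candidates = [
    "today send email to bob hi bob!",
    "today send email to bob: hi bob!",
    "at 10 send email to bob hi bob!",
    "at 10 send email to bob: hi bob!",
]


def reproduces(text):
    """True if text matches rex and yields exactly the input dictionary."""
    m = re.fullmatch(rex, text)
    return m is not None and m.groupdict() == data


# Keep only the texts that round-trip to the same groupdict().
exact = [text for text in candidates if reproduces(text)]
for text in exact:
    print(text)
```

Only the two "at 10 ..." variants survive the filter.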

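Note: I made some changes to the original inverse_regex so that it does not raise a ValueError when the regular expression contains *. Instead, * is treated roughly like {,5000}, so you at least get some of the permutations.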
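The effect of the cap can be sketched independently of inverse_regex. The names bounded_repeats and REPEAT_CAP below are hypothetical and not part of the module; the point is only that an unbounded repeat becomes a bounded one and that results are produced lazily, so taking the first few is cheap.

```python
import itertools as IT

REPEAT_CAP = 5000  # same cap as described in the note above


def bounded_repeats(chars, lo, hi, cap=REPEAT_CAP):
    """Lazily yield every string of length lo..min(hi, cap) over chars."""
    hi = min(hi, cap)
    for length in range(lo, hi + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string
        for combo in IT.product(chars, repeat=length):
            yield "".join(combo)


# Treat "a*" like "a{0,5000}" and take only the first few results.
first = list(IT.islice(bounded_repeats("a", 0, 10**9), 4))
print(first)

# A small bounded case: "[ab]{1,2}".
small = list(bounded_repeats("ab", 1, 2))
print(small)
```

With a larger alphabet the number of strings grows exponentially with the length, so a consumer should always slice the generator rather than exhaust it.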

Answered 2025-04-18 by Python大师
