Python - Parsing multiple-choice markup

1 vote
4 answers
1077 views
Asked 2025-04-17 09:40

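Consider this text:

Would you like to have responses to your questions sent to you via email?

I want to offer multiple choices for several of the words by marking them up like this:

Would you like [to get]|[having]|g[to have] responses to your questions sent [up to]|g[to]|[on] you via email?

The choices are enclosed in square brackets and separated by pipes.
The correct choice is preceded by a g.

I would like to parse this sentence so that the text is formatted like this:

Would you like __ responses to your questions sent __ you via email?

along with a list like this: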

[
  [
    {"to get":0},
    {"having":0},
    {"to have":1},
  ],
  [
    {"up to":0},
    {"to":1},
    {"on":0},
  ],
]

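Is my markup design OK?
How can I process the sentence with regular expressions to get this result and generate the list?

Edit: I need a user-oriented markup language.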

4 Answers

2

Here is a simple parsing implementation using regular expressions:

import re
s = "Would you like [to get]|[having]|g[to have] responses to your questions sent [up to]|g[to]|[on] you via email ?"   # marked-up input text

choice_groups = re.compile(r"((?:g?\[[^\]]+\]\|?)+)")  # regex to get choice groups
choices = re.compile(r"(g?)\[([^\]]+)\]")  # regex to extract choices within each group

# now, use the regexes to parse the string:
groups = choice_groups.findall(s)
# returns: ['[to get]|[having]|g[to have]', '[up to]|g[to]|[on]']

# parse each group to extract possible choices, along with if they are good
group_choices = [choices.findall(group) for group in groups]
# will contain [[('', 'to get'), ('', 'having'), ('g', 'to have')], [('', 'up to'), ('g', 'to'), ('', 'on')]]

# finally, substitute each choice group to form a template
template = choice_groups.sub('___', s)
# template is "Would you like ___ responses to your questions sent ___ you via email ?"

Converting this into the format you need should now be straightforward. Good luck :)
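For completeness, that last conversion step could look roughly like the sketch below. It reuses the two regexes from above and assumes the target is exactly the list of lists of single-key dicts shown in the question; the names `result` and `template` are only illustrative.

```python
import re

s = ("Would you like [to get]|[having]|g[to have] responses to your "
     "questions sent [up to]|g[to]|[on] you via email ?")

choice_groups = re.compile(r"((?:g?\[[^\]]+\]\|?)+)")  # a whole group of choices
choices = re.compile(r"(g?)\[([^\]]+)\]")              # one choice inside a group

# One inner list per choice group, one single-key dict per choice:
# the value is 1 for the choice marked with g, 0 otherwise.
result = [
    [{text: 1 if flag == "g" else 0} for flag, text in choices.findall(group)]
    for group in choice_groups.findall(s)
]

# Replace each group with the two-underscore blank used in the question.
template = choice_groups.sub("__", s)

print(template)
print(result)
```

One thing to watch with this markup: because the pattern accepts an optional g directly before a bracket, an ordinary word ending in g that touches a bracket (for example `blog[a]`) would have its last letter read as the "correct" marker.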

3

I would add grouping braces {} around each set of choices, and output a flat list of dicts rather than a list of lists of dicts.

Code:

import re

s = 'Would you like {[to get]|[having]|g[to have]} responses to your questions sent {[up to]|g[to]|[on]} you via email ?'

def variants_to_dict(variants):
    dct = {}
    for is_good, s in variants:
        dct[s] = 1 if is_good == 'g' else 0
    return dct

def question_to_choices(s):
    choices_re = re.compile(r'{[^}]+}')
    variants_re = re.compile(r'''\|?(g?)
                                 \[
                                    ([^\]]+)
                                 \]
                                ''', re.VERBOSE)
    choices_list = []
    for choices in choices_re.findall(s):
        choices_list.append(variants_to_dict(variants_re.findall(choices)))

    return choices_re.sub('___', s), choices_list

question, choices = question_to_choices(s)
print(question)
print(choices)

Output (Python 3.7+, where dicts keep insertion order):

Would you like ___ responses to your questions sent ___ you via email ?
[{'to get': 0, 'having': 0, 'to have': 1}, {'up to': 0, 'to': 1, 'on': 0}]

2

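Here is my solution as well. It wraps each group in braces and marks the correct choice with a +:

Would you like {to get|having|+to have} responses to your questions sent {up to|+to|on} you via email?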

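import re

def extract_choices(text):
    choices = []

    def callback(match):
        # Drop the surrounding braces, then split the group into its variants.
        variants = match.group().strip('{}')
        # Map each variant (without its '+' marker) to True if it was marked correct.
        choices.append(dict(
            (v.lstrip('+'), v.startswith('+'))
            for v in variants.split('|')
        ))
        return '___'

    # Non-greedy match so that each {...} group is replaced separately.
    text = re.sub(r'\{.*?\}', callback, text)

    return text, choices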

Let's try it:

>>> import pprint
>>> t = 'Would you like {to get|having|+to have} responses to your questions sent {up to|+to|on} you via email?'
>>> pprint.pprint(extract_choices(t))
('Would you like ___ responses to your questions sent ___ you via email?',
 [{'having': False, 'to get': False, 'to have': True},
  {'on': False, 'to': True, 'up to': False}])
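Since the question asks for a markup that users will type by hand, it may be worth checking the input before parsing it. The following is only a sketch for the {a|b|+c} syntax used in this answer, under the assumption that every group should contain exactly one correct choice and no empty choices; `validate_choices` is a hypothetical helper name and is not part of the code above.

```python
import re

GROUP_RE = re.compile(r'\{(.*?)\}')  # one {...} choice group, non-greedy


def validate_choices(text):
    """Return a list of problems found in the {a|b|+c} markup (hypothetical helper)."""
    problems = []
    for number, match in enumerate(GROUP_RE.finditer(text), start=1):
        variants = match.group(1).split('|')
        # Assumption: a well-formed group has exactly one '+'-marked choice.
        correct = [v for v in variants if v.startswith('+')]
        if len(correct) != 1:
            problems.append('group %d: expected exactly one "+" choice, found %d'
                            % (number, len(correct)))
        # A choice that is empty once the marker and whitespace are removed.
        if any(not v.lstrip('+').strip() for v in variants):
            problems.append('group %d: empty choice' % number)
    return problems


print(validate_choices('Would you like {to get|having|+to have} responses?'))
print(validate_choices('sent {up to|to|on} you via {+email|}?'))
```

The first call reports no problems; the second flags group 1 for having no correct choice and group 2 for containing an empty choice.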
