pyparsing: producing nested dictionary output

Asked 2025-04-18 16:48

I am using pyparsing to parse expressions in a specific format:

"and(or(eq(x,1), eq(x,2)), eq(y,3))"

My test code looks like this:

from pyparsing import Word, alphanums, Literal, Forward, Suppress, ZeroOrMore, CaselessLiteral, Group

field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral('eq') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
ne_ = CaselessLiteral('ne') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
function = ( eq_ | ne_ )

arg = Forward()
and_ = Forward()
or_ = Forward()

arg << (and_ | or_ |  function) + Suppress(",") + (and_ | or_ | function) + ZeroOrMore(Suppress(",") + (and_ | function))

and_ << Literal("and") + Suppress("(") + Group(arg) + Suppress(")")
or_ << Literal("or") + Suppress("(") + Group(arg) + Suppress(")")

exp = (and_ | or_ | function)

print(exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))"))

The output I get is:

['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]

The list output looks fine. For later processing, though, I would like the output to be a nested dictionary:

{
    name: 'and',
    args: [
        {
            name: 'or',
            args: [
                {
                    name: 'eq',
                    args: ['x','1']
                },
                {
                    name: 'eq',
                    args: ['x','2']
                }
            ]
        },
        {
            name: 'eq',
            args: ['y','3']
        }
    ]
}

I tried using the Dict class, but without success.

Is this possible in pyparsing, or should I format the list output by hand?


2 Answers

I don't think pyparsing has anything built in for this, but you can build the data structure recursively:

def toDict(lst):
    if not isinstance(lst[1], list):
        return lst
    return [{'name': name, 'args': toDict(args)}
            for name, args in zip(lst[::2], lst[1::2])]

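Your example treats args differently depending on the number of children: with a single child you could use a bare dict, but with several you need a list of dicts. That makes the result complicated to use. It is better to use a list of dicts even when there is only one child, so that you always know how to iterate over the children without checking their type.

Example

We can use json.dumps to pretty-print the output (note that we print parsedict[0] here: we know the root has only one child, but toDict always returns a list, as explained above):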
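To illustrate why a uniform list of dicts is easier to consume, here is a minimal sketch of a traversal over that structure. The helper name iter_names and the hand-written tree literal are assumptions made for this illustration only; they are not part of pyparsing or of the code above.

```python
def iter_names(nodes):
    """Yield every function name depth-first (hypothetical helper)."""
    for node in nodes:
        yield node["name"]
        args = node["args"]
        # Leaf calls such as eq(x,1) keep a plain list of strings as args,
        # so only recurse when the children are themselves dicts.
        if args and isinstance(args[0], dict):
            yield from iter_names(args)


# Hand-written tree in the "always a list of dicts" shape described above.
tree = [{"name": "and", "args": [
    {"name": "or", "args": [
        {"name": "eq", "args": ["x", "1"]},
        {"name": "eq", "args": ["x", "2"]},
    ]},
    {"name": "eq", "args": ["y", "3"]},
]}]

names = list(iter_names(tree))
print(names)
```

The only type check left is the one that separates leaf arguments from nested calls; the loop never has to ask whether args is a single dict or a list.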

import json
parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
parsedict = toDict(parsed)
print(json.dumps(parsedict[0], indent=4, separators=(',', ': ')))

Output

{
    "name": "and",
    "args": [
        {
            "name": "or",
            "args": [
                {
                    "name": "eq",
                    "args": [
                        "x",
                        "1"
                    ]
                },
                {
                    "name": "eq",
                    "args": [
                        "x",
                        "2"
                    ]
                }
            ]
        },
        {
            "name": "eq",
            "args": [
                "y",
                "3"
            ]
        }
    ]
}

To get this output, I replaced dict with collections.OrderedDict in the toDict function, which ensures that name comes before args.

Answered 2025-04-18 by Python大师
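The OrderedDict variant is not shown above, so here is a sketch of what it might look like. It is the same toDict function with only the dict construction changed. On Python 3.7 and later a plain dict already preserves insertion order, so OrderedDict mainly matters on older interpreters.

```python
import json
from collections import OrderedDict


def toDict(lst):
    # A leaf such as ['x', '1'] has a non-list second element: return as-is.
    if not isinstance(lst[1], list):
        return lst
    # Pair each name with the argument list that follows it; building the
    # OrderedDict from pairs keeps "name" ahead of "args" in the output.
    return [OrderedDict([('name', name), ('args', toDict(args))])
            for name, args in zip(lst[::2], lst[1::2])]


parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
parsedict = toDict(parsed)
text = json.dumps(parsedict[0], indent=4)
print(text)
```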

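What you want is an important feature of pyparsing: setting results names. Using results names is the recommended practice for most pyparsing applications. The feature has been available since version 0.9, in the form: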

expr.setResultsName("abc")

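This lets you access a specific field of the parsed results as res["abc"] or res.abc (where res is the value returned from parser.parseString). You can also call res.dump() to see the nested structure of the results.

To keep parsers easier to read, though, I added support in version 1.4.6 for this shorter way of setting a results name: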

expr("abc")
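As a small illustration of both spellings, here is a sketch with a toy grammar. The key=value grammar and the names "key" and "value" are made up for this example; only setResultsName, the call shorthand, parseString and dump() come from the description above.

```python
from pyparsing import Word, alphas, nums

# Toy grammar, for illustration only: <letters>=<digits>
key = Word(alphas)("key")                    # call shorthand
val = Word(nums).setResultsName("value")     # explicit method, same effect
pair = key + "=" + val

res = pair.parseString("x=42")
print(res["key"])    # dictionary-style access
print(res.value)     # attribute-style access
print(res.dump())    # the token list followed by the named fields
```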

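Here is your parser, tidied up a little, with results names added:

from pyparsing import (Word, alphanums, Literal, Forward, Suppress,
                       CaselessLiteral, Group, delimitedList)

COMMA,LPAR,RPAR = map(Suppress,",()")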
field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral('eq')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
ne_ = CaselessLiteral('ne')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
function = ( eq_ | ne_ )

arg = Forward()
and_ = Forward()
or_ = Forward()
exp = Group(and_ | or_ | function)

arg << delimitedList(exp)

and_ << Literal("and")("name") + LPAR + Group(arg)("args") + RPAR
or_ << Literal("or")("name") + LPAR + Group(arg)("args") + RPAR

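Unfortunately, dump() only handles the nesting of results, not lists of values, so it is not as nice as json.dumps (perhaps this would be a good enhancement to dump()?). So here is a custom function that prints your nested name/args results: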

ob = exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0]

INDENT_SPACES = '    '
def dumpExpr(ob, level=0):
    indent = level * INDENT_SPACES
    print (indent + '{')
    print ("%s%s: %r," % (indent+INDENT_SPACES, 'name', ob['name']))
    if ob.name in ('eq','ne'):
        print ("%s%s: %s"   % (indent+INDENT_SPACES, 'args', ob.args.asList()))
    else:
        print ("%s%s: ["   % (indent+INDENT_SPACES, 'args'))
        for arg in ob.args:
            dumpExpr(arg, level+2)
        print ("%s]"   % (indent+INDENT_SPACES))
    print (indent + '}' + (',' if level > 0 else ''))
dumpExpr(ob)

The result is:

{
    name: 'and',
    args: [
        {
            name: 'or',
            args: [
                {
                    name: 'eq',
                    args: ['x', '1']
                },
                {
                    name: 'eq',
                    args: ['x', '2']
                },
            ]
        },
        {
            name: 'eq',
            args: ['y', '3']
        },
    ]
}
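If an actual nested dictionary is wanted rather than printed text, the named results can be walked in the same way as dumpExpr does. The following is only a sketch: to_dict is a hypothetical helper, not a pyparsing API, and the grammar is repeated so that the snippet runs on its own. It uses <<=, which assumes a pyparsing release that supports that operator on Forward.

```python
import json
from pyparsing import (CaselessLiteral, Forward, Group, Literal, Suppress,
                       Word, alphanums, delimitedList)

COMMA, LPAR, RPAR = map(Suppress, ",()")
field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral("eq")("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
ne_ = CaselessLiteral("ne")("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
function = eq_ | ne_

and_ = Forward()
or_ = Forward()
exp = Group(and_ | or_ | function)
arg = delimitedList(exp)
and_ <<= Literal("and")("name") + LPAR + Group(arg)("args") + RPAR
or_ <<= Literal("or")("name") + LPAR + Group(arg)("args") + RPAR


def to_dict(node):
    """Convert one parsed node into a plain nested dict (hypothetical helper)."""
    if node["name"] in ("eq", "ne"):
        # Leaf comparison: args is a flat group of two strings.
        args = node["args"].asList()
    else:
        # and/or: every child is itself a grouped node carrying name and args.
        args = [to_dict(child) for child in node["args"]]
    return {"name": node["name"], "args": args}


parsed = exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0]
result = to_dict(parsed)
print(json.dumps(result, indent=4))
```

Since result is built from plain dicts and lists, json.dumps can serialize it directly, and later processing code does not need to know about pyparsing at all.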

Answered 2025-04-18 by Python大师
