pyparsing, Forward, and recursion

数据工程师

Asked 2025-04-16 06:49

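I am using pyparsing to parse VCD (value change dump) files. Essentially, I want to read the files in, parse them into an internal dictionary, and then manipulate the values.

Without going into detail about the file structure, my problem is identifying nested categories.

In VCD files there are "scopes", which contain wires and possibly deeper (nested) scopes. Think of them as levels.

In my file I have: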

$scope module toplevel $end
$scope module midlevel $end
$var wire a $end
$var wire b $end
$upscope $end
$var wire c $end
$var wire d $end
$var wire e $end
$scope module extralevel $end
$var wire f $end
$var wire g $end
$upscope $end
$var wire h $end
$var wire i $end
$upscope $end

So "toplevel" contains everything (a through i), "midlevel" contains (a, b), "extralevel" contains (f, g), and so on.

Here is the snippet I use to parse this section:

scope_header = Group(Literal('$scope') + Word(alphas) + Word(alphas) + \
                     Literal('$end'))

wire_map = Group(Literal('$var') + Literal('wire') + Word(alphas) + \
                 Literal('$end'))

scope_footer = Group(Literal('$upscope') + Literal('$end'))

scope = Forward()
scope << (scope_header + ZeroOrMore(wire_map) + ZeroOrMore(scope) + \
          ZeroOrMore(wire_map) + scope_footer)

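I expected that, as the parser encountered each scope, it would keep track of each "level", and I would end up with a structure containing nested scopes. Instead, it fails at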

$scope module extralevel $end

saying that it expected '$upscope'.

So I know I am not using recursion correctly. Can someone help me out? Let me know if you need more information.

Thanks!


2 Answers

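Please accept @ZackBloom's answer as the correct one. He intuited the problem right away, without even knowing pyparsing's syntax.

I have a few comments and suggestions about your grammar:

With the fix from the answer posted above, you can use pprint and pyparsing's asList() method on the ParseResults to visualize the nesting: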

res = scope.parseString(vcd)

from pprint import pprint
pprint(res.asList())

This gives:

[[['$scope', 'module', 'toplevel', '$end'],
  [['$scope', 'module', 'midlevel', '$end'],
   ['$var', 'wire', 'a', '$end'],
   ['$var', 'wire', 'b', '$end'],
   ['$upscope', '$end']],
  ['$var', 'wire', 'c', '$end'],
  ['$var', 'wire', 'd', '$end'],
  ['$var', 'wire', 'e', '$end'],
  [['$scope', 'module', 'extralevel', '$end'],
   ['$var', 'wire', 'f', '$end'],
   ['$var', 'wire', 'g', '$end'],
   ['$upscope', '$end']],
  ['$var', 'wire', 'h', '$end'],
  ['$var', 'wire', 'i', '$end'],
  ['$upscope', '$end']]]

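Now you have nicely structured results, but you can clean things up a bit. First, now that you have structure, you do not really need the $scope, $end, and similar tokens. You could certainly step over them as you navigate the parsed results, but you can also have pyparsing drop them from the parsed output (since the results are now structured, you are not really losing anything). Change your parser definitions to: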

SCOPE, VAR, UPSCOPE, END = map(Suppress, 
                                 "$scope $var $upscope $end".split())
MODULE, WIRE = map(Literal, "module wire".split())

scope_header = Group(SCOPE + MODULE + Word(alphas) + END)
wire_map = Group(VAR + WIRE + Word(alphas) + END) 
scope_footer = (UPSCOPE + END)

(There is no need to group scope_footer: everything in that expression is suppressed, so Group would just give you an empty list.)
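The point about grouping a fully suppressed expression can be checked with a small sketch. It assumes pyparsing is installed and uses the same camelCase method names (parseString, asList) as the rest of this answer; the variable names grouped and ungrouped are only illustrative.

```python
from pyparsing import Group, Suppress

# Both tokens are suppressed, so neither appears in the parsed output.
UPSCOPE, END = map(Suppress, "$upscope $end".split())

# Wrapping the suppressed tokens in Group leaves an empty sub-list behind.
grouped = Group(UPSCOPE + END).parseString("$upscope $end").asList()

# Without Group, the footer contributes nothing at all to the results.
ungrouped = (UPSCOPE + END).parseString("$upscope $end").asList()

print(grouped)    # [[]]
print(ungrouped)  # []
```

Leaving the footer ungrouped keeps stray empty lists out of every scope in the results.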

Now you can see more clearly the bits that really matter:

[[['module', 'toplevel'],
  [['module', 'midlevel'], ['wire', 'a'], ['wire', 'b']],
  ['wire', 'c'],
  ['wire', 'd'],
  ['wire', 'e'],
  [['module', 'extralevel'], ['wire', 'f'], ['wire', 'g']],
  ['wire', 'h'],
  ['wire', 'i']]]

At the risk of over-grouping, I would suggest also wrapping the contents of your scope expression in a Group, like this:

scope << Group(scope_header + 
               Group(ZeroOrMore((wire_map | scope))) + 
               scope_footer)

This gives these results:

[[['module', 'toplevel'],
  [[['module', 'midlevel'], [['wire', 'a'], ['wire', 'b']]],
   ['wire', 'c'],
   ['wire', 'd'],
   ['wire', 'e'],
   [['module', 'extralevel'], [['wire', 'f'], ['wire', 'g']]],
   ['wire', 'h'],
   ['wire', 'i']]]]

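Now each scope result has two predictable elements: the module header and a list of wires or subscopes. This predictability makes it much easier to write recursive code that walks the results:

res = scope.parseString(vcd)

def dumpScope(parsedTokens, indent=''):
    # Each scope is a pair: [module header, list of wires and subscopes].
    module, contents = parsedTokens
    print(indent + '- ' + module[1])
    for item in contents:
        if item[0] == 'wire':
            print(indent + '  wire: ' + item[1])
        else:
            # Anything that is not a wire is a nested scope.
            dumpScope(item, indent + '  ')

dumpScope(res[0])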

The output looks like this:

- toplevel
  - midlevel
    wire: a
    wire: b
  wire: c
  wire: d
  wire: e
  - extralevel
    wire: f
    wire: g
  wire: h
  wire: i
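Since the question mentions wanting an internal dictionary, the same recursive walk can build one instead of printing. The following is only a sketch under assumptions: the helper scope_to_dict and the dictionary layout (keys "name", "wires", "scopes") are made up for illustration and are not part of pyparsing or of the original answer.

```python
from pyparsing import (Forward, Group, Literal, Suppress, Word, ZeroOrMore,
                       alphas)

vcd = """
$scope module toplevel $end
$scope module midlevel $end
$var wire a $end
$var wire b $end
$upscope $end
$var wire c $end
$var wire d $end
$var wire e $end
$scope module extralevel $end
$var wire f $end
$var wire g $end
$upscope $end
$var wire h $end
$var wire i $end
$upscope $end
"""

SCOPE, VAR, UPSCOPE, END = map(Suppress, "$scope $var $upscope $end".split())
MODULE, WIRE = map(Literal, "module wire".split())

scope_header = Group(SCOPE + MODULE + Word(alphas) + END)
wire_map = Group(VAR + WIRE + Word(alphas) + END)
scope_footer = UPSCOPE + END

scope = Forward()
scope <<= Group(scope_header + Group(ZeroOrMore(wire_map | scope)) + scope_footer)


def scope_to_dict(parsed_scope):
    """Hypothetical helper: turn [header, contents] into a nested dict."""
    header, contents = parsed_scope
    node = {"name": header[1], "wires": [], "scopes": {}}
    for item in contents:
        if item[0] == "wire":
            node["wires"].append(item[1])
        else:
            # Not a wire, so it is a nested scope: recurse into it.
            child = scope_to_dict(item)
            node["scopes"][child["name"]] = child
    return node


# asList() converts the results to plain nested lists before walking them.
tree = scope_to_dict(scope.parseString(vcd).asList()[0])
print(tree["wires"])                         # ['c', 'd', 'e', 'h', 'i']
print(tree["scopes"]["midlevel"]["wires"])   # ['a', 'b']
```

Keying subscopes by name makes lookups convenient, but it assumes sibling scopes have distinct names; a list of children would be the safer choice if that does not hold.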

This is a great beginner question. Welcome to SO and to pyparsing!

Answered 2025-04-16 by Python大师

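By your definition, a scope cannot contain another scope, followed by some maps, followed by yet another scope.

If the parser has a debug mode that prints its parse tree, you would see this immediately. In short, you are saying there are zero or more maps, followed by zero or more scopes, followed by zero or more maps. So once a scope is followed by a map, you have already moved past the scope part of the pattern, and any scope that appears afterwards is invalid. If the language pyparsing uses supports an "or", you could write: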

scope << (scope_header + ZeroOrMore((wire_map | scope)) + scope_footer)
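To see the difference concretely, here is a minimal sketch that runs both grammars against the sample from the question. It assumes pyparsing is installed; the names original, fixed, and original_ok are illustrative only. It uses `<<=` rather than `<<` to assign the Forward, which sidesteps operator-precedence surprises when the right-hand side is not parenthesized.

```python
from pyparsing import (Forward, Group, Literal, ParseException, Word,
                       ZeroOrMore, alphas)

vcd = """
$scope module toplevel $end
$scope module midlevel $end
$var wire a $end
$var wire b $end
$upscope $end
$var wire c $end
$var wire d $end
$var wire e $end
$scope module extralevel $end
$var wire f $end
$var wire g $end
$upscope $end
$var wire h $end
$var wire i $end
$upscope $end
"""

scope_header = Group(Literal('$scope') + Word(alphas) + Word(alphas) +
                     Literal('$end'))
wire_map = Group(Literal('$var') + Literal('wire') + Word(alphas) +
                 Literal('$end'))
scope_footer = Group(Literal('$upscope') + Literal('$end'))

# Grammar from the question: wires, then scopes, then wires, in that order.
original = Forward()
original <<= (scope_header + ZeroOrMore(wire_map) + ZeroOrMore(original) +
              ZeroOrMore(wire_map) + scope_footer)

# Grammar from this answer: wires and scopes may interleave freely.
fixed = Forward()
fixed <<= scope_header + ZeroOrMore(wire_map | fixed) + scope_footer

try:
    original.parseString(vcd)
    original_ok = True
except ParseException as err:
    original_ok = False
    print("original grammar failed:", err)

tokens = fixed.parseString(vcd).asList()
print(len(tokens), "groups parsed by the fixed grammar")
```

Without a Group around the scope expression itself, the fixed grammar returns a flat sequence of header, wire, and footer groups rather than a nested structure.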

Answered 2025-04-16 by Python大师