Python PLY 解析器

2 投票
2 回答
5060 浏览
提问于 2025-04-17 05:04

我试着在网上找这个问题的答案,但似乎找不到。我想用Python和PLY写一个解析器,解析一种我自己编造的语言。我的BNF(巴科斯-诺尔范式)简化版大概是这样的:

statement-list -> statement ',' statement-list |
                 'print' expr

statement -> ident 'was' 'a' type |
             ident 'became' expr

type -> 'number' | 'letter'

expr -> factor |
       expr '+' factor |
       expr '-' factor

factor -> number | letter | ident

这里的number和letter就像int和char。

Yacc的文档(http://www.dabeaz.com/ply/ply.html#ply_nn23)只展示了简单算术表达式的语法,里面清楚地说明了p[0]应该是什么。

def p_expression_plus(p):
   'expression : expression PLUS term'
    p[0] = p[1] + p[3]

我想问的是,在我的BNF中,statement-list该怎么写?我有:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'

但我真的不确定接下来该写什么。希望能得到一些帮助!

2 个回答

7

我不能代表PLY的解决方案,但这里有一个使用pyparsing的例子。有时候,即使你最终想用其他库来实现解析器,pyparsing的例子也能帮你快速搭建一个简单的原型或练习。不幸的是,这个例子大量使用了operatorPrecedence方法,这让很多中缀解析的细节变得不太明显,所以我不确定你能多容易地把它转化成其他形式。你可以在pyparsing的wiki上找到一个更传统的表达式/项/因子的解析器例子,地址在示例页面上,标题是fourFn.py

bnf = """
statement-list -> statement ',' statement-list

statement -> ident 'was' 'a' type | 
             ident 'became' expr |
             'print' expr |
             'if' conditional-expr statement

type -> 'number' | 'letter' 

expr -> factor | 
       expr '+' factor | 
       expr '-' factor 

factor -> number | letter | ident 
"""

from pyparsing import (CaselessKeyword, Word, nums, alphas, alphanums, operatorPrecedence, 
    Forward, MatchFirst, opAssoc, oneOf, Group, delimitedList)

PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT = map(
    CaselessKeyword,
    "print was a became number letter if else true false and or not".upper().split())
keyword = MatchFirst([PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT])

typeSpecifier = NUMBER | LETTER

number = Word(nums)
ident = ~keyword + Word(alphas, alphanums+'_')
operand = number | ident

expr = operatorPrecedence(operand,
    [
    ('-', 1, opAssoc.RIGHT),
    (oneOf('* /'), 2, opAssoc.LEFT),
    (oneOf('+ -'), 2, opAssoc.LEFT),
    ])

comparisonExpr = operatorPrecedence(expr,
    [
    ("!", 1, opAssoc.RIGHT),
    (oneOf("< > = <= >= !="), 2, opAssoc.LEFT),
    ])

booleanExpr = operatorPrecedence(TRUE | FALSE | comparisonExpr,
    [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
    ])

statement = Forward()
printStmt  = PRINT + expr
wasaStmt   = ident + WAS + A + typeSpecifier
becameStmt = ident + BECAME + expr
ifStmt = IF + booleanExpr + statement
statement << Group(printStmt | wasaStmt | becameStmt | ifStmt)

statementList = delimitedList(statement)

tests = """\
    x was a number
    y became 2+5
    print y
    print 100*(5+2)
    print 100*5+2
    if 5 > y print 1000
    if y < 10 y became y+1, print y
    """.splitlines()[:-1]

for t in tests:
    print t.strip()
    for s in statementList.parseString(t).asList():
        print(s)
    print

输出结果:

x was a number
['x', 'WAS', 'A', 'NUMBER']

y became 2+5
['y', 'BECAME', ['2', '+', '5']]

print y
['PRINT', 'y']

print 100*(5+2)
['PRINT', ['100', '*', ['5', '+', '2']]]

print 100*5+2
['PRINT', [['100', '*', '5'], '+', '2']]

if 5 > y print 1000
['IF', ['5', '>', 'y'], ['PRINT', '1000']]

if y < 10 y became y+1, print y
['IF', ['y', '<', '10'], ['y', 'BECAME', ['y', '+', '1']]
['PRINT', 'y']

我随意添加了print作为一种语句,这样它可以出现在程序的任何地方。此外,我还尝试添加了一个IF-THEN语句,这确实展示了如何通过添加这样的控制流语句开始编写递归语法(对于'was a'、'became'和'print'来说并不需要递归)。

3

这其实取决于你怎么组织你的代码,以及你想怎么评估它。如果你是边写边评估,只要评估的顺序正确,你可能不想在p_statement_list_comma的文档字符串后面加任何东西,也就是说,就像你之前那样 - 这些语句会被评估,而且如果需要的话,你可以保持一个全局的变量字典或者类似的东西来跟踪一些状态,比如标识符的值。

如果你想建立一个解析树,比如说为了单独评估,如果你不喜欢ply的评估顺序,你可以这样做:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = [p[1]] + p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = [p[2]]

这样的话,你会得到一个语句的列表,列表中的最后一个元素是一个表达式。这里使用列表是为了简单;如果你愿意,也可以使用你自己的类 - 只需把你想要的任何Python对象赋值给p[0],它就会在上一级可用。

如果你想让print表达式的结果从yacc.parse返回(解析树顶层的值会从yacc.parse返回),你可以这样做:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = p[2]

撰写回答