Python PLY 解析器
我试着在网上找这个问题的答案,但似乎找不到。我想用Python和PLY写一个解析器,解析一种我自己编造的语言。我的BNF(巴科斯-诺尔范式)简化版大概是这样的:
statement-list -> statement ',' statement-list |
'print' expr
statement -> ident 'was' 'a' type |
ident 'became' expr
type -> 'number' | 'letter'
expr -> factor |
expr '+' factor |
expr '-' factor
factor -> number | letter | ident
这里的number和letter就像int和char。
Yacc的文档(http://www.dabeaz.com/ply/ply.html#ply_nn23)只展示了简单算术表达式的语法,里面清楚地说明了p[0]应该是什么。
def p_expression_plus(p):
'expression : expression PLUS term'
p[0] = p[1] + p[3]
我想问的是,在我的BNF中,statement-list该怎么写?我有:
def p_statement_list_comma(p):
'statement-list : statement COMMA statement-list'
但我真的不确定接下来该写什么。希望能得到一些帮助!
2 个回答
我不能代表PLY的解决方案,但这里有一个使用pyparsing的例子。有时候,即使你最终想用其他库来实现解析器,pyparsing的例子也能帮你快速搭建一个简单的原型或练习。不幸的是,这个例子大量使用了operatorPrecedence
方法,这让很多中缀解析的细节变得不太明显,所以我不确定你能多容易地把它转化成其他形式。你可以在pyparsing的wiki上找到一个更传统的表达式/项/因子的解析器例子,地址在示例页面上,标题是fourFn.py。
bnf = """
statement-list -> statement ',' statement-list
statement -> ident 'was' 'a' type |
ident 'became' expr |
'print' expr |
'if' conditional-expr statement
type -> 'number' | 'letter'
expr -> factor |
expr '+' factor |
expr '-' factor
factor -> number | letter | ident
"""
from pyparsing import (CaselessKeyword, Word, nums, alphas, alphanums, operatorPrecedence,
Forward, MatchFirst, opAssoc, oneOf, Group, delimitedList)
PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT = map(
CaselessKeyword,
"print was a became number letter if else true false and or not".upper().split())
keyword = MatchFirst([PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT])
typeSpecifier = NUMBER | LETTER
number = Word(nums)
ident = ~keyword + Word(alphas, alphanums+'_')
operand = number | ident
expr = operatorPrecedence(operand,
[
('-', 1, opAssoc.RIGHT),
(oneOf('* /'), 2, opAssoc.LEFT),
(oneOf('+ -'), 2, opAssoc.LEFT),
])
comparisonExpr = operatorPrecedence(expr,
[
("!", 1, opAssoc.RIGHT),
(oneOf("< > = <= >= !="), 2, opAssoc.LEFT),
])
booleanExpr = operatorPrecedence(TRUE | FALSE | comparisonExpr,
[
(NOT, 1, opAssoc.RIGHT),
(AND, 2, opAssoc.LEFT),
(OR, 2, opAssoc.LEFT),
])
statement = Forward()
printStmt = PRINT + expr
wasaStmt = ident + WAS + A + typeSpecifier
becameStmt = ident + BECAME + expr
ifStmt = IF + booleanExpr + statement
statement << Group(printStmt | wasaStmt | becameStmt | ifStmt)
statementList = delimitedList(statement)
tests = """\
x was a number
y became 2+5
print y
print 100*(5+2)
print 100*5+2
if 5 > y print 1000
if y < 10 y became y+1, print y
""".splitlines()[:-1]
for t in tests:
print t.strip()
for s in statementList.parseString(t).asList():
print(s)
print
输出结果:
x was a number
['x', 'WAS', 'A', 'NUMBER']
y became 2+5
['y', 'BECAME', ['2', '+', '5']]
print y
['PRINT', 'y']
print 100*(5+2)
['PRINT', ['100', '*', ['5', '+', '2']]]
print 100*5+2
['PRINT', [['100', '*', '5'], '+', '2']]
if 5 > y print 1000
['IF', ['5', '>', 'y'], ['PRINT', '1000']]
if y < 10 y became y+1, print y
['IF', ['y', '<', '10'], ['y', 'BECAME', ['y', '+', '1']]
['PRINT', 'y']
我随意添加了print
作为一种语句,这样它可以出现在程序的任何地方。此外,我还尝试添加了一个IF-THEN语句,这确实展示了如何通过添加这样的控制流语句开始编写递归语法(对于'was a'、'became'和'print'来说并不需要递归)。
这其实取决于你怎么组织你的代码,以及你想怎么评估它。如果你是边写边评估,只要评估的顺序正确,你可能不想在p_statement_list_comma
的文档字符串后面加任何东西,也就是说,就像你之前那样 - 这些语句会被评估,而且如果需要的话,你可以保持一个全局的变量字典或者类似的东西来跟踪一些状态,比如标识符的值。
如果你想建立一个解析树,比如说为了单独评估,如果你不喜欢ply的评估顺序,你可以这样做:
def p_statement_list_comma(p):
'statement-list : statement COMMA statement-list'
p[0] = [p[1]] + p[3]
def p_statement_print_expr(p):
'statement-list : PRINT expr'
p[0] = [p[2]]
这样的话,你会得到一个语句的列表,列表中的最后一个元素是一个表达式。这里使用列表是为了简单;如果你愿意,也可以使用你自己的类 - 只需把你想要的任何Python对象赋值给p[0],它就会在上一级可用。
如果你想让print表达式的结果从yacc.parse返回(解析树顶层的值会从yacc.parse返回),你可以这样做:
def p_statement_list_comma(p):
'statement-list : statement COMMA statement-list'
p[0] = p[3]
def p_statement_print_expr(p):
'statement-list : PRINT expr'
p[0] = p[2]