寻找合适的Ruby/Python解析器生成器
我第一次接触的解析器生成器是Parse::RecDescent,关于它的指南和教程都很不错,但对我来说,最有用的功能是它的调试工具,特别是跟踪功能(通过将$RD_TRACE设置为1来激活)。我在寻找一个可以帮助调试规则的解析器生成器。
关键是,它必须是用Python或Ruby写的,并且要有详细模式/跟踪模式或者非常有用的调试技巧。
有没有人知道这样的解析器生成器呢?
编辑
当我说调试的时候,我不是指调试Python或Ruby。我是指调试解析器生成器,看看它在每一步做了什么,读取了每个字符,尝试匹配的规则。
编辑
我在寻找一个解析器生成器框架,以及一些调试功能的示例。我对pdb不感兴趣,而是想要解析器的调试框架。另外,请不要提到treetop,我对它不感兴趣。
5 个回答
我知道这个悬赏已经被领取了,但这里有一个用pyparsing写的等效解析器(还支持零个或多个用逗号分隔的函数调用参数):
from pyparsing import *
LPAR, RPAR = map(Suppress,"()")
EQ = Literal("=")
name = Word(alphas, alphanums+"_").setName("name")
number = Word(nums).setName("number")
expr = Forward()
operand = Optional('-') + (Group(name + LPAR +
Group(Optional(delimitedList(expr))) +
RPAR) |
name |
number |
Group(LPAR + expr + RPAR))
binop = oneOf("+ - * / **")
expr << (Group(operand + OneOrMore(binop + operand)) | operand)
assignment = name + EQ + expr
statement = assignment | expr
这段测试代码运行了解析器的基本功能:
tests = """\
sin(pi/2)
y = mx+b
E = mc ** 2
F = m*a
x = x0 + v*t +a*t*t/2
1 - sqrt(sin(t)**2 + cos(t)**2)""".splitlines()
for t in tests:
print t.strip()
print statement.parseString(t).asList()
print
输出结果是:
sin(pi/2)
[['sin', [['pi', '/', '2']]]]
y = mx+b
['y', '=', ['mx', '+', 'b']]
E = mc ** 2
['E', '=', ['mc', '**', '2']]
F = m*a
['F', '=', ['m', '*', 'a']]
x = x0 + v*t +a*t*t/2
['x', '=', ['x0', '+', 'v', '*', 't', '+', 'a', '*', 't', '*', 't', '/', '2']]
1 - sqrt(sin(t)**2 + cos(t)**2)
[['1', '-', ['sqrt', [[['sin', ['t']], '**', '2', '+', ['cos', ['t']], '**', '2']]]]]
为了调试,我们添加了这段代码:
# enable debugging for name and number expressions
name.setDebug()
number.setDebug()
现在我们重新解析第一个测试(显示输入字符串和一个简单的列标尺):
t = tests[0]
print ("1234567890"*10)[:len(t)]
print t
statement.parseString(t)
print
输出结果是:
1234567890123
sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Pyparsing还支持一种叫做packrat解析的功能,这是一种在解析时记忆的技巧(想了解更多关于packrat的内容,可以点击这里)。这是同样的解析过程,但启用了packrat:
same parse, but with packrat parsing enabled
1234567890123
sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
这是一个有趣的练习,对我来说看到其他解析库的调试功能很有帮助。
我对它的调试功能不太了解,但听说PyParsing挺不错的。
Python是一种比较容易调试的编程语言。你只需要输入一些简单的代码,比如import pdb和pdb.settrace(),就可以开始调试了。
不过,这些解析器生成器据说也提供了很好的调试功能。
http://pyparsing.wikispaces.com/
关于悬赏的回应
这里展示了PLY调试的实际操作。
源代码
tokens = (
'NAME','NUMBER',
)
literals = ['=','+','-','*','/', '(',')']
# Tokens
t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
def t_NUMBER(t):
r'\d+'
t.value = int(t.value)
return t
t_ignore = " \t"
def t_newline(t):
r'\n+'
t.lexer.lineno += t.value.count("\n")
def t_error(t):
print("Illegal character '%s'" % t.value[0])
t.lexer.skip(1)
# Build the lexer
import ply.lex as lex
lex.lex(debug=1)
# Parsing rules
precedence = (
('left','+','-'),
('left','*','/'),
('right','UMINUS'),
)
# dictionary of names
names = { }
def p_statement_assign(p):
'statement : NAME "=" expression'
names[p[1]] = p[3]
def p_statement_expr(p):
'statement : expression'
print(p[1])
def p_expression_binop(p):
'''expression : expression '+' expression
| expression '-' expression
| expression '*' expression
| expression '/' expression'''
if p[2] == '+' : p[0] = p[1] + p[3]
elif p[2] == '-': p[0] = p[1] - p[3]
elif p[2] == '*': p[0] = p[1] * p[3]
elif p[2] == '/': p[0] = p[1] / p[3]
def p_expression_uminus(p):
"expression : '-' expression %prec UMINUS"
p[0] = -p[2]
def p_expression_group(p):
"expression : '(' expression ')'"
p[0] = p[2]
def p_expression_number(p):
"expression : NUMBER"
p[0] = p[1]
def p_expression_name(p):
"expression : NAME"
try:
p[0] = names[p[1]]
except LookupError:
print("Undefined name '%s'" % p[1])
p[0] = 0
def p_error(p):
if p:
print("Syntax error at '%s'" % p.value)
else:
print("Syntax error at EOF")
import ply.yacc as yacc
yacc.yacc()
import logging
logging.basicConfig(
level=logging.INFO,
filename="parselog.txt"
)
while 1:
try:
s = raw_input('calc > ')
except EOFError:
break
if not s: continue
yacc.parse(s, debug=1)
输出结果
lex: tokens = ('NAME', 'NUMBER')
lex: literals = ['=', '+', '-', '*', '/', '(', ')']
lex: states = {'INITIAL': 'inclusive'}
lex: Adding rule t_NUMBER -> '\d+' (state 'INITIAL')
lex: Adding rule t_newline -> '\n+' (state 'INITIAL')
lex: Adding rule t_NAME -> '[a-zA-Z_][a-zA-Z0-9_]*' (state 'INITIAL')
lex: ==== MASTER REGEXS FOLLOW ====
lex: state 'INITIAL' : regex[0] = '(?P<t_NUMBER>\d+)|(?P<t_newline>\n+)|(?P<t_NAME>[a-zA-Z
_][a-zA-Z0-9_]*)'
calc > 2+3
PLY: PARSE DEBUG START
State : 0
Stack : . LexToken(NUMBER,2,1,0)
Action : Shift and goto state 3
State : 3
Stack : NUMBER . LexToken(+,'+',1,1)
Action : Reduce rule [expression -> NUMBER] with [2] and goto state 9
Result : <int @ 0x1a1896c> (2)
State : 6
Stack : expression . LexToken(+,'+',1,1)
Action : Shift and goto state 12
State : 12
Stack : expression + . LexToken(NUMBER,3,1,2)
Action : Shift and goto state 3
State : 3
Stack : expression + NUMBER . $end
Action : Reduce rule [expression -> NUMBER] with [3] and goto state 9
Result : <int @ 0x1a18960> (3)
State : 18
Stack : expression + expression . $end
Action : Reduce rule [expression -> expression + expression] with [2,'+',3] and goto state
3
Result : <int @ 0x1a18948> (5)
State : 6
Stack : expression . $end
Action : Reduce rule [statement -> expression] with [5] and goto state 2
5
Result : <NoneType @ 0x1e1ccef4> (None)
State : 4
Stack : statement . $end
Done : Returning <NoneType @ 0x1e1ccef4> (None)
PLY: PARSE DEBUG END
calc >
解析表生成在parser.out文件中
Created by PLY version 3.2 (http://www.dabeaz.com/ply)
Grammar
Rule 0 S' -> statement
Rule 1 statement -> NAME = expression
Rule 2 statement -> expression
Rule 3 expression -> expression + expression
Rule 4 expression -> expression - expression
Rule 5 expression -> expression * expression
Rule 6 expression -> expression / expression
Rule 7 expression -> - expression
Rule 8 expression -> ( expression )
Rule 9 expression -> NUMBER
Rule 10 expression -> NAME
Terminals, with rules where they appear
( : 8
) : 8
* : 5
+ : 3
- : 4 7
/ : 6
= : 1
NAME : 1 10
NUMBER : 9
error :
Nonterminals, with rules where they appear
expression : 1 2 3 3 4 4 5 5 6 6 7 8
statement : 0
Parsing method: LALR
state 0
(0) S' -> . statement
(1) statement -> . NAME = expression
(2) statement -> . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
NAME shift and go to state 1
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
expression shift and go to state 6
statement shift and go to state 4
state 1
(1) statement -> NAME . = expression
(10) expression -> NAME .
= shift and go to state 7
+ reduce using rule 10 (expression -> NAME .)
- reduce using rule 10 (expression -> NAME .)
* reduce using rule 10 (expression -> NAME .)
/ reduce using rule 10 (expression -> NAME .)
$end reduce using rule 10 (expression -> NAME .)
state 2
(7) expression -> - . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 9
state 3
(9) expression -> NUMBER .
+ reduce using rule 9 (expression -> NUMBER .)
- reduce using rule 9 (expression -> NUMBER .)
* reduce using rule 9 (expression -> NUMBER .)
/ reduce using rule 9 (expression -> NUMBER .)
$end reduce using rule 9 (expression -> NUMBER .)
) reduce using rule 9 (expression -> NUMBER .)
state 4
(0) S' -> statement .
state 5
(8) expression -> ( . expression )
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 10
state 6
(2) statement -> expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
$end reduce using rule 2 (statement -> expression .)
+ shift and go to state 12
- shift and go to state 11
* shift and go to state 13
/ shift and go to state 14
state 7
(1) statement -> NAME = . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 15
state 8
(10) expression -> NAME .
+ reduce using rule 10 (expression -> NAME .)
- reduce using rule 10 (expression -> NAME .)
* reduce using rule 10 (expression -> NAME .)
/ reduce using rule 10 (expression -> NAME .)
$end reduce using rule 10 (expression -> NAME .)
) reduce using rule 10 (expression -> NAME .)
state 9
(7) expression -> - expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 7 (expression -> - expression .)
- reduce using rule 7 (expression -> - expression .)
* reduce using rule 7 (expression -> - expression .)
/ reduce using rule 7 (expression -> - expression .)
$end reduce using rule 7 (expression -> - expression .)
) reduce using rule 7 (expression -> - expression .)
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
! * [ shift and go to state 13 ]
! / [ shift and go to state 14 ]
state 10
(8) expression -> ( expression . )
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
) shift and go to state 16
+ shift and go to state 12
- shift and go to state 11
* shift and go to state 13
/ shift and go to state 14
state 11
(4) expression -> expression - . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 17
state 12
(3) expression -> expression + . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 18
state 13
(5) expression -> expression * . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 19
state 14
(6) expression -> expression / . expression
(3) expression -> . expression + expression
(4) expression -> . expression - expression
(5) expression -> . expression * expression
(6) expression -> . expression / expression
(7) expression -> . - expression
(8) expression -> . ( expression )
(9) expression -> . NUMBER
(10) expression -> . NAME
- shift and go to state 2
( shift and go to state 5
NUMBER shift and go to state 3
NAME shift and go to state 8
expression shift and go to state 20
state 15
(1) statement -> NAME = expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
$end reduce using rule 1 (statement -> NAME = expression .)
+ shift and go to state 12
- shift and go to state 11
* shift and go to state 13
/ shift and go to state 14
state 16
(8) expression -> ( expression ) .
+ reduce using rule 8 (expression -> ( expression ) .)
- reduce using rule 8 (expression -> ( expression ) .)
* reduce using rule 8 (expression -> ( expression ) .)
/ reduce using rule 8 (expression -> ( expression ) .)
$end reduce using rule 8 (expression -> ( expression ) .)
) reduce using rule 8 (expression -> ( expression ) .)
state 17
(4) expression -> expression - expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 4 (expression -> expression - expression .)
- reduce using rule 4 (expression -> expression - expression .)
$end reduce using rule 4 (expression -> expression - expression .)
) reduce using rule 4 (expression -> expression - expression .)
* shift and go to state 13
/ shift and go to state 14
! * [ reduce using rule 4 (expression -> expression - expression .) ]
! / [ reduce using rule 4 (expression -> expression - expression .) ]
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
state 18
(3) expression -> expression + expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 3 (expression -> expression + expression .)
- reduce using rule 3 (expression -> expression + expression .)
$end reduce using rule 3 (expression -> expression + expression .)
) reduce using rule 3 (expression -> expression + expression .)
* shift and go to state 13
/ shift and go to state 14
! * [ reduce using rule 3 (expression -> expression + expression .) ]
! / [ reduce using rule 3 (expression -> expression + expression .) ]
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
state 19
(5) expression -> expression * expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 5 (expression -> expression * expression .)
- reduce using rule 5 (expression -> expression * expression .)
* reduce using rule 5 (expression -> expression * expression .)
/ reduce using rule 5 (expression -> expression * expression .)
$end reduce using rule 5 (expression -> expression * expression .)
) reduce using rule 5 (expression -> expression * expression .)
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
! * [ shift and go to state 13 ]
! / [ shift and go to state 14 ]
state 20
(6) expression -> expression / expression .
(3) expression -> expression . + expression
(4) expression -> expression . - expression
(5) expression -> expression . * expression
(6) expression -> expression . / expression
+ reduce using rule 6 (expression -> expression / expression .)
- reduce using rule 6 (expression -> expression / expression .)
* reduce using rule 6 (expression -> expression / expression .)
/ reduce using rule 6 (expression -> expression / expression .)
$end reduce using rule 6 (expression -> expression / expression .)
) reduce using rule 6 (expression -> expression / expression .)
! + [ shift and go to state 12 ]
! - [ shift and go to state 11 ]
! * [ shift and go to state 13 ]
! / [ shift and go to state 14 ]