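Dynamic Pythonic language-parsing module
Detailed description of the CodeTalker Python project
CodeTalker
CodeTalker just underwent a major overhaul! :D
The goal of CodeTalker is to allow rapid development of parsers and translators without compromising performance or flexibility.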
Features:
- Completely Python-based grammar definitions [example grammar]
- Fast (Cythonized) tokenizing and parsing
…what more do you need?
Here's the process:
| Step | What it does |
|---|---|
| tokenize | Produce a list of tokens. If you use the builtin tokens, you can get full C performance; if you need a bit more flexibility, you can define your own token, based on either ReToken or StringToken. |
| parse | Produce a ParseTree. The parse tree corresponds exactly to your rules plus the original tokens; calling str(tree) returns the exact original code, including whitespace, comments, etc. This step is perfect if you want to make some automated modifications to your code (say, prettification) but don't want to completely throw out your whitespace and comments. |
| Abstract Syntax Tree | ParseTree -> AST. An AST (http://docs.python.org/library/ast.html) is used if you only care about the syntax; whitespace, etc. doesn't matter. This is the case during compilation or, in some cases, introspection. I've modeled CodeTalker's AST implementation after that of Python. CodeTalker does the ParseTree -> AST conversion for you; you just tell it how to populate your tree, based on a given node's children. |
| Translate | Once you get the AST, you want to do something with it, right? Most often it's "traverse the tree and do something with each node, depending on its type". Here's where the Translator class comes in. It provides a nice, easy interface to systematically translate an AST into whatever you want. Here's an example of creating and filling out a Translator. |
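
The "do something with each node, depending on its type" idea from the Translate step can be sketched in plain Python. This is only an illustration of type-based dispatch, not CodeTalker's own Translator class: `MiniTranslator`, its `translates` decorator, and the `Dict` / `List` node classes are hypothetical stand-ins invented for this sketch.

```python
class Dict:
    """Stand-in AST node (hypothetical): parallel lists of keys and values."""
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values


class List:
    """Stand-in AST node (hypothetical): a list of child values."""
    def __init__(self, values):
        self.values = values


class MiniTranslator:
    """Hypothetical type-dispatching translator; NOT CodeTalker's API."""

    def __init__(self):
        self._handlers = {}

    def translates(self, node_type):
        """Decorator: register a handler for one node type."""
        def register(func):
            self._handlers[node_type] = func
            return func
        return register

    def translate(self, node):
        """Look up the handler for this node's exact type and call it."""
        handler = self._handlers.get(type(node))
        if handler is None:
            raise TypeError("no handler for %s" % type(node).__name__)
        return handler(node)


to_python = MiniTranslator()


@to_python.translates(Dict)
def translate_dict(node):
    return {to_python.translate(k): to_python.translate(v)
            for k, v in zip(node.keys, node.values)}


@to_python.translates(List)
def translate_list(node):
    return [to_python.translate(v) for v in node.values]


# Leaves are plain Python values in this sketch, so they pass through as-is.
for leaf_type in (str, int, float, bool, type(None)):
    to_python.translates(leaf_type)(lambda node: node)


tree = Dict(keys=["a", "b"], values=[List([1, 2, Dict([], [])]), None])
result = to_python.translate(tree)
```

Each handler only knows about its own node type and recurses through `translate` for its children, which is what keeps adding a new node type a local change.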
For more information, check out my blog post: Announcing: CodeTalker.
Here's the JSON grammar:

    # NOTE: the import paths below are assumed from the CodeTalker package
    # layout; adjust them if your installed version differs.
    import re
    from codetalker.pgm import Grammar
    from codetalker.pgm.special import _or, commas
    from codetalker.pgm.tokens import STRING, NUMBER, NEWLINE, WHITE, ReToken

    # some custom tokens
    class SYMBOL(ReToken):
        rx = re.compile('[{},[\\]:]')

    class TFN(ReToken):
        rx = re.compile('true|false|null')

    # rules (value is the start rule)
    def value(rule):
        rule | dict_ | list_ | STRING | TFN | NUMBER
        rule.pass_single = True

    def dict_(rule):
        rule | ('{', [commas((STRING, ':', value))], '}')
        rule.astAttrs = {'keys': STRING, 'values': value}
    dict_.astName = 'Dict'

    def list_(rule):
        rule | ('[', [commas(_or(dict_, list_, STRING, TFN, NUMBER))], ']')
        rule.astAttrs = {'values': [dict_, list_, STRING, TFN, NUMBER]}
    list_.astName = 'List'

    grammar = Grammar(start=value,
                      tokens=[STRING, NUMBER, NEWLINE, WHITE, SYMBOL, TFN],
                      ignore=[WHITE, NEWLINE],           # we don't care about whitespace...
                      ast_tokens=[STRING, TFN, NUMBER])  # tokens we want picked up in the Abstract Syntax Tree
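
To get a feel for what the token list in this grammar does, here is a minimal sketch that uses only the standard `re` module rather than CodeTalker itself. The SYMBOL and TFN patterns match the same text as the ones in the grammar; the STRING, NUMBER, NEWLINE, and WHITE patterns are simplified assumptions, and `tokenize` is a hypothetical helper, not part of the library. It also shows why keeping every token makes `str(tree)`-style round trips possible, and what ignoring WHITE and NEWLINE leaves behind.

```python
import re

# (name, pattern) pairs; order matters only where patterns could overlap.
TOKEN_SPEC = [
    ("STRING", r'"(?:[^"\\]|\\.)*"'),                      # assumed, simplified
    ("NUMBER", r'-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?'),       # assumed, simplified
    ("SYMBOL", r'[{}\[\],:]'),                             # same set as SYMBOL
    ("TFN", r'true|false|null'),                           # same as TFN
    ("NEWLINE", r'\n'),                                    # assumed
    ("WHITE", r'[ \t]+'),                                  # assumed
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_name, matched_text) pairs covering all of text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError("unexpected character %r at offset %d"
                             % (text[pos], pos))
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


source = '{"a": [1, true],\n "b": null}'
tokens = tokenize(source)

# Nothing is discarded while tokenizing, so joining the tokens restores the input.
round_trip = "".join(text for _, text in tokens)

# Dropping WHITE and NEWLINE mirrors the grammar's ignore=[WHITE, NEWLINE].
significant = [tok for tok in tokens if tok[0] not in ("WHITE", "NEWLINE")]
```

Here `round_trip` equals `source`, while `significant` holds only the tokens a parser for this grammar would need to look at.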
TODO
- Modify CodeTalker to allow streaming input