Simple parsing library

Detailed description of the sourcer Python project

A simple parsing library for Python.

There isn't much documentation yet, and the performance is probably pretty poor, but if you want to give it a try, go for it!

Feel free to send your feedback to vonseg@gmail.com, or use the GitHub issue tracker.

Installation

To install sourcer:

    pip install sourcer

If pip is not installed, use easy_install:

    easy_install sourcer

Or download the source from GitHub and install it with:

    python setup.py install

Examples

Example 1: Hello, World!

Let's parse the string "Hello, World!" (just to make sure the basics work):

    from sourcer import *

    # Let's parse strings like "Hello, foo!", and just keep the "foo" part.
    greeting = 'Hello' >> Opt(',') >> ' ' >> Pattern(r'\w+') << '!'

    # Let's try it on the string "Hello, World!"
    person1 = parse(greeting, 'Hello, World!')
    assert person1 == 'World'

    # Now let's try omitting the comma, since we made it optional (with "Opt").
    person2 = parse(greeting, 'Hello Chief!')
    assert person2 == 'Chief'

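Some notes about this example:

  • The >> operator means "Discard the result from the left operand. Just return the result from the right operand."
  • The << operator similarly means "Just return the result from the left operand and discard the result from the right operand."
  • Opt means "This term is optional. Parse it if it's there, otherwise just keep going."
  • Pattern means "Parse strings that match this regular expression."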
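
For comparison, the same greeting can be matched with nothing but the standard `re` module. This is only a sketch of what the grammar accepts, not how sourcer works internally; the names `GREETING` and `parse_greeting` are made up for illustration, and it assumes the whole input has to match.

```python
import re

# 'Hello', an optional comma, one space, a captured word, then '!'.
# Only the captured word is kept, mirroring what >> and << discard.
GREETING = re.compile(r'Hello,? (\w+)!')


def parse_greeting(text):
    # Hypothetical helper: return the word, or raise if the input does not match.
    match = GREETING.fullmatch(text)
    if match is None:
        raise ValueError('not a greeting: %r' % text)
    return match.group(1)


assert parse_greeting('Hello, World!') == 'World'
assert parse_greeting('Hello Chief!') == 'Chief'
```

The regular expression is compact, but it cannot be reused as a building block in a larger grammar the way the `greeting` expression can.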

Example 2: Parsing Arithmetic Expressions

Here's a quick example showing how to use operator precedence parsing:

    from sourcer import *

    Int = Pattern(r'\d+') * int
    Parens = '(' >> ForwardRef(lambda: Expr) << ')'
    Expr = OperatorPrecedence(
        Int | Parens,
        InfixRight('^'),
        Prefix('+', '-'),
        Postfix('%'),
        InfixLeft('*', '/'),
        InfixLeft('+', '-'),
    )

    # Now let's try parsing an expression.
    t1 = parse(Expr, '1+2^3/4')
    assert t1 == Operation(1, '+', Operation(Operation(2, '^', 3), '/', 4))

    # Let's try putting some parentheses in the next one.
    t2 = parse(Expr, '1*(2+3)')
    assert t2 == Operation(1, '*', Operation(2, '+', 3))

    # Finally, let's try using a unary operator in our expression.
    t3 = parse(Expr, '-1*2')
    assert t3 == Operation(Operation(None, '-', 1), '*', 2)

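Some notes about this example:

  • The * operator means "Take the result from the left operand and then apply the function on the right."
  • In this case, the function is simply int.
  • So in our example, the Int rule matches any string of digit characters and produces the corresponding int value.
  • The Parens rule in our example parses an expression in parentheses, discarding the parentheses.
  • The ForwardRef term is necessary because the Parens rule wants to refer to the Expr rule, but Expr hasn't been defined by that point.
  • The OperatorPrecedence rule constructs the operator precedence table. It parses operations and returns Operation objects.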
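
To make the precedence table concrete, here is a hand-written recursive-descent sketch with one method per level, tightest-binding first, in the same order as the table above. It does not use sourcer and is not its implementation. The `Operation` namedtuple is a stand-in, `tokenize`, `Parser`, and `parse_expr` are hypothetical names, and the shape used for postfix operations (`Operation(left, '%', None)`) is an assumption.

```python
import re
from collections import namedtuple

# Stand-in for sourcer's Operation type (assumed field order: left, operator, right).
Operation = namedtuple('Operation', 'left operator right')


def tokenize(text):
    # Digit runs become ints; any other non-space character is a one-char token.
    return [int(t) if t.isdigit() else t for t in re.findall(r'\d+|\S', text)]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def operand(self):
        # Corresponds to "Int | Parens".
        token = self.take()
        if token == '(':
            inner = self.additive()
            assert self.take() == ')', 'expected closing parenthesis'
            return inner
        assert isinstance(token, int), 'expected an integer'
        return token

    def power(self):
        # InfixRight('^'): recurse on the right side for right associativity.
        left = self.operand()
        if self.peek() == '^':
            op = self.take()
            return Operation(left, op, self.power())
        return left

    def prefix(self):
        # Prefix('+', '-'): the left slot is None, as in the example above.
        if self.peek() in ('+', '-'):
            op = self.take()
            return Operation(None, op, self.prefix())
        return self.power()

    def postfix(self):
        # Postfix('%'): the right slot is None (assumed shape).
        left = self.prefix()
        while self.peek() == '%':
            left = Operation(left, self.take(), None)
        return left

    def multiplicative(self):
        # InfixLeft('*', '/'): loop to fold operands to the left.
        left = self.postfix()
        while self.peek() in ('*', '/'):
            left = Operation(left, self.take(), self.postfix())
        return left

    def additive(self):
        # InfixLeft('+', '-'): the loosest-binding level.
        left = self.multiplicative()
        while self.peek() in ('+', '-'):
            left = Operation(left, self.take(), self.multiplicative())
        return left


def parse_expr(text):
    parser = Parser(tokenize(text))
    tree = parser.additive()
    assert parser.peek() is None, 'unexpected trailing input'
    return tree


assert parse_expr('1+2^3/4') == Operation(1, '+', Operation(Operation(2, '^', 3), '/', 4))
assert parse_expr('-1*2') == Operation(Operation(None, '-', 1), '*', 2)
```

Each level calls the next tighter one for its operands, which is what a precedence table encodes declaratively.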

Example 3: Building an Abstract Syntax Tree

Let's try building an abstract syntax tree for the lambda calculus. We can use Struct classes to define the AST and the parser at the same time:

    from sourcer import *

    class Identifier(Struct):
        def parse(self):
            self.name = Word

    class Abstraction(Struct):
        def parse(self):
            self.parameter = '\\' >> Word
            self.body = '. ' >> Expr

    class Application(LeftAssoc):
        def parse(self):
            self.left = Operand
            self.operator = ' '
            self.right = Operand

    Word = Pattern(r'\w+')
    Parens = '(' >> ForwardRef(lambda: Expr) << ')'
    Operand = Parens | Abstraction | Identifier
    Expr = Application | Operand

    t1 = parse(Expr, r'(\x. x) y')
    assert isinstance(t1, Application)
    assert isinstance(t1.left, Abstraction)
    assert isinstance(t1.right, Identifier)
    assert t1.left.parameter == 'x'
    assert t1.left.body.name == 'x'
    assert t1.right.name == 'y'

    t2 = parse(Expr, 'x y z')
    assert isinstance(t2, Application)
    assert isinstance(t2.left, Application)
    assert isinstance(t2.right, Identifier)
    assert t2.left.left.name == 'x'
    assert t2.left.right.name == 'y'
    assert t2.right.name == 'z'
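
The assertions on 'x y z' show that application groups to the left, as ((x y) z). A minimal sketch of that left fold in plain Python follows. It uses nested tuples instead of Application objects, `fold_left` is a hypothetical helper, and it is not sourcer's LeftAssoc implementation.

```python
from functools import reduce


def fold_left(operands):
    # Pair the running result with each next operand: a, b, c -> ((a, b), c).
    return reduce(lambda left, right: (left, right), operands)


assert fold_left(['x', 'y', 'z']) == (('x', 'y'), 'z')
```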

Example 4: Tokenizing

It's often useful to tokenize your input before parsing it. Let's create a tokenizer for the lambda calculus.

    from sourcer import *

    class LambdaTokens(TokenSyntax):
        def __init__(self):
            self.Word = r'\w+'
            self.Symbol = AnyChar(r'(\.)')
            self.Space = Skip(r'\s+')

    # Run the tokenizer on a lambda term with a bunch of random whitespace.
    Tokens = LambdaTokens()
    ans1 = tokenize(Tokens, '\n (   x  y\n\t) ')

    # Assert that we didn't get any space tokens.
    assert len(ans1) == 4
    (t1, t2, t3, t4) = ans1
    assert isinstance(t1, Tokens.Symbol) and t1.content == '('
    assert isinstance(t2, Tokens.Word) and t2.content == 'x'
    assert isinstance(t3, Tokens.Word) and t3.content == 'y'
    assert isinstance(t4, Tokens.Symbol) and t4.content == ')'

    # Let's use the tokenizer with a simple grammar, just to show how that
    # works.
    Sentence = Some(Tokens.Word) << '.'
    ans2 = tokenize_and_parse(Tokens, Sentence, 'This is a test.')

    # Assert that we got a list of Word tokens.
    assert all(isinstance(i, Tokens.Word) for i in ans2)

    # Assert that the tokens have the expected content.
    contents = [i.content for i in ans2]
    assert contents == ['This', 'is', 'a', 'test']

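In this example, the Skip term tells the tokenizer that we want to ignore whitespace. The AnyChar term tells the tokenizer that a symbol can be any one of the characters (\.). Alternatively, we could have used:

    Symbol = r'[(\\.)]'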
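
The idea of a token table with a skipped entry can be sketched with the standard `re` module alone. This is an illustration of the concept, not sourcer's tokenizer; `TOKEN_SPEC`, `MASTER`, `SKIPPED`, and `simple_tokenize` are hypothetical names, and tokens are plain (name, text) pairs rather than token objects.

```python
import re

# Hypothetical token table: (name, regex, skip?).
TOKEN_SPEC = [
    ('Word', r'\w+', False),
    ('Symbol', r'[(\\.)]', False),  # one of: ( \ . )
    ('Space', r'\s+', True),        # matched but never emitted
]

# One alternation with a named group per token kind.
MASTER = re.compile('|'.join('(?P<%s>%s)' % (name, rx) for name, rx, _ in TOKEN_SPEC))
SKIPPED = {name for name, _, skip in TOKEN_SPEC if skip}


def simple_tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError('unexpected character at position %d' % pos)
        # lastgroup is the name of the alternative that matched.
        if match.lastgroup not in SKIPPED:
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


assert simple_tokenize('\n (   x  y\n\t) ') == [
    ('Symbol', '('), ('Word', 'x'), ('Word', 'y'), ('Symbol', ')'),
]
```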

Example 5: Parsing Significant Indentation

We can use sourcer to parse languages with significant indentation. Here's a simple example to demonstrate one possible approach.

    from sourcer import *

    class TestTokens(TokenSyntax):
        def __init__(self):
            # Let's just use words, newlines, and spaces in this example.
            self.Word = r'\w+'
            self.Newline = r'\n'

            # In this case, we'll say that an indent is a newline followed by
            # some spaces, followed by a word.
            self.Indent = r'(?<=\n) +(?=\w)'

            # And let's just throw out all other space characters.
            self.Space = Skip(' +')

    # All our token classes are attributes of this ``Tokens`` object. It's
    # essentially a namespace for our token classes.
    Tokens = TestTokens()

    class InlineStatement(Struct):
        def parse(self):
            # Let's say an inline-statement is just some word tokens. We'll use
            # ``Content`` to get the string content of each token (since in this
            # case, we don't care about the tokens themselves).
            self.words = Some(Content(Tokens.Word))

        def __repr__(self):
            # We'll define a ``repr`` method so that we can easily check the
            # parse results. We'll just put a semicolon after each statement.
            return '%s;' % ' '.join(self.words)

    class Block(Struct):
        def parse(self, indent=''):
            # A block is a bunch of statements at the same indentation,
            # all separated by some newline tokens.
            self.statements = Statement(indent) // Some(Tokens.Newline)

        def __repr__(self):
            # In this case, we'll put a space between each statement and enclose
            # the whole block in curly braces. This will make it easy for us to
            # tell if our parse results look right.
            return '{%s}' % ' '.join(repr(i) for i in self.statements)

    def Statement(indent):
        # Let's say there are two ways to get a statement:
        # - Get an inline-statement with the current indentation.
        # - Get a block that is indented farther than the current indentation.
        return (CurrentIndent(indent) >> InlineStatement
            | IncreaseIndent(indent) ** Block)

    def CurrentIndent(indent):
        # The point of this function is to return a parsing expression that
        # matches the current indent (which is provided as an argument).
        return Return('') if indent == '' else indent

    def IncreaseIndent(current):
        # To see if the next indentation is more than the current indentation,
        # we peek at the next token, using ``Expect``, and we get its string
        # content using ``Content``. The ``^`` operator means "require". In this
        # case, we require that the next indentation is longer than the current
        # indentation.
        token = Expect(Content(Tokens.Indent))
        return token ^ (lambda token: len(current) < len(token))

    # Let's say that a program is a block, optionally surrounded by newlines.
    # (The ``>>`` and ``<<`` operators discard the newlines in this case.)
    OptNewlines = List(Tokens.Newline)
    Program = OptNewlines >> Block << OptNewlines

    test = '''
    print foo
    while true
        print bar
        if baz
            then break
    exit
    '''

    # Let's parse the test case and then use ``repr`` to make sure that we get
    # back what we expect.
    ans = tokenize_and_parse(Tokens, Program, test)
    expect = '{print foo; while true; {print bar; if baz; {then break;}} exit;}'
    assert repr(ans) == expect
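
The block structure that this grammar recovers can also be sketched in a few lines of plain Python, which may help when checking the expected output by hand. This is not how sourcer processes the input: it works on whole lines instead of tokens, it assumes indentation uses spaces only, and `render_block` and `render_program` are hypothetical helpers.

```python
def render_block(lines, start=0, indent=0):
    """Render the block starting at lines[start]; return (text, next_index)."""
    parts = []
    i = start
    while i < len(lines):
        line = lines[i]
        width = len(line) - len(line.lstrip(' '))
        if width < indent:
            break  # Dedent: the current block is finished.
        if width > indent:
            # Deeper indentation opens a nested block.
            text, i = render_block(lines, i, width)
            parts.append(text)
        else:
            # Same indentation: an inline statement, closed with a semicolon.
            parts.append(' '.join(line.split()) + ';')
            i += 1
    return '{%s}' % ' '.join(parts), i


def render_program(source):
    # Blank lines play the role of the optional surrounding newlines.
    lines = [line for line in source.split('\n') if line.strip()]
    text, _ = render_block(lines)
    return text


sample = '''
print foo
while true
    print bar
    if baz
        then break
exit
'''

assert render_program(sample) == (
    '{print foo; while true; {print bar; if baz; {then break;}} exit;}'
)
```

The recursion mirrors the grammar: a statement is either a line at the current indentation or a nested block at a deeper one.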

More Examples

Parsing Excel formulas, along with some corresponding test cases
