Parsing text using Combine returns no results

Posted 2024-03-29 08:13:23


I'm new here. I'm trying to parse some text, but I don't fully understand how pyparsing works.

from pyparsing import *

number = Word(nums)
yearRange = Combine(number+"-"+number)
copyright = Literal("Copyright (C)")+yearRange+Literal("CA. All Rights Reserved.")
copyrightCombine = Combine(copyright)
date = Combine(Word(nums)+"/"+Word(nums)+"/"+Word(nums))
time = Combine(Word(nums)+":"+Word(nums)+":"+Word(nums))
dateTime = Combine(date+time)
pageNumber = Suppress(Literal("PAGE"))+number
pageLine = Word(nums)+"Copyright (C) 1986-2014 CA. All Rights Reserved."+Combine(Word(nums)+"/"+Word(nums)+"/"+Word(nums))+Combine(Word(nums)+":"+Word(nums)+":"+Word(nums))+pageNumber
pageLine2 = number+copyright+dateTime+pageNumber
pageLine3 = Word(nums)+copyright+Combine(Word(nums)+"/"+Word(nums)+"/"+Word(nums))+Combine(Word(nums)+":"+Word(nums)+":"+Word(nums))+pageNumber

test = "1  Copyright (C) 1986-2014 CA. All Rights Reserved.                                                07/05/17  10:58:56     PAGE  1241"
print(pageLine.searchString(test))
print(copyright.searchString(test))
print(copyrightCombine.searchString(test))
print(pageLine2.searchString(test))
print(pageLine3.searchString(test))

Output:

[['1', 'Copyright (C) 1986-2014 CA. All Rights Reserved.', '07/05/17', '10:58:56', '1241']]
[['Copyright (C)', '1986-2014', 'CA. All Rights Reserved.']]
[]
[]
[['1', 'Copyright (C)', '1986-2014', 'CA. All Rights Reserved.', '07/05/17', '10:58:56', '1241']]

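For some reason, the parser I actually want to use, pageLine2, returns no results, and neither does copyrightCombine. Whenever I wrap a larger expression in Combine(), something seems to prevent the parser from returning a match.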


1 answer
User
Answer #1 · Posted 2024-03-29 08:13:23

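I found that this behaviour comes from how Combine() works. By default it expects the matched tokens to be adjacent, with no whitespace between them, so Combine(date+time) and Combine(copyright) fail on input where those tokens are separated by spaces. This default can be overridden.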

According to the documentation:

Combine - joins all matched tokens into a single string, using specified joinString (default joinString=""); expects all matching tokens to be adjacent, with no intervening whitespace (can be overridden by specifying adjacent=False in constructor)
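
Applied to the parsers from the question, the override might look like the sketch below. It assumes the goal is a single date-time token and a single copyright token; the snake_case names and the choice of a space as the join string are illustrative, not from the original post. The join string is passed positionally because its keyword name differs between pyparsing versions, and the camelCase searchString/asList calls follow the question's code.

```python
from pyparsing import Combine, Literal, Suppress, Word, nums

number = Word(nums)
year_range = Combine(number + "-" + number)
copyright_notice = (
    Literal("Copyright (C)") + year_range + Literal("CA. All Rights Reserved.")
)

# These inner Combines keep the default adjacent=True: their pieces really
# are adjacent in the input ("07/05/17", "10:58:56").
date = Combine(number + "/" + number + "/" + number)
time = Combine(number + ":" + number + ":" + number)

# adjacent=False allows whitespace between the date and the time.
# The second positional argument is the join string (assumed here: one space).
date_time = Combine(date + time, " ", adjacent=False)

# The same override makes the combined copyright parser match.
copyright_combined = Combine(copyright_notice, " ", adjacent=False)

page_number = Suppress(Literal("PAGE")) + number
page_line = number + copyright_notice + date_time + page_number

test = "1  Copyright (C) 1986-2014 CA. All Rights Reserved.      07/05/17  10:58:56     PAGE  1241"

page_result = page_line.searchString(test).asList()
copyright_result = copyright_combined.searchString(test).asList()

print(page_result)
# [['1', 'Copyright (C)', '1986-2014', 'CA. All Rights Reserved.', '07/05/17 10:58:56', '1241']]
print(copyright_result)
# [['Copyright (C) 1986-2014 CA. All Rights Reserved.']]
```

If separate date and time tokens are preferred, dropping the outer Combine and writing date + time directly avoids the adjacency requirement altogether, which is effectively what pageLine3 in the question does.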
