pyparsing: parseAll=True does not raise ParseException when comments are ignored

数据工程师

Asked 2025-04-18 14:41

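I've found a strange side effect in pyparsing:

When I call .ignore() on a parser that is built on top of another parser (a superset of it), parseString(..., parseAll=True) on the inner parser no longer checks the whole string: it stops at the comment symbol. The code below illustrates the problem.

How can I fix this without using stringEnd?

Example: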

def test():
    import pyparsing as p
    unquoted_exclude = "\\\"" + "':/|<>,;#"
    unquoted_chars = ''.join(set(p.printables) - set(unquoted_exclude))
    unquotedkey = p.Word(unquoted_chars)

    more = p.OneOrMore(unquotedkey)
    more.ignore("#" + p.restOfLine)
    # ^^ "more" should ignore comments, but not "unquotedkey" !!

    def parse(parser, input_, parseAll=True):
        # Print the input, then either the parsed tokens or the error message.
        try:
            print(input_)
            print(parser.parseString(input_, parseAll).asList())
        except Exception as err:
            print(err)

    parse(unquotedkey, "abc#d")
    parse(unquotedkey, "abc|d")

    withstringend = unquotedkey + p.stringEnd
    parse(withstringend, "abc#d", False)
    parse(withstringend, "abc|d", False)

Output:

abc#d     
['abc'] <--- should throw an exception but does not  
abc|d
Expected end of text (at char 3), (line:1, col:4)
abc#d
Expected stringEnd (at char 3), (line:1, col:4)
abc|d
Expected stringEnd (at char 3), (line:1, col:4)


1 Answer

To make the comparison a fair one, you should also add this line after defining withstringend:

withstringend.ignore('#' + p.restOfLine)

I think you'll find that it then behaves the same as your test that parses with unquotedkey.
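As a rough check of that claim, here is a minimal sketch. It assumes a fresh p.StringEnd() instance in place of the shared p.stringEnd (so that ignore() does not modify the module-level object), and the outcome helper is a hypothetical name used only for this illustration.

```python
import pyparsing as p

unquoted_exclude = "\\\"" + "':/|<>,;#"
unquotedkey = p.Word("".join(sorted(set(p.printables) - set(unquoted_exclude))))

# Assumption of this sketch: a fresh StringEnd() rather than p.stringEnd,
# so ignore() does not touch the shared module-level object.
withstringend = unquotedkey + p.StringEnd()
withstringend.ignore("#" + p.restOfLine)


def outcome(parser, text):
    """Hypothetical helper: token list on success, 'ParseException' on failure."""
    try:
        return parser.parseString(text).asList()
    except p.ParseException:
        return "ParseException"


comment_result = outcome(withstringend, "abc#d")  # "#d" is skipped as a comment
pipe_result = outcome(withstringend, "abc|d")     # "|d" is not ignorable
print(comment_result)
print(pipe_result)
```

With the comment ignored, the "#d" tail is skipped before the end-of-string check, while "abc|d" still fails, which matches what the parseAll=True test showed.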

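The purpose of ignore is to skip certain constructs wherever they appear in the input text being parsed, not just at the topmost level. For example, in a C program you must ignore comments not only between statements: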

/* add one to x */
x ++;

You also have to ignore comments that can appear anywhere:

x /* this is a post-increment 
so it really won't add 1 to x until after the
statement executes */ ++
/* and this is the trailing semicolon 
for the previous statement -> */;

Or, for a less contrived example:

for (x = ptr; /* start at ptr */
     *x; /* keep going as long as we point to non-zero */
     x++ /* add one to x */ )

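To support this, ignore() was designed to walk the entire parser definition and update the list of ignorable expressions on every sub-parser, so that ignorables are skipped at every level of the overall parser. Otherwise you would have to call ignore all over your parser definition and keep tracking down the places you accidentally missed.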
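The propagation can be observed directly. The following is a minimal sketch that uses a simplified key (p.Word(p.alphas), an assumption for brevity rather than the question's character set) and inspects the ignoreExprs list that each parser element carries.

```python
import pyparsing as p

key = p.Word(p.alphas)           # simplified stand-in for the question's key
before = len(key.ignoreExprs)    # no ignorables registered yet

more = p.OneOrMore(key)          # wraps the same key object, not a copy
more.ignore("#" + p.restOfLine)  # registers the comment on more AND on key

after = len(key.ignoreExprs)

# key now skips the trailing comment before the end-of-string check,
# so parseAll=True no longer raises for "abc#d".
tokens = key.parseString("abc#d", parseAll=True).asList()

print(before, after, tokens)
```

Because OneOrMore holds a reference to the very same key object, the ignorable added through more ends up on key as well.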

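So in your first example, when you did this:

more = p.OneOrMore(unquotedkey)
more.ignore('#' + p.restOfLine)

you also updated the ignorables on unquotedkey. If you want to isolate unquotedkey so that it does not pick up this side effect, define more like this instead:

more = p.OneOrMore(unquotedkey.copy())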
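Here is a small sketch of the copy() approach, reusing the question's character set. The variable names standalone_raised and more_tokens are made up for this illustration.

```python
import pyparsing as p

unquoted_exclude = "\\\"" + "':/|<>,;#"
unquotedkey = p.Word("".join(sorted(set(p.printables) - set(unquoted_exclude))))

# more wraps a copy, so the comment ignorable is attached to the copy only.
more = p.OneOrMore(unquotedkey.copy())
more.ignore("#" + p.restOfLine)

# The original unquotedkey has no ignorables, so parseAll=True rejects "abc#d".
try:
    unquotedkey.parseString("abc#d", parseAll=True)
    standalone_raised = False
except p.ParseException:
    standalone_raised = True

# more still skips comments.
more_tokens = more.parseString("abc def # trailing comment", parseAll=True).asList()

print(standalone_raised, more_tokens)
```

The copy gets its own list of ignorables, so only the expression inside more skips comments.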

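One other point: you define an unquoted key as "everything in printables except these special characters". That technique was a good one until version 1.5.6, when the excludeChars argument was added to the Word class. Now you no longer have to build the list of allowed characters yourself; Word will do that work for you. Try: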

unquotedkey = p.Word(p.printables,
                     excludeChars = r'\"' + "':/|<>,;#")
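To see that the two definitions accept the same text, here is a quick comparison. The sample strings and the names by_hand and with_exclude are assumptions chosen for this sketch.

```python
import pyparsing as p

exclude = r'\"' + "':/|<>,;#"

# The question's approach: build the allowed character set by hand.
by_hand = p.Word("".join(sorted(set(p.printables) - set(exclude))))

# The excludeChars approach: let Word remove the unwanted characters.
with_exclude = p.Word(p.printables, excludeChars=exclude)

samples = ["abc#d", "key|rest", "plain_key-1"]
results = [
    (by_hand.parseString(s).asList(), with_exclude.parseString(s).asList())
    for s in samples
]

for sample, (a, b) in zip(samples, results):
    print(sample, a, b)
```

Both parsers stop at the first excluded character, so each pair of token lists should be identical.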

Answered 2025-04-18 by Python大师
