Using XPath predicates with sub-paths in lxml?

8 votes

4 answers

8153 views

Asked 2025-04-16 18:51

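I am trying to understand an XPath expression that was sent to me for use with ACORD XML forms (a format that is common in the insurance industry). The XPath they gave me is (trimmed for brevity):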

./PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo

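The problem is that Python's lxml library tells me that [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"] is an invalid predicate. I cannot find anything in the XPath specification that describes this syntax, so I do not know how to modify the predicate to make it valid.

Is there any documentation on what exactly this predicate selects? Also, is the predicate even valid, or has something gone wrong somewhere along the way?

Possibly related:

I believe the company I am working with is a Microsoft shop, so this XPath may be valid in C# or a related language? I am not sure.

Update:

As requested in the comments, here is some additional information.

Sample XML: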

<ACORD>
  <InsuranceSvcRq>
    <HomePolicyQuoteInqRq>
      <PersPolicy>
        <PersApplicationInfo>
            <InsuredOrPrincipal>
                <InsuredOrPrincipalInfo>
                    <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
                </InsuredOrPrincipalInfo>
                <GeneralPartyInfo>
                    <Addr>
                        <Addr1></Addr1>
                    </Addr>
                </GeneralPartyInfo>
            </InsuredOrPrincipal>
        </PersApplicationInfo>
      </PersPolicy>
    </HomePolicyQuoteInqRq>
  </InsuranceSvcRq>
</ACORD>

Code sample (with the full XPath rather than the fragment):

>>> from lxml import etree
>>> tree = etree.fromstring(raw)
>>> tree.find('./InsuranceSvcRq/HomePolicyQuoteInqRq/PersPolicy/PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo/Addr/Addr1')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 1409, in lxml.etree._Element.find (src/lxml/lxml.etree.c:39972)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 271, in find
    it = iterfind(elem, path, namespaces)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 261, in iterfind
    selector = _build_path_iterator(path, namespaces)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 245, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 207, in prepare_predicate
    raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate

xml lxml c# xpath data-selection insurance-industry predicate acord

4 Answers

./PersApplicationInfo/InsuredOrPrincipal
                 [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]
                     /GeneralPartyInfo/

This expression has a few problems:

The trailing / character makes it syntactically invalid. That character signals the start of a new location step, but nothing follows it.
As Dr. Michael Kay mentioned, you may run into problems with nested quotes in Python.

Suggested solution:

./PersApplicationInfo/InsuredOrPrincipal
                 [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd='AN']
                     /GeneralPartyInfo

In this expression the double quotes have been replaced with single quotes. The second change is the removal of the trailing / character.

Update: Now that the OP has provided a more complete code sample, I can confirm that the XPath expression actually used is fine. Here is a verification with XSLT:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:template match="/*">
  <xsl:copy-of select=
  './InsuranceSvcRq/HomePolicyQuoteInqRq/PersPolicy
                 /PersApplicationInfo/InsuredOrPrincipal
                     [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]
                                                   /GeneralPartyInfo/Addr/Addr1'/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<ACORD>
    <InsuranceSvcRq>
        <HomePolicyQuoteInqRq>
            <PersPolicy>
                <PersApplicationInfo>
                    <InsuredOrPrincipal>
                        <InsuredOrPrincipalInfo>
                            <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
                        </InsuredOrPrincipalInfo>
                        <GeneralPartyInfo>
                            <Addr>
                                <Addr1></Addr1>
                            </Addr>
                        </GeneralPartyInfo>
                    </InsuredOrPrincipal>
                </PersApplicationInfo>
            </PersPolicy>
        </HomePolicyQuoteInqRq>
    </InsuranceSvcRq>
</ACORD>

the wanted, correct result is produced:

<Addr1 />

Conclusion: The problem is either in how the Python code uses the expression or (less likely) a bug in the XPath engine being used.
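For anyone who wants to repeat the check from Python rather than XSLT, here is a minimal sketch. It assumes lxml is installed and uses a trimmed copy of the sample document rooted at PersPolicy; the variable names are only illustrative. It evaluates the expression with both quoting styles through lxml's xpath() method and then tries the variant with a trailing / to see whether the engine rejects it.

```python
from lxml import etree

# Trimmed copy of the sample document from the question (rooted at PersPolicy).
xml = b"""
<PersPolicy>
  <PersApplicationInfo>
    <InsuredOrPrincipal>
      <InsuredOrPrincipalInfo>
        <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
      </InsuredOrPrincipalInfo>
      <GeneralPartyInfo><Addr><Addr1/></Addr></GeneralPartyInfo>
    </InsuredOrPrincipal>
  </PersApplicationInfo>
</PersPolicy>
"""
root = etree.fromstring(xml)

# The same expression with two quoting styles. The Python string delimiter
# only has to differ from the quote character used inside the predicate.
double_quoted = ('./PersApplicationInfo/InsuredOrPrincipal'
                 '[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]'
                 '/GeneralPartyInfo')
single_quoted = ("./PersApplicationInfo/InsuredOrPrincipal"
                 "[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd='AN']"
                 "/GeneralPartyInfo")

hits_double = root.xpath(double_quoted)
hits_single = root.xpath(single_quoted)

# A trailing "/" opens a location step that never arrives, so the full
# XPath engine is expected to refuse the expression.
try:
    root.xpath(single_quoted + "/")
    trailing_slash_accepted = True
except etree.XPathError:
    trailing_slash_accepted = False

print([e.tag for e in hits_double], [e.tag for e in hits_single])
print("trailing slash accepted:", trailing_slash_accepted)
```

If both quoting styles return the same GeneralPartyInfo element, the quoting is not what triggers the error in the question.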

Answered 2025-04-16 by Python大师


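Your example looks perfectly fine to me. You could check whether lxml's XPath implementation has any known limitations or similar caveats documented.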
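If the limitation turns out to be in find/findall rather than in the XPath engine itself, one possible workaround is to keep the path steps simple and do the comparison in Python. The sketch below is written under that assumption; as far as I know, the ElementPath subset behind find/findall and findtext accepts plain child paths but not a nested path inside a predicate. The second InsuredOrPrincipal with role code "XX" is hypothetical and is there only to show the filtering.

```python
from lxml import etree

# Trimmed sample rooted at PersPolicy. The second InsuredOrPrincipal and its
# "XX" role code are made up for illustration and are not from the question.
xml = b"""
<PersPolicy>
  <PersApplicationInfo>
    <InsuredOrPrincipal>
      <InsuredOrPrincipalInfo>
        <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
      </InsuredOrPrincipalInfo>
      <GeneralPartyInfo><Addr><Addr1/></Addr></GeneralPartyInfo>
    </InsuredOrPrincipal>
    <InsuredOrPrincipal>
      <InsuredOrPrincipalInfo>
        <InsuredOrPrincipalRoleCd>XX</InsuredOrPrincipalRoleCd>
      </InsuredOrPrincipalInfo>
      <GeneralPartyInfo><Addr><Addr1/></Addr></GeneralPartyInfo>
    </InsuredOrPrincipal>
  </PersApplicationInfo>
</PersPolicy>
"""
root = etree.fromstring(xml)

# Step 1: a predicate-free path that find/findall can handle.
candidates = root.findall("./PersApplicationInfo/InsuredOrPrincipal")

# Step 2: apply the predicate by hand. findtext() follows a simple child
# path and returns the text of the first match (or None if nothing matches).
matches = [
    party.find("GeneralPartyInfo")
    for party in candidates
    if party.findtext("InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd") == "AN"
]

print(len(candidates), len(matches))
```

This trades a single expression for two steps, which is mostly useful when the code has to stay portable across ElementTree implementations.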

Answered 2025-04-16 by Python大师


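Change tree.find to tree.xpath. find and findall are provided by lxml for compatibility with other ElementTree implementations. These methods do not implement the full XPath language, however. To use XPath expressions with more advanced features, use the xpath method, the XPath class, or XPathEvaluator.

For example: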

import io
import lxml.etree as ET

content='''\
<ACORD>
  <InsuranceSvcRq>
    <HomePolicyQuoteInqRq>
      <PersPolicy>
        <PersApplicationInfo>
            <InsuredOrPrincipal>
                <InsuredOrPrincipalInfo>
                    <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
                </InsuredOrPrincipalInfo>
                <GeneralPartyInfo>
                    <Addr>
                        <Addr1></Addr1>
                    </Addr>
                </GeneralPartyInfo>
            </InsuredOrPrincipal>
        </PersApplicationInfo>
      </PersPolicy>
    </HomePolicyQuoteInqRq>
  </InsuranceSvcRq>
</ACORD>
'''
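# io.BytesIO only accepts bytes on Python 3, so encode the string first
tree = ET.parse(io.BytesIO(content.encode('utf-8')))
# xpath() uses the full XPath engine, so a path inside the predicate is allowed
path = '//PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo'
result = tree.xpath(path)
print(result)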

yields

[<Element GeneralPartyInfo at b75a8194>]

whereas tree.find yields

SyntaxError: invalid node predicate
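The other two entry points mentioned above, the XPath class and XPathEvaluator, can be sketched the same way. This is a minimal illustration that assumes lxml is installed and uses the full expression from the question; the variable names are my own. It also records what find does with the same path, for contrast.

```python
from lxml import etree

xml = b"""
<ACORD>
  <InsuranceSvcRq>
    <HomePolicyQuoteInqRq>
      <PersPolicy>
        <PersApplicationInfo>
          <InsuredOrPrincipal>
            <InsuredOrPrincipalInfo>
              <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
            </InsuredOrPrincipalInfo>
            <GeneralPartyInfo><Addr><Addr1></Addr1></Addr></GeneralPartyInfo>
          </InsuredOrPrincipal>
        </PersApplicationInfo>
      </PersPolicy>
    </HomePolicyQuoteInqRq>
  </InsuranceSvcRq>
</ACORD>
"""
root = etree.fromstring(xml)

path = ('./InsuranceSvcRq/HomePolicyQuoteInqRq/PersPolicy/PersApplicationInfo/'
        'InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]'
        '/GeneralPartyInfo/Addr/Addr1')

# find() only understands the ElementPath subset and rejects this predicate.
try:
    root.find(path)
    find_error = None
except SyntaxError as exc:
    find_error = str(exc)

# 1. The xpath() method: evaluate the expression once against an element.
via_method = root.xpath(path)

# 2. The XPath class: compile once, then call it on any element or tree.
compiled = etree.XPath(path)
via_class = compiled(root)

# 3. XPathEvaluator: bind to an element, then evaluate expression strings.
evaluator = etree.XPathEvaluator(root)
via_evaluator = evaluator(path)

print("find error:", find_error)
print([e.tag for e in via_method], [e.tag for e in via_class],
      [e.tag for e in via_evaluator])
```

The compiled XPath object is the usual choice when the same expression runs against many documents; for a one-off lookup the xpath() method is simpler.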

Answered 2025-04-16 by Python大师
