Getting the xsi type from XML in Python

<?xml version="1.0" encoding="UTF-8"?>
<test:myXML xmlns:test="http://com/my/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Parent>
    <Child1 xsi:type="sample-type">
      <GrandChild1>123</GrandChild1>
      <GrandChild2>BranchName</GrandChild2>
    </Child1>
    <Child2 xsi:type="sample-type2"></Child2>
  </Parent>
</test:myXML>

from lxml import etree

XMLDoc = etree.parse("test.xml")
rootXMLElement = XMLDoc.getroot()
tree = etree.parse("test.xml")

for Node in XMLDoc.xpath('//*'):
    if "xsi:type" in Node.attrib:
        # Do whatever
        pass

1 Answer

User

#1 · Posted 2024-06-08 20:10:01

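xsi is a namespace prefix, not a namespace. The only place a prefix needs to be consistent is within the scope of the XML element that declares it.

A prefix does not even need to be consistent within a single XML document: you can use any number of different prefixes in the same document to refer to the same namespace.

In particular, it does not have to be consistent between the XML document and the code that processes it, and you should not (read: must not) write any code that assumes or relies on a particular prefix.

That is why if "xsi:type" in Node.attrib: makes no sense. It assumes the prefix must be xsi. xsi may be the prefix commonly used for the http://www.w3.org/2001/XMLSchema-instance namespace, but that is only a convention, not a guarantee.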

The XML document could just as well be written as

<test:myXML xmlns:test="http://com/my/namespace" xmlns:blah="http://www.w3.org/2001/XMLSchema-instance">
<Parent>
  <Child1 blah:type="sample-type">
    <GrandChild1>123</GrandChild1>
    <GrandChild2>BranchName</GrandChild2>
  </Child1>
  <Child2 blah:type="sample-type2"></Child2>
</Parent>
</test:myXML>

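and it would mean exactly the same thing.

This is why lxml uses the namespace URI rather than the prefix when it displays nodes and in the path expressions accepted by find() and findall(), where names take the form {URI}name. The URI is what matters; the prefix is ephemeral.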
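As a rough illustration of the point above, the following sketch parses two tiny documents that differ only in the prefix bound to the XMLSchema-instance namespace and compares the attributes lxml reports. The helper name child1_attributes and the cut-down documents are made up for this example.

```python
from lxml import etree

XSI_URI = "http://www.w3.org/2001/XMLSchema-instance"


def child1_attributes(prefix):
    """Hypothetical helper: build a tiny document that binds `prefix`
    to the XSI namespace and return Child1's attributes as a dict."""
    xml = (
        f'<Parent xmlns:{prefix}="{XSI_URI}">'
        f'<Child1 {prefix}:type="sample-type"/>'
        f"</Parent>"
    )
    root = etree.fromstring(xml)
    return dict(root[0].attrib)


attrs_xsi = child1_attributes("xsi")
attrs_blah = child1_attributes("blah")

# Both documents yield the same key: the URI in braces, then the local name.
print(attrs_xsi)
print(attrs_xsi == attrs_blah)

# The prefixed spelling is not a key at all, even when the document uses "xsi".
print("xsi:type" in attrs_xsi)
```

The prefix never reaches the parsed attribute mapping, so a test against the literal string "xsi:type" cannot succeed regardless of what the document calls the prefix.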

You need to define a namespace map in your program

nsmap = {
  'xsi': 'http://www.w3.org/2001/XMLSchema-instance'
}

and use that map whenever you select nodes that live in a namespace, either explicitly:

if f"{{{nsmap['xsi']}}}type" in node.attrib:
    # ...

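or through XPath (note that lxml expects the map as the namespaces keyword argument):

type = node.xpath('@xsi:type', namespaces=nsmap)

This makes your program independent of the prefix: you are free to use whatever prefix you like, the XML document is free to use whatever prefix it likes, and the code works either way.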
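Putting the two approaches side by side, here is a minimal sketch that runs both against the blah-prefixed variant of the document while the code keeps using its own xsi prefix. The document is inlined as a bytes literal instead of being read from test.xml, and the names XSI_NS, type_key, by_attrib and by_xpath are introduced only for this example.

```python
from lxml import etree

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
nsmap = {"xsi": XSI_NS}  # the prefix here is the program's own choice

# The document binds the same namespace to a different prefix ("blah").
xml = b"""<test:myXML xmlns:test="http://com/my/namespace"
    xmlns:blah="http://www.w3.org/2001/XMLSchema-instance">
  <Parent>
    <Child1 blah:type="sample-type">
      <GrandChild1>123</GrandChild1>
      <GrandChild2>BranchName</GrandChild2>
    </Child1>
    <Child2 blah:type="sample-type2"></Child2>
  </Parent>
</test:myXML>"""

root = etree.fromstring(xml)

# Approach 1: look the attribute up by its fully qualified {URI}name key.
type_key = f"{{{XSI_NS}}}type"
by_attrib = {
    el.tag: el.get(type_key)
    for el in root.iter("*")  # "*" restricts iteration to elements
    if type_key in el.attrib
}

# Approach 2: let XPath resolve the program's prefix through the map.
by_xpath = {
    el.tag: el.xpath("string(@xsi:type)", namespaces=nsmap)
    for el in root.xpath("//*[@xsi:type]", namespaces=nsmap)
}

print(by_attrib)  # {'Child1': 'sample-type', 'Child2': 'sample-type2'}
print(by_xpath == by_attrib)  # True
```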

To take an extreme example, but one that sums up the idea well:

<test:myXML xmlns:test="http://com/my/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Parent xmlns:blah="http://www.w3.org/2001/XMLSchema-instance">
    <Child1 foo:type="sample-type" xmlns:foo="http://www.w3.org/2001/XMLSchema-instance">
      <GrandChild1>123</GrandChild1>
      <GrandChild2>BranchName</GrandChild2>
    </Child1>
    <Child2 blah:type="sample-type2"></Child2>
  </Parent>
</test:myXML>

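Here, http://www.w3.org/2001/XMLSchema-instance gets three prefixes (xsi, blah and foo), each with a different scope.

Once this has been parsed, which one would you use to refer to the namespace? Does it matter? No, and it should not. The only thing that needs to match is the namespace URI; we do not care at all what the XML document does with prefixes: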

nsmap = {
  's': 'http://www.w3.org/2001/XMLSchema-instance'
}

type = node.xpath('@s:type', namespaces=nsmap)
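For completeness, a small sketch that runs this lookup against the three-prefix document above. The document is inlined as a bytes literal, and the query is widened to //@s:type so that a single call collects every matching attribute; both choices are assumptions made for this example.

```python
from lxml import etree

# Three different prefixes (xsi, blah, foo) are bound to one namespace URI.
xml = b"""<test:myXML xmlns:test="http://com/my/namespace"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Parent xmlns:blah="http://www.w3.org/2001/XMLSchema-instance">
    <Child1 foo:type="sample-type"
        xmlns:foo="http://www.w3.org/2001/XMLSchema-instance">
      <GrandChild1>123</GrandChild1>
      <GrandChild2>BranchName</GrandChild2>
    </Child1>
    <Child2 blah:type="sample-type2"></Child2>
  </Parent>
</test:myXML>"""

root = etree.fromstring(xml)

# The program picks yet another prefix; only the URI has to match.
nsmap = {"s": "http://www.w3.org/2001/XMLSchema-instance"}

types = root.xpath("//@s:type", namespaces=nsmap)
print(types)  # ['sample-type', 'sample-type2']
```

Neither foo nor blah appears anywhere in the code, yet both attributes are found.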
