使用lxml访问有无命名空间的元素

2 投票
1 回答
1066 浏览
提问于 2025-04-17 14:12

有没有办法在一个文档中同时查找带有和不带有命名空间的相同元素,使用lxml库?举个例子,我想获取所有的identifier元素,不管它是否和特定的命名空间有关。目前我只能分别访问它们,如下所示。

代码:

from lxml import etree

xmlfile = etree.parse('xmlfile.xml')
root = xmlfile.getroot()

for l in root.iter('identifier'):
   print l.text

for l in root.iter('{http://www.openarchives.org/OAI/2.0/provenance}identifier'):
   print l.text

文件:xmlfile.xml

<?xml version="1.0"?>
<record>
  <header>
    <identifier>identifier1</identifier>
    <datestamp>datastamp1</datestamp>
    <setSpec>setspec1</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>title1</dc:title>
      <dc:title>title2</dc:title>
      <dc:creator>creator1</dc:creator>
      <dc:subject>subject1</dc:subject>
      <dc:subject>subject2</dc:subject>
    </oai_dc:dc>
  </metadata>
  <about>
    <provenance  xmlns="http://www.openarchives.org/OAI/2.0/provenance" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
      <originDescription altered="false" harvestDate="2011-08-11T03:47:51Z">
        <baseURL>baseURL1</baseURL>
        <identifier>identifier3</identifier>
        <datestamp>datestamp2</datestamp>
        <metadataNamespace>xxxxx</metadataNamespace>
        <originDescription altered="false" harvestDate="2010-10-10T06:15:53Z">
          <baseURL>xxxxx</baseURL>
          <identifier>identifier4</identifier>
          <datestamp>2010-04-27T01:10:31Z</datestamp>
          <metadataNamespace>xxxxx</metadataNamespace>
        </originDescription>
      </originDescription>
    </provenance>
  </about>
</record>

1 个回答

3

你可以使用XPath来解决这种问题:

from lxml import etree

xmlfile = etree.parse('xmlfile.xml')
identifier_nodes = xmlfile.xpath("//*[local-name() = 'identifier']")

撰写回答