Most efficient way to extract data from XML with lxml

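from lxml import etree

tree = etree.parse(file)
# iter() replaces the deprecated getiterator()
for element in tree.iter('{http://www.openarchives.org/OAI/2.0/}record'):
    for leaf in element.iter('{http://purl.org/dc/elements/1.1/}subject'):
        print(leaf.text)  # the subject string, not the element object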

2 answers

Anonymous user

Answer 1 · edited 2024-05-23 14:32:13

It's not clear exactly what you want to access, but try something like this:

from lxml import etree

doc = etree.parse(xmlfile)  # xmlfile: path or file object of the OAI-PMH response
ns = {'dc': 'http://purl.org/dc/elements/1.1/',
      'oai': 'http://www.openarchives.org/OAI/2.0/'}
doc.xpath('//dc:subject', namespaces=ns)  # get all of the dc:subject elements
doc.xpath('//dc:*', namespaces=ns)  # get all elements in the dc: namespace
# more specific path
doc.xpath('/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/*/dc:*', namespaces=ns)
x = doc.xpath('/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/*', namespaces=ns)
x[0].xpath('*[contains(., "Geo")]')  # you can also call xpath() on element nodes, not just the document
x[0].xpath('dc:subject/text()', namespaces=ns)  # get the text of the dc:subject elements
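
A minimal, self-contained sketch of the calls above, assuming a small hypothetical OAI-PMH response embedded as a byte string (the records, titles and subjects are invented for illustration) and `io.BytesIO` standing in for `xmlfile`. Every prefixed expression is given the `namespaces` mapping, and the `text()` queries return strings rather than elements:

```python
import io
from lxml import etree

# Prefix -> namespace URI mapping used by every prefixed xpath() call below
NS = {"dc": "http://purl.org/dc/elements/1.1/",
      "oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, trimmed-down OAI-PMH ListRecords response with two records
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First record</dc:title>
          <dc:subject>Geology</dc:subject>
          <dc:subject>Maps</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second record</dc:title>
          <dc:subject>History</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

doc = etree.parse(io.BytesIO(SAMPLE))

# Text of every dc:subject in the document, in document order
subjects = doc.xpath("//dc:subject/text()", namespaces=NS)

# The metadata payload element of each record, then its subjects grouped per record
payloads = doc.xpath(
    "/oai:OAI-PMH/oai:ListRecords/oai:record/oai:metadata/*", namespaces=NS)
per_record = [p.xpath("dc:subject/text()", namespaces=NS) for p in payloads]

# Children of the first payload whose string value contains "Geo"
# (no prefixes in this expression, so no namespaces mapping is needed)
geo = payloads[0].xpath('*[contains(., "Geo")]')

print(subjects)
print(per_record)
print([el.tag for el in geo])
```

Grouping per record (the `per_record` list) keeps each record's subjects together, which a single `//dc:subject` query over the whole document does not.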

Read some XPath documentation beyond the Python and lxml docs. Those explain how to use XPath from Python, but they aren't really XPath tutorials.

Note that the find() and findall() methods take ElementPath expressions, which are a limited subset of XPath-like syntax.
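
A small sketch contrasting the two on an invented document (the `root`/`item` structure is hypothetical). `findall()` and `findtext()` accept a prefix mapping for their ElementPath expressions, but XPath functions such as `contains()` and `text()` steps are outside ElementPath, so that query goes through `xpath()`:

```python
from lxml import etree

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical minimal document with three dc:subject elements
SAMPLE = b"""<root xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item><dc:subject>Geology</dc:subject><dc:subject>Maps</dc:subject></item>
  <item><dc:subject>History</dc:subject></item>
</root>"""

root = etree.fromstring(SAMPLE)

# ElementPath: the dc: prefix is resolved through the NS mapping
subjects = [el.text for el in root.findall(".//dc:subject", NS)]

# findtext() returns the text of the first match (None if nothing matches)
first = root.findtext(".//dc:subject", namespaces=NS)

# Full XPath: a contains() predicate plus a text() step, which ElementPath lacks
geo = root.xpath('//dc:subject[contains(., "Geo")]/text()', namespaces=NS)

print(subjects, first, geo)
```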

Anonymous user

Answer 2 · edited 2024-05-23 14:32:13

for element in tree.findall(".//{http://purl.org/dc/elements/1.1/}subject"):
    print(element.text)  # Python 3 print(); .text gives the subject string
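
Both answers parse the whole document into memory before querying it. For very large harvests, one alternative, not used in either answer, is lxml's streaming `etree.iterparse()`, which handles one `record` at a time and discards it afterwards. A sketch under that assumption: the function name `iter_record_subjects` is hypothetical, and the sample is simplified, with `dc:subject` placed directly under `metadata` rather than inside an `oai_dc:dc` wrapper:

```python
import io
from lxml import etree

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def iter_record_subjects(source):
    """Hypothetical helper: yield the dc:subject strings of each OAI record, streaming."""
    # Only fire "end" events for <record>, i.e. once each record is fully parsed
    for _event, record in etree.iterparse(source, events=("end",), tag=OAI + "record"):
        yield [s.text for s in record.iter(DC + "subject")]
        # Free memory: empty this record and drop earlier, already-processed siblings
        record.clear()
        while record.getprevious() is not None:
            del record.getparent()[0]


# Simplified, invented sample with two records
SAMPLE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<ListRecords>
<record><metadata><dc:subject>Geology</dc:subject><dc:subject>Maps</dc:subject></metadata></record>
<record><metadata><dc:subject>History</dc:subject></metadata></record>
</ListRecords>
</OAI-PMH>"""

result = list(iter_record_subjects(io.BytesIO(SAMPLE)))
print(result)
```

The clear-and-delete loop keeps memory use roughly bounded by one record instead of the whole file. For small files, `findall()` or `xpath()` on a fully parsed tree is simpler.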
