Parse an XML file block by block and get the values in each block

<image>
  <ref>www.test.com</ref>
  <label/>
  <number>0</number>
  <ID>ID0</ID>
  <name>test1</name>
  <comment>
    <line number="0">This is a comment</line>
    <line number="1">This is also another comment</line>
  </comment>
  <creationDate>2017-02-13T15:46:16-04:00</creationDate>
</image>
<result>
  <ref>www.test1.com</ref>
  <label/>
  <number>001</number>
  <ID>RE1</ID>
  <name>test2</name>
  <comment>
    <line number="0">This is a comment2</line>
  </comment>
  <creationDate>2017-01-13T15:46:16-04:00</creationDate>
</result>
<image>
  <ref>www.test3.com</ref>
  <label/>
  <number>1</number>
  <ID>ID1</ID>
  <value>10030</value>
  <name>test3</name>
  <comment>
    <line number="0">This is a comment3</line>
  </comment>
  <creationDate>2017-04-13T15:46:16-04:00</creationDate>
</image>

1 answer

Anonymous user

Answer #1 · posted 2024-04-27 00:53:26

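`iterparse` does not read an XML file "line by line". By default it reports an `end` event each time it finishes parsing an element. For example, if your input file looks like this: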

<data>
  <widgets location="earth">
    <widget name="gizmo"/>
    <widget name="gadget"/>
    <widget name="thingamajig"/>
  </widgets>
</data>

A plain call to `iterparse` (with no `events` argument) yields this sequence:

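end <Element widget at 0x...>
end <Element widget at 0x...>
end <Element widget at 0x...>
end <Element widgets at 0x...>
end <Element data at 0x...>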
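To check the default behaviour without lxml installed, here is a minimal sketch that swaps in the standard library's `xml.etree.ElementTree.iterparse` (which also defaults to `end` events only) and reads the sample document from an in-memory `io.BytesIO` instead of a file. The variable names are illustrative.

```python
import io
import xml.etree.ElementTree as ET

# The sample document from above, held in memory instead of in data2.xml.
XML = b"""<data>
  <widgets location="earth">
    <widget name="gizmo"/>
    <widget name="gadget"/>
    <widget name="thingamajig"/>
  </widgets>
</data>"""

# With no `events` argument, iterparse reports only "end" events, so each
# element shows up once its closing tag has been parsed (children before parents).
default_events = [(event, elem.tag) for event, elem in ET.iterparse(io.BytesIO(XML))]
print(default_events)
```

Note that the innermost elements come first: the root `data` is reported last because it is the last element to close.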

If you want, you can also receive `start` events at the beginning of each element, like this:

from lxml import etree
with open('data2.xml', 'rb') as fd:
    for event, element in etree.iterparse(fd, events=('start', 'end')):
        print(event, element)

The output is:

start <Element data at 0x7fccf78cc518>
start <Element widgets at 0x7fccf78cc7e8>
start <Element widget at 0x7fccf78cc4d0>
end <Element widget at 0x7fccf78cc4d0>
start <Element widget at 0x7fccf78bdf80>
end <Element widget at 0x7fccf78bdf80>
start <Element widget at 0x7fccf78bdf38>
end <Element widget at 0x7fccf78bdf38>
end <Element widgets at 0x7fccf78cc7e8>
end <Element data at 0x7fccf78cc518>
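The `start`/`end` events nest like matching brackets: every `start` is eventually followed by the `end` of the same element, with the events of its children in between. A minimal sketch that records the pairs, assuming the standard library's `xml.etree.ElementTree.iterparse` in place of lxml's (the `events=("start", "end")` argument works the same way there) and an in-memory copy of the sample file; `trace_events` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<data>
  <widgets location="earth">
    <widget name="gizmo"/>
    <widget name="gadget"/>
    <widget name="thingamajig"/>
  </widgets>
</data>"""

def trace_events(xml_bytes):
    """Return (event, tag) pairs in the order iterparse reports them."""
    return [
        (event, elem.tag)
        for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    ]

# Prints one "event tag" line per event, mirroring the lxml output above
# without the memory addresses.
for event, tag in trace_events(XML):
    print(event, tag)
```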

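If I wanted to build a list of widgets for each location, I could respond to the `start` event of a `widgets` element by initializing a new list, and then append each widget to that list as its `end` event arrives, until the `widgets` element itself ends, as in: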

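from lxml import etree

# Open in binary mode so lxml can honour the document's own encoding declaration
with open('data2.xml', 'rb') as fd:
    widgets = {}   # maps each location to the names of its widgets
    loc = None     # location of the <widgets> block currently being parsed

    for event, element in etree.iterparse(fd, events=('start', 'end')):
        if event == 'start' and element.tag == 'widgets':
            # A new block begins: its attributes are already available here
            loc = element.get('location')
            widgets[loc] = []
        elif event == 'end' and element.tag == 'widget':
            # A <widget> has been fully parsed: add it to the current block
            widgets[loc].append(element.get('name'))

    print(widgets)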

The output is:

{'earth': ['gizmo', 'gadget', 'thingamajig']}
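For large files it is common to discard a block once it has been handled, so the parsed tree does not keep growing. A minimal sketch of the same widget collector, assuming the standard library's `xml.etree.ElementTree` instead of lxml and an in-memory source; `widgets_by_location` is a hypothetical name, and the `elem.clear()` call is an optional addition, not part of the code above.

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<data>
  <widgets location="earth">
    <widget name="gizmo"/>
    <widget name="gadget"/>
    <widget name="thingamajig"/>
  </widgets>
</data>"""

def widgets_by_location(source):
    """Map each <widgets location=...> block to the names of its <widget> children."""
    widgets = {}
    loc = None  # location of the <widgets> block currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "widgets":
            loc = elem.get("location")
            widgets[loc] = []
        elif event == "end" and elem.tag == "widget":
            widgets[loc].append(elem.get("name"))
        elif event == "end" and elem.tag == "widgets":
            # The whole block has been handled: drop its children to keep
            # memory use down (the emptied <widgets> element stays under the root).
            elem.clear()
    return widgets

print(widgets_by_location(io.BytesIO(XML)))
```

Clearing only after the block's `end` event matters: by then every `widget` name has already been copied into the result.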

I hope this gives you an idea of how to handle each block of interest in your input file.
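Applied to the data in the question, one possible approach is to wait for the `end` event of each `<image>` or `<result>` element, read its children into a dict, and then clear it. This is a minimal sketch under several assumptions: the blocks sit inside a single root element (the question's snippet has several top-level elements, which is not well-formed XML on its own, so the hypothetical `<root>` wrapper stands in for whatever the real file uses), the standard library's `xml.etree.ElementTree.iterparse` replaces lxml's, and the names `parse_blocks` and `BLOCK_TAGS` are illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical <root> wrapper around the question's blocks so the input is well-formed.
XML = b"""<root>
<image> <ref>www.test.com</ref> <label/> <number>0</number> <ID>ID0</ID> <name>test1</name>
  <comment> <line number="0">This is a comment</line> <line number="1">This is also another comment</line> </comment>
  <creationDate>2017-02-13T15:46:16-04:00</creationDate> </image>
<result> <ref>www.test1.com</ref> <label/> <number>001</number> <ID>RE1</ID> <name>test2</name>
  <comment> <line number="0">This is a comment2</line> </comment>
  <creationDate>2017-01-13T15:46:16-04:00</creationDate> </result>
<image> <ref>www.test3.com</ref> <label/> <number>1</number> <ID>ID1</ID> <value>10030</value> <name>test3</name>
  <comment> <line number="0">This is a comment3</line> </comment>
  <creationDate>2017-04-13T15:46:16-04:00</creationDate> </image>
</root>"""

BLOCK_TAGS = {"image", "result"}  # tags treated as top-level blocks (assumption)

def parse_blocks(source):
    """Yield (tag, values) for each block, one block at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in BLOCK_TAGS:
            continue  # a child element; it is read when its block ends
        values = {}
        for child in elem:
            if child.tag == "comment":
                # Keep every <line> of the comment, in document order.
                values["comment"] = [line.text for line in child.iter("line")]
            else:
                values[child.tag] = child.text  # None for empty elements such as <label/>
        yield elem.tag, values
        elem.clear()  # the values have been copied out, so free the block

for tag, values in parse_blocks(io.BytesIO(XML)):
    print(tag, values["ID"], values["name"], values.get("value"))
```

Collecting on `end` rather than `start` is deliberate: only at the block's `end` event are all of its children guaranteed to have been parsed.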
