Iterating over XML elements in Python uses a lot of memory

import lxml.etree as ET

all_members = []
tree = ET.parse(whole_path)
root = tree.getroot()

# Get all the households
HH_str = '//H'
HH = tree.xpath(HH_str)
for H in HH:
    # Check whether the household satisfies the condition
    if is_valid_hh(H):
        M_str = './/M'
        M = H.xpath(M_str)
        for m in M:
            if is_valid_member(m):
                all_members.append(m)

for member in all_members:
    pass  # do something complicated

1 Answer

User

#1 · Posted 2024-04-25 20:27:39

etree will consume a lot of memory (yes, even with iterparse()), and SAX is really clunky. But pulldom comes to the rescue!

from xml.dom import pulldom

# Stream parse events instead of building the whole tree up front
doc = pulldom.parse('large.xml')
for event, node in doc:
    if event == pulldom.START_ELEMENT and node.tagName == 'special':
        # The node has no children yet at this point
        doc.expandNode(node)
        # Now the node holds its full subtree
        if is_valid_hh(node):
            ...  # do things

It's a library that nobody seems to know about unless they have to use it. Documentation is at, for example, https://docs.python.org/3.7/library/xml.dom.pulldom.html
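For the household/member layout in the question, the same approach might look roughly like the sketch below. It is only an illustration: the sample XML, the `id` and `age` attributes, and the bodies of `is_valid_hh` and `is_valid_member` are made up here, since the question does not show them. Each `H` element is expanded one at a time, so only a single household subtree has to be built at once.

```python
from xml.dom import pulldom

# Made-up sample data; the real file layout is not shown in the question.
SAMPLE = """<root>
  <H id="1"><M age="30"/><M age="5"/></H>
  <H id="2"><M age="40"/></H>
  <H id="3"><M age="70"/></H>
</root>"""


def is_valid_hh(h):
    # Hypothetical condition: skip the household with id "2".
    return h.getAttribute("id") != "2"


def is_valid_member(m):
    # Hypothetical condition: keep adult members only.
    return int(m.getAttribute("age")) >= 18


def iter_valid_members(doc):
    """Yield valid M elements from valid H elements, one household at a time."""
    for event, node in doc:
        if event == pulldom.START_ELEMENT and node.tagName == "H":
            # Build the subtree for this household only.
            doc.expandNode(node)
            if not is_valid_hh(node):
                continue
            for m in node.getElementsByTagName("M"):
                if is_valid_member(m):
                    yield m


# For a real file, pulldom.parse('large.xml') would replace parseString.
doc = pulldom.parseString(SAMPLE)
ages = [m.getAttribute("age") for m in iter_valid_members(doc)]
print(ages)
```

Doing the complicated work inside the loop, rather than collecting every member in a list first as the question's code does, avoids keeping all the selected nodes alive at the same time.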
