XML parsing performance tuning in Python

Posted 2024-04-25 21:09:01

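I am trying to parse a 17 GB XML file. I need to read all attributes of each tag (<row>) and save them to a CSV. I am using the following code to parse the XML, but it takes far too long: exporting 10,000 rows, for example, takes more than 5 minutes. I have a Windows machine (Intel i5 2.4 GHz, 8 GB RAM).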

Sample XML

<?xml version="1.0" encoding="utf-8"?>
<comments>
  <row Id="2" PostId="35314" Score="8" Text="Yeah, I didn't believe it until I created a console app - but good lord!  Why would they give you the rope to hang yourself!  I hated that about VB.NET - the OrElse and AndAlso keywords!" CreationDate="2008-09-06T08:09:52.330" UserId="3" /> 
</comments>

Python code

[code block missing from the source]

I also tried another variant of the code without Pandas, which failed because it ran out of memory:

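import csv
import xml.etree.ElementTree as etree  # assumption: the import is not shown in the source; lxml.etree is also possible

def process_params(elem):
    # Collect the six attributes of a <row> element in CSV column order.
    row = [elem.attrib.get('Id'),elem.attrib.get('PostId'),elem.attrib.get('Score'),elem.attrib.get('Text'),elem.attrib.get('CreationDate'),elem.attrib.get('UserId')]
    # Append the row to the output file (the file is reopened for every row).
    with open("comments1.csv", "a", encoding="utf-8", newline="") as f:
        wr = csv.writer(f)
        wr.writerow(row)

# Walk the document incrementally and export each <row> as soon as it starts.
for event, elem in etree.iterparse("comments.xml", events=('start', 'end')):
    if event == 'start':
        if elem.tag == 'row':
            process_params(elem)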

I tried several more variants, but they either did not work or took forever. Please advise.
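
For reference, here is a minimal sketch of one direction that may help. It is not a measured fix. It makes three changes to the variant above: it opens the CSV once instead of once per row, it handles each <row> on its 'end' event, and it clears the root element after every row so the tree built by iterparse does not keep growing. The function name xml_rows_to_csv and the FIELDS list are hypothetical names introduced for this illustration, and the standard-library xml.etree.ElementTree is assumed rather than lxml.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column order taken from the sample <row> element in the question.
FIELDS = ["Id", "PostId", "Score", "Text", "CreationDate", "UserId"]


def xml_rows_to_csv(xml_source, csv_file, tag="row", fields=FIELDS):
    """Stream attributes of every <tag> element into csv_file.

    xml_source is a file path or a binary file object; csv_file is an
    already-open text file. Returns the number of rows written.
    (Hypothetical helper, written for this illustration only.)
    """
    writer = csv.writer(csv_file)
    count = 0
    context = ET.iterparse(xml_source, events=("start", "end"))
    # The first event is the 'start' of the root element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            writer.writerow([elem.attrib.get(name) for name in fields])
            count += 1
            # Drop already-processed children so memory use stays flat.
            root.clear()
    return count


# Small in-memory demonstration using the structure of the sample XML.
sample = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<comments>'
    b'<row Id="2" PostId="35314" Score="8" Text="Yeah, good lord!" '
    b'CreationDate="2008-09-06T08:09:52.330" UserId="3" />'
    b'</comments>'
)
demo_out = io.StringIO()
demo_count = xml_rows_to_csv(io.BytesIO(sample), demo_out)
```

For the real file, the same function would be called with the path "comments.xml" and an output file opened once with open("comments1.csv", "w", encoding="utf-8", newline=""). Whether this is fast enough for 17 GB on the machine described is an open question; the sketch only addresses the per-row file reopening and the memory growth.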

