XML parsing performance tuning in Python

Posted 2024-04-25 21:09:01


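I am trying to parse a 17 GB XML file. I need to read all attributes of each <row> tag and save them to a CSV file. I use the following code to parse the XML, but it takes far too long: exporting 10,000 rows, for example, takes more than 5 minutes. I am on a Windows machine (Intel i5 2.4 GHz, 8 GB RAM).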

Sample XML

<?xml version="1.0" encoding="utf-8"?>
<comments>
  <row Id="2" PostId="35314" Score="8" Text="Yeah, I didn't believe it until I created a console app - but good lord!  Why would they give you the rope to hang yourself!  I hated that about VB.NET - the OrElse and AndAlso keywords!" CreationDate="2008-09-06T08:09:52.330" UserId="3" /> 
</comments>

Python code

^{pr2}$

I also tried another variant of the code without Pandas; it failed because it ran out of memory.

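import csv
import xml.etree.ElementTree as etree  # assumption: the original import is not shown

def process_params(elem):
    # Collect the six attributes of one <row> element, in CSV column order.
    row = [elem.attrib.get('Id'),elem.attrib.get('PostId'),elem.attrib.get('Score'),elem.attrib.get('Text'),elem.attrib.get('CreationDate'),elem.attrib.get('UserId')]
    # The file is reopened in append mode for every row; newline="" avoids blank lines on Windows.
    with open("comments1.csv", "a", encoding="utf-8", newline="") as f:
        wr = csv.writer(f)
        wr.writerow(row)

# Rows are handled on 'start' events and parsed elements are never cleared.
for event, elem in etree.iterparse("comments.xml", events=('start', 'end')):
    if event == 'start':
        if elem.tag == 'row':
            process_params(elem)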

I tried several more variants, but each one either did not work or took forever. Please advise.
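One direction that may be worth trying is sketched below. It assumes the standard-library xml.etree.ElementTree (the question does not show which etree is imported) and the six attribute names from the sample XML. The names xml_rows_to_csv and FIELDS are hypothetical, introduced only for illustration. The sketch differs from the posted variant in two ways: the CSV file is opened once instead of once per row, and each row is handled on its 'end' event, after which the root element is cleared so that processed rows do not pile up in memory. It also writes a header line, which the posted code does not.

```python
import csv
import xml.etree.ElementTree as ET

# Column order taken from the sample <row> element in the question.
FIELDS = ["Id", "PostId", "Score", "Text", "CreationDate", "UserId"]


def xml_rows_to_csv(xml_path, csv_path, tag="row", fields=FIELDS):
    """Stream elements named `tag` from xml_path into csv_path.

    Returns the number of rows written (header line not counted).
    """
    count = 0
    # Open the output once; newline="" lets the csv module control line endings.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(fields)

        context = ET.iterparse(xml_path, events=("start", "end"))
        # The first event is the start of the root element; keep a handle on it.
        _, root = next(context)

        for event, elem in context:
            if event == "end" and elem.tag == tag:
                # Missing attributes come back as None and are written as empty cells.
                writer.writerow([elem.attrib.get(name) for name in fields])
                count += 1
                # Detach processed children from the root so memory use stays flat.
                root.clear()
    return count
```

It could be called as xml_rows_to_csv("comments.xml", "comments1.csv"). Whether this is fast enough for a 17 GB file on the machine described is not something the sketch can promise; it only removes the per-row file open and the unbounded growth of the parsed tree.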


0 answers

No answers yet
