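I am trying to parse a 17 GB XML file. I need to read all the attributes of each <row> tag and save them to a CSV file. I am using the code below to parse the XML, but it takes far too long: exporting 10,000 rows, for example, takes more than 5 minutes. I am on a Windows machine (Intel i5 2.4 GHz, 8 GB RAM).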
Sample XML
<?xml version="1.0" encoding="utf-8"?>
<comments>
<row Id="2" PostId="35314" Score="8" Text="Yeah, I didn't believe it until I created a console app - but good lord! Why would they give you the rope to hang yourself! I hated that about VB.NET - the OrElse and AndAlso keywords!" CreationDate="2008-09-06T08:09:52.330" UserId="3" />
</comments>
Python code
I also tried another variant of the code, without Pandas, which failed with a memory error:
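import csv
import xml.etree.ElementTree as etree  # assumed import; the original did not show which etree module was used

def process_params(elem):
    # Collect the six attributes of a <row> element, in CSV column order.
    row = [elem.attrib.get('Id'), elem.attrib.get('PostId'), elem.attrib.get('Score'), elem.attrib.get('Text'), elem.attrib.get('CreationDate'), elem.attrib.get('UserId')]
    # Append the row to the output file (the file is reopened for every row).
    with open("comments1.csv", "a", encoding="utf-8") as f:
        wr = csv.writer(f)
        wr.writerow(row)

# Walk the document as a stream of events and handle each <row> as it starts.
for event, elem in etree.iterparse("comments.xml", events=('start', 'end')):
    if event == 'start':
        if elem.tag == 'row':
            process_params(elem)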
I tried several more variants, but none of them worked, or they took forever. Please advise.
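Two things in the non-Pandas variant probably explain the symptoms: the CSV file is reopened for every row, and the elements returned by iterparse are never released, so the in-memory tree keeps growing with the file. A minimal sketch of a streaming version follows, assuming the standard-library xml.etree.ElementTree parser and the six attributes from the sample row. The function name xml_rows_to_csv and the FIELDS list are hypothetical, introduced only for illustration, and this has not been timed against the 17 GB file.

```python
import csv
import xml.etree.ElementTree as ET

# Column order taken from the sample <row> element in the question.
FIELDS = ["Id", "PostId", "Score", "Text", "CreationDate", "UserId"]


def xml_rows_to_csv(xml_path, csv_path, tag="row", fields=FIELDS):
    """Stream `tag` elements from xml_path into csv_path; return the row count.

    Hypothetical helper: the name and signature are not from the question.
    """
    count = 0
    # Open the output once; newline="" avoids blank lines on Windows.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(fields)
        context = ET.iterparse(xml_path, events=("start", "end"))
        # The first event is the 'start' of the root element (<comments>).
        _, root = next(context)
        for event, elem in context:
            # On 'end' the element is complete and safe to discard.
            if event == "end" and elem.tag == tag:
                # Missing attributes come back as None, written as empty cells.
                writer.writerow([elem.attrib.get(name) for name in fields])
                count += 1
                # Detach finished children from the root so the tree
                # does not grow with the size of the file.
                root.clear()
    return count
```

Calling xml_rows_to_csv("comments.xml", "comments1.csv") would write a header line followed by one CSV line per <row> element. Because the output file is opened in "w" mode, a rerun overwrites the file instead of appending to it.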