Parsing XML with xml.etree.cElementTree

2 answers

User

Answer 1 · edited 2024-04-24 23:54:30

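For XML files that are not massive (up to a few MB, say), that approach should work fine. But if you know the tag and only want its values as output, I found a way to do it that doesn't need xml.etree at all. Most of the credit goes to http://enginerds.craftsy.com/blog/2014/04/parsing-large-xml-files-in-python-without-a-billion-gigs-of-ram.html, though I modified it for my needs. For example: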

path = 'yourxmlfilepath.xml'
tagyouwant = 'Headline'  # just an example; I wanted the text between 'Headline' tags
opentag = '<' + tagyouwant + '>'
closetag = '</' + tagyouwant + '>'

# Open in text mode so each line is a str and can be compared with the str tags.
# Assumes UTF-8 and that each element opens and closes on the same line.
with open(path, encoding='utf-8') as inputfile:
    for line in inputfile:
        if opentag in line:
            strtoget = line.strip()
            strtoget = strtoget.replace(opentag, "")  # trim the tags from the text
            strtoget = strtoget.replace(closetag, "")
            print(strtoget)

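Instead of the final print, you can do whatever you like with the string at that point. Alternatively, run this as a batch job or from the command line and redirect the output to a .txt file, or store all the values as it runs (depending on what you want to do with them).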
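As a minimal sketch of the "store all the values" variant, the same line scan can be wrapped in a generator and its results written to a text file. The function name `iter_tag_text`, the sample data, and the file names are made up for illustration, and the sketch still assumes UTF-8 input with each element opening and closing on one line.

```python
import os
import tempfile


def iter_tag_text(path, tag, encoding="utf-8"):
    """Yield the text between <tag> and </tag> for each line that holds both."""
    opentag = "<" + tag + ">"
    closetag = "</" + tag + ">"
    with open(path, encoding=encoding) as inputfile:
        for line in inputfile:
            start = line.find(opentag)
            if start == -1:
                continue  # no opening tag on this line
            start += len(opentag)
            end = line.find(closetag, start)
            if end == -1:
                continue  # element continues on a later line; this sketch skips it
            yield line[start:end]


# Hypothetical sample input, written to a temporary directory for the demo.
sample = (
    "<News>\n"
    "  <Headline>First story</Headline>\n"
    "  <Body>ignored</Body>\n"
    "  <Headline>Second story</Headline>\n"
    "</News>\n"
)
workdir = tempfile.mkdtemp()
xml_path = os.path.join(workdir, "sample.xml")
out_path = os.path.join(workdir, "headlines.txt")

with open(xml_path, "w", encoding="utf-8") as f:
    f.write(sample)

# Keep the values in memory and also write one value per line to a .txt file.
headlines = list(iter_tag_text(xml_path, "Headline"))
with open(out_path, "w", encoding="utf-8") as out:
    for value in headlines:
        out.write(value + "\n")

print(headlines)
```

Because the function is a generator, the values can also be consumed one at a time without building the list, which keeps memory use flat for very large files.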

Either way, I thought this was a clever, memory-efficient way to parse huge XML files when you already know what you want out of them.
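For comparison, and not as part of the approach above, the standard library can also stream a file with `xml.etree.ElementTree.iterparse`, which copes with elements that span several lines. The following is a rough sketch; the helper name `iter_element_text` and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_element_text(source, tag):
    """Yield the text of every <tag> element as the parser finishes it."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.text
        # Drop the element's text, attributes and children once it is done.
        # The emptied elements stay attached to the root, so memory still
        # grows slowly with the number of elements.
        elem.clear()


# Hypothetical sample: the first Headline spans two lines.
source = io.BytesIO(
    b"<News>\n"
    b"<Headline>First\nstory</Headline><Headline>Second story</Headline>\n"
    b"</News>"
)
values = list(iter_element_text(source, "Headline"))
print(values)
```

This trades the simplicity of plain string matching for a real parser, so it also handles attributes on the tag and entity references such as `&amp;`.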

User

Answer 2 · edited 2024-04-24 23:54:30

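Please explain what "not working" means to you. My guess at the code you ran (or should have run) works for me (Python 2.x for x in (5, 6)); see below. It even works on Python 2.1 with the appropriate change to the import statement. Note that I display element.tag to show that it refers to the desired element.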

>>> xml = """\
... <?xml version="1.0" encoding="ISO-8859-1"?>
... <Book>
...   <Page>
...     <Text>Blah</Text>
...   </Page>
... </Book>
... """
>>> import xml.etree.cElementTree as ET
>>> root = ET.fromstring(xml)
>>> element = root.getchildren()[0].getchildren()[0]
>>> element.tag
'Text'
>>> element.text
'Blah'
>>>
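The session above targets Python 2. On current Python 3 the same lookup needs two changes: `xml.etree.cElementTree` and `Element.getchildren()` were both removed in Python 3.9, so the sketch below uses `xml.etree.ElementTree` with indexing instead. The variable names `document` and `same_element` are only for illustration.

```python
import xml.etree.ElementTree as ET

document = """\
<?xml version="1.0" encoding="ISO-8859-1"?>
<Book>
  <Page>
    <Text>Blah</Text>
  </Page>
</Book>
"""

# Passing bytes lets the parser apply the declared encoding itself.
root = ET.fromstring(document.encode("iso-8859-1"))

# Elements support indexing, which replaces getchildren()[0].
element = root[0][0]

# The same element can also be reached by path instead of by position.
same_element = root.find("Page/Text")

print(element.tag, element.text)
```

Looking the element up by path is usually less fragile than positional indexing, since it keeps working if other children are added before `Page` or `Text`.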

Until we get the first problem sorted out, perhaps you'd like to think a bit more about your bonus question ;-)
