Parsing XML into a hash table

4 votes

6 answers

3556 views

Asked 2025-04-15 17:03

I have an XML file in the following format:

<doc>
<id name="X">
  <type name="A">
    <min val="100" id="80"/>
    <max val="200" id="90"/>
   </type>
  <type name="B">
    <min val="100" id="20"/>
    <max val="20" id="90"/>
  </type>
</id>

<type...>
</type>
</doc>

I want to parse this document and build a hash table like this:

{X: {"A": [(100,80), (200,90)], "B": [(100,20), (20,90)]}, Y: .....}

How can I do this in Python?

Tags: data structures, hash table, XML parsing

6 Answers

Don't reinvent the wheel; just use the Amara toolkit. Variable names are really just keys in a dictionary anyway.

http://www.xml3k.org/Amara

Answered 2025-04-15 by Python大师


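I disagree with the suggestion in other answers to use minidom. It is a so-so Python adaptation of a standard originally designed for other languages: usable, but not a great fit. In modern Python, ElementTree is the recommended choice.

The same interface is also implemented, and faster, in the third-party module lxml. Unless you need blazing speed, though, the version bundled with the Python standard library is good enough (and faster than minidom). The key point is to program to this interface, so that if you later want to switch to another implementation, you only need small changes to your own code.

For example, after the necessary imports, the following code is a minimal implementation of your example (it does not verify that the XML is valid; it just assumes the data is correct and extracts it, although adding checks is relatively easy):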
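One way to keep such a swap cheap is to isolate the import in a single place. The following is only a sketch: it assumes lxml may or may not be installed, and the names `SAMPLE` and `type_names` are made up for illustration. It deliberately uses only calls that both libraries share (`fromstring`, `iter`, `get`).

```python
# Prefer lxml when it is installed; otherwise fall back to the standard library.
# Both expose the ElementTree-style calls used below.
try:
    from lxml import etree as et
except ImportError:
    from xml.etree import ElementTree as et

# Hypothetical, trimmed-down sample in the question's layout.
SAMPLE = '<doc><id name="X"><type name="A"/><type name="B"/></id></doc>'

def type_names(xml_text):
    """Return {id name: [type names]} using only calls common to both libraries."""
    root = et.fromstring(xml_text)
    return {
        an_id.get("name"): [a_type.get("name") for a_type in an_id.iter("type")]
        for an_id in root.iter("id")
    }

print(type_names(SAMPLE))  # {'X': ['A', 'B']}
```

Because the rest of the code only refers to `et`, switching implementations means touching the import and nothing else.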

from xml.etree import ElementTree as et  # or import any other, faster version of ET

def xml2data(xmlfile):
  tree = et.parse(xmlfile)
  data = {}
  # Iterate over elements directly; getchildren() was removed in Python 3.9.
  for anid in tree.getroot():
    currdict = data[anid.get('name')] = {}
    for atype in anid:
      currlist = currdict[atype.get('name')] = []
      for c in atype:
        # Attribute values are returned as strings.
        currlist.append((c.get('val'), c.get('id')))
  return data

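Given your sample input, this code produces the structure you want. Note that the values come back as strings; wrap them in int() if you need numbers.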
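To match the integer tuples shown in the question exactly, the same traversal can convert each attribute as it goes. This is a minimal sketch, not the answer's own code: it parses from a string with `fromstring` so it runs on its own, the names `SAMPLE` and `xml_to_int_table` are hypothetical, and it assumes every `val` and `id` attribute is present and numeric.

```python
from xml.etree import ElementTree as et

# The complete part of the question's sample document.
SAMPLE = """
<doc>
  <id name="X">
    <type name="A">
      <min val="100" id="80"/>
      <max val="200" id="90"/>
    </type>
    <type name="B">
      <min val="100" id="20"/>
      <max val="20" id="90"/>
    </type>
  </id>
</doc>
"""

def xml_to_int_table(xml_text):
    """Build {id name: {type name: [(val, id), ...]}} with integer values."""
    table = {}
    for an_id in et.fromstring(xml_text).findall("id"):
        types = table.setdefault(an_id.get("name"), {})
        for a_type in an_id.findall("type"):
            # One (val, id) pair per child element, in document order.
            types[a_type.get("name")] = [
                (int(c.get("val")), int(c.get("id"))) for c in a_type
            ]
    return table

print(xml_to_int_table(SAMPLE))
# {'X': {'A': [(100, 80), (200, 90)], 'B': [(100, 20), (20, 90)]}}
```

Using `findall("id")` and `findall("type")` also skips any child elements with other tag names instead of treating them as entries.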

Answered 2025-04-15 by Python大师


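As others have said, minidom is a good way to solve this. You open and parse the file, and as you walk the nodes you check whether each one is relevant and should be read. That also tells you whether to read its child nodes.

I threw this code together quickly and it seems to do what you need. Attributes are read by name, and there is no error handling. The print() call at the end shows that this is Python 3.x code.

I'll leave it to you to improve the code; I just wanted to give you a starting point.

Happy coding! :)

xml.txt

<doc>
<id name="X">
  <type name="A">
    <min val="100" id="80"/>
    <max val="200" id="90"/>
   </type>
  <type name="B">
    <min val="100" id="20"/>
    <max val="20" id="90"/>
  </type>
</id>
</doc>

parsexml.py

from xml.dom import minidom

data = {}
doc = minidom.parse("xml.txt")
# Walk <doc> -> <id> -> <type> -> <min>/<max>.
# Whitespace text nodes have localName None, so the checks below skip them.
for n in doc.documentElement.childNodes:
    if n.localName == "id":
        id_name = n.getAttribute("name")
        data[id_name] = {}
        for j in n.childNodes:
            if j.localName == "type":
                type_name = j.getAttribute("name")
                # Slot 0 holds the <min> pair, slot 1 the <max> pair.
                data[id_name][type_name] = [(), ()]
                for k in j.childNodes:
                    # Read attributes by name; positional lookup depends on attribute order.
                    if k.localName == "min":
                        data[id_name][type_name][0] = (
                            k.getAttribute("val"), k.getAttribute("id"))
                    if k.localName == "max":
                        data[id_name][type_name][1] = (
                            k.getAttribute("val"), k.getAttribute("id"))
print(data)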

Output:

{'X': {'A': [('100', '80'), ('200', '90')], 'B': [('100', '20'), ('20', '90')]}}
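As one possible direction for the improvements left to the reader, here is a hedged sketch of the same minidom approach with basic error handling. The function name `pairs_by_type` and the choice to raise `ValueError` are assumptions for illustration only; it parses from a string so that it runs on its own.

```python
from xml.dom import minidom

def pairs_by_type(xml_text):
    """Parse the question's layout with minidom.

    Raises ValueError when a <type> lacks exactly one <min> and one <max>,
    or when one of them is missing its val or id attribute.
    """
    root = minidom.parseString(xml_text).documentElement
    data = {}
    for id_node in root.getElementsByTagName("id"):
        types = data.setdefault(id_node.getAttribute("name"), {})
        for type_node in id_node.getElementsByTagName("type"):
            type_name = type_node.getAttribute("name")
            pairs = []
            for tag in ("min", "max"):
                found = type_node.getElementsByTagName(tag)
                if len(found) != 1:
                    raise ValueError(f"expected one <{tag}> in type {type_name!r}")
                node = found[0]
                if not (node.hasAttribute("val") and node.hasAttribute("id")):
                    raise ValueError(f"<{tag}> needs both val and id attributes")
                pairs.append((node.getAttribute("val"), node.getAttribute("id")))
            types[type_name] = pairs
    return data

good = ('<doc><id name="X"><type name="A">'
        '<min val="100" id="80"/><max val="200" id="90"/>'
        '</type></id></doc>')
print(pairs_by_type(good))  # {'X': {'A': [('100', '80'), ('200', '90')]}}
```

Malformed XML is a separate failure mode: `parseString` reports it with its own parser exception, which this sketch lets propagate unchanged.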

Answered 2025-04-15 by Python大师

