Reading XML with Python minidom and iterating over each node

Asked 2025-04-15 14:15

I have an XML structure that looks like the following, only much larger:

<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>

To process it, I used the following code:

from xml.dom.minidom import parse

dom = parse(filepath)  # filepath: path to the XML file
conference = dom.getElementsByTagName('conference')
for node in conference:
    conf_name = node.getAttribute('name')
    print(conf_name)
    alist = node.getElementsByTagName('author')
    for a in alist:
        authortext = a.nodeValue
        print(authortext)

However, authortext prints as 'None'. I tried a few variations, such as the one below, but that made my program fail with an error.

authortext=a[0].nodeValue

The expected output is:

1
Bob
Nigel
2
Alice
Mary

But what I get is:

1
None
None
2
None
None

Any suggestions on how to fix this?

Tags: XML · programming errors · code debugging · data parsing · minidom · node traversal

5 Answers

Quick access (this reads the text of the first author element only):

node.getElementsByTagName('author')[0].childNodes[0].nodeValue
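A minimal sketch applying the same idea to every author in the sample document. It assumes the XML is parsed from a string with parseString (the question parses a file with parse(filepath)), and that each author element's first child is its text node. The .strip() call is added because the author names in the sample are surrounded by newlines and indentation; the helper name conference_authors is hypothetical.

```python
from xml.dom.minidom import parseString

# Sample document from the question, embedded as a string for this sketch.
XML = """<root>
    <conference name='1'>
        <author>
            Bob
        </author>
        <author>
            Nigel
        </author>
    </conference>
    <conference name='2'>
        <author>
            Alice
        </author>
        <author>
            Mary
        </author>
    </conference>
</root>"""

def conference_authors(xml_text):
    """Return a list of (conference name, [author names]) pairs."""
    dom = parseString(xml_text)
    result = []
    for conf in dom.getElementsByTagName('conference'):
        names = []
        for author in conf.getElementsByTagName('author'):
            # The element itself has no value; its first child is the text node.
            names.append(author.childNodes[0].nodeValue.strip())
        result.append((conf.getAttribute('name'), names))
    return result

for name, authors in conference_authors(XML):
    print(name)
    for author in authors:
        print(author)
```

Printed line by line, this gives the 1 / Bob / Nigel / 2 / Alice / Mary output the question expects.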

Answered 2025-04-15 by Python大师

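Element nodes don't have a value. You need to look at the text node inside them. If you know there is always exactly one text node inside, you can use element.firstChild.data to get its content (for text nodes, data is the same as nodeValue).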

Be careful: if there is no text content inside, there is no child text node, so element.firstChild is None and accessing .data on it fails with an AttributeError.
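A small sketch of a guarded version, assuming an empty string is an acceptable result for elements with no text. The helper name first_text is hypothetical; it only reads .data when the first child exists and really is a text node.

```python
from xml.dom.minidom import parseString

def first_text(element):
    """Return the data of element's first child if it is a text node, else ''."""
    child = element.firstChild
    if child is not None and child.nodeType == child.TEXT_NODE:
        # For text nodes, .data and .nodeValue hold the same string.
        return child.data
    return ''

doc = parseString('<r><author>Bob</author><author/></r>')
full, empty = doc.getElementsByTagName('author')
print(repr(first_text(full)))   # 'Bob'
print(repr(first_text(empty)))  # '' (firstChild is None for <author/>)
```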

A quick way to get the content of just the direct child text nodes:

text = ''.join(child.data for child in element.childNodes if child.nodeType == child.TEXT_NODE)
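To illustrate what "direct child" means here, a sketch using a hypothetical mixed-content element (not from the question): text inside a nested element is a grandchild, so this one-liner skips it. The wrapper name direct_text is also hypothetical.

```python
from xml.dom.minidom import parseString

def direct_text(element):
    """Concatenate only the text nodes that are direct children of element."""
    return ''.join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)

# Hypothetical example: "the" lives inside <b>, so it is not a direct child.
doc = parseString('<author>Bob <b>the</b> Builder</author>')
author = doc.documentElement
print(repr(direct_text(author)))  # 'Bob  Builder' (two spaces where <b> was)
```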

DOM Level 3 Core provides a textContent property that you can use to get the text inside an element recursively, but minidom doesn't support it (some other Python DOM implementations do).
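Since minidom lacks textContent, a short recursive helper can approximate it. This is only a sketch modeled loosely on the DOM Level 3 definition, not a full implementation: the name text_content is hypothetical, and it collects text and CDATA sections from all descendants while skipping comments and processing instructions (which have no children in minidom).

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def text_content(node):
    """Recursively concatenate descendant text, similar in spirit to textContent."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    # Elements recurse into their children; comments and PIs have none,
    # so they contribute an empty string.
    return ''.join(text_content(child) for child in node.childNodes)

doc = parseString('<author>Bob <b>the</b> Builder</author>')
print(repr(text_content(doc.documentElement)))  # 'Bob the Builder'
```

Unlike the direct-children one-liner, this version includes the text nested inside <b>.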

Answered 2025-04-15 by Python大师

The node a that you read authortext from is of type 1 (ELEMENT_NODE); to get a string you normally need a TEXT_NODE. This works:

a.childNodes[0].nodeValue
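A quick check of the node types involved, using a one-author fragment shaped like the question's XML. Note that the text node keeps the surrounding newlines and indentation, so .strip() is needed to get just the name.

```python
from xml.dom.minidom import parseString

doc = parseString('<author>\n    Bob\n</author>')
a = doc.documentElement

# The <author> element is an ELEMENT_NODE (type 1); its nodeValue is None.
print(a.nodeType == a.ELEMENT_NODE, a.nodeType, a.nodeValue)  # True 1 None

# Its first child is a TEXT_NODE (type 3) that holds the string.
text_node = a.childNodes[0]
print(text_node.nodeType == text_node.TEXT_NODE, text_node.nodeType)  # True 3
print(repr(text_node.nodeValue.strip()))  # 'Bob'
```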

Answered 2025-04-15 by Python大师
