Python and ElementTree: return the "inner XML" without the parent element

18 votes

3 answers

8222 views

Asked 2025-04-16 02:33

In Python 2.6, using the ElementTree library, what is a good way to get the XML inside a particular element (as a string), much like innerHTML in HTML and JavaScript?

Here is a simplified example of the XML node I am starting with:

<label attr="foo" attr2="bar">This is some text <a href="foo.htm">and a link</a> in embedded HTML</label>

I would like to end up with this string:

This is some text <a href="foo.htm">and a link</a> in embedded HTML

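I tried iterating over the parent node and concatenating the tostring() of each child, but that returns only the subnodes and drops the parent's leading text: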

# returns only subnodes (e.g. <a href="foo.htm">and a link</a>)
''.join([et.tostring(sub, encoding="utf-8") for sub in node])

I could hack together a solution with regular expressions, but I was hoping for something simpler than this:

re.sub("</\w+?>\s*?$", "", re.sub("^\s*?<\w*?>", "", et.tostring(node, encoding="utf-8")))


3 Answers

The following worked for me:

from xml.etree import ElementTree as etree
xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>'
dom = etree.XML(xml)

# Leading text, then each child serialized (tostring includes the child's tail text).
# dom.tail is None for the root; on a nested element it is the text *after* its closing tag.
(dom.text or '') + ''.join(map(etree.tostring, dom)) + (dom.tail or '')
# 'start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here'

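dom.text or '' picks up the text at the start of the root element. If there is no such text, dom.text is None.

Note that the result is not valid XML on its own: a well-formed XML document has exactly one root element.

See the ElementTree documentation on mixed content.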
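To make the mixed-content point concrete, here is a small sketch of where ElementTree stores the text of the sample element from the question. It assumes Python 3 (where tostring accepts encoding='unicode'), not the Python 2.6 setup used in this answer, and the variable names are just for illustration.

```python
from xml.etree import ElementTree as ET

# The sample element from the question.
label = ET.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)
link = label[0]  # the <a> child

# Text before the first child is stored on the parent's .text ...
leading_text = label.text    # 'This is some text '
# ... and text following a child is stored on that child's .tail.
trailing_text = link.tail    # ' in embedded HTML'
# The parsed root element has no tail of its own.
root_tail = label.tail       # None

# tostring() on a child emits the child's tail as well, which is why
# joining the serialized children recovers the text between them.
serialized_link = ET.tostring(link, encoding='unicode')
print(serialized_link)
```

So the leading text has to be added by hand, while the text after each child comes along for free when the child is serialized.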

Tested with Python 2.6.5 on Ubuntu 10.04.

Answered 2025-04-16 by Python大师


This is based on the other solutions, but those did not work in my case (they raised an exception), whereas this one does:

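from xml.etree import ElementTree
from xml.etree.ElementTree import Element

def inner_xml(element: Element) -> str:
    # Leading text of the element, then each child serialized as str
    # (tostring also emits the child's tail text).
    return (element.text or '') + ''.join(ElementTree.tostring(e, 'unicode') for e in element)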

Usage is the same as in Mark Tolonen's answer.
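For reference, here is roughly what that looks like when applied to the sample element from the question. This is only a sketch assuming Python 3; the inner_xml helper is repeated so the snippet runs on its own, and the names label and result are illustrative.

```python
from xml.etree import ElementTree as ET


def inner_xml(element: ET.Element) -> str:
    """Return the element's content as a string, without its own tags."""
    # Leading text first, then every child serialized to str;
    # each child's tail text is emitted by tostring().
    return (element.text or '') + ''.join(
        ET.tostring(child, encoding='unicode') for child in element
    )


label = ET.fromstring(
    '<label attr="foo" attr2="bar">This is some text '
    '<a href="foo.htm">and a link</a> in embedded HTML</label>'
)

result = inner_xml(label)
print(result)
# This is some text <a href="foo.htm">and a link</a> in embedded HTML

# An element with neither text nor children yields an empty string.
empty_result = inner_xml(ET.fromstring('<empty/>'))
```

Passing encoding='unicode' makes tostring return str instead of bytes on Python 3, which is probably why the str-joining versions in the other answers raise an exception there.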

Answered 2025-04-16 by Python大师


How about this:

from xml.etree import ElementTree as ET

xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>'
root = ET.fromstring(xml)

def content(tag):
    # tag.text is None when the element has no leading text, so fall back to ''
    return (tag.text or '') + ''.join(ET.tostring(e) for e in tag)

# Python 2 print statements, matching the question's Python 2.6
print content(root)
print content(root.find('child2'))

This prints:

start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here
here as well<sub2 /><sub3 />

Answered 2025-04-16 by Python大师

