How to remove an element in lxml

Asked 2025-04-17 05:29

I need to completely remove some elements based on the content of an attribute, using Python's lxml library. For example:

import lxml.etree as et

xml="""
<groceries>
  <fruit state="rotten">apple</fruit>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="rotten">mango</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>
"""

tree=et.fromstring(xml)

for bad in tree.xpath("//fruit[@state='rotten']"):
    # remove this element from the tree
    pass  # placeholder: this is the step being asked about

print(et.tostring(tree, pretty_print=True).decode())

I would like this code to print:

<groceries>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>

Is there a way to do this without building the output by hand in a temporary variable, like this:

newxml = "<groceries>\n"
for elt in tree.xpath("//fruit[@state='fresh']"):
    # encoding="unicode" makes tostring return str instead of bytes
    newxml += et.tostring(elt, encoding="unicode")

newxml += "</groceries>"

6 Answers

I ran into this situation:

<div>
    <script>
        some code
    </script>
    text here
</div>

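Calling div.remove(script) also removes the "text here" part, which I did not want to delete. In lxml that text is stored as the tail of the script element, so it is removed together with the element.

Following the answer here, I found that etree.strip_elements is a better solution for me, because its with_tail=(bool) parameter controls whether the trailing text is removed.

I am still not sure whether this function can filter elements with XPath, though. I just wanted to share this information.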
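Since strip_elements selects by tag name rather than by XPath, one possible way to combine an XPath filter with tail preservation is to move the tail text by hand before calling remove. The following is a minimal sketch under that assumption; remove_keep_tail is a hypothetical helper name and is not part of lxml.

```python
import lxml.etree as et


def remove_keep_tail(element):
    """Remove element from its parent but keep the text that follows it.

    Hypothetical helper for illustration; not part of lxml.
    """
    parent = element.getparent()
    if parent is None:
        raise ValueError("cannot remove the root element")
    if element.tail:
        previous = element.getprevious()
        if previous is not None:
            # Append the tail to the preceding sibling's tail.
            previous.tail = (previous.tail or "") + element.tail
        else:
            # No preceding sibling: the tail becomes part of the parent's text.
            parent.text = (parent.text or "") + element.tail
    parent.remove(element)


div = et.fromstring("<div><script>some code</script>text here</div>")

# Any XPath expression can be used to pick the elements to drop.
for script in div.xpath("//script"):
    remove_keep_tail(script)

result = et.tostring(div, encoding="unicode")
print(result)  # <div>text here</div>
```

The helper reattaches the tail to whatever precedes the element in document order, which is where the text would appear once the element is gone.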

Here is the relevant documentation:

strip_elements(tree_or_element, *tag_names, with_tail=True)

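Delete all elements with the given tag names from a tree or subtree. This removes the elements and their entire subtree, including all their attributes, text content and descendants. It also removes the tail text of the element unless you explicitly set the with_tail keyword argument to False.

Tag names can contain wildcards, as in _Element.iter.

Note that this will not delete the element (or the root element of the ElementTree) that you passed in, even if it matches. It only treats its descendants. If you want to include the root element, check its tag name directly before calling this function.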

Example usage:
   strip_elements(some_element,
       'simpletagname',             # non-namespaced tag
       '{http://some/ns}tagname',   # namespaced tag
       '{http://some/other/ns}*',   # any tag from a namespace
       lxml.etree.Comment           # comments
       )
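To make the with_tail behaviour and the note about the root element concrete, here is a short sketch applied to the div example from this answer. The second document (a script element nested inside another script element) is made up purely for illustration.

```python
import lxml.etree as et

div = et.fromstring("<div><script>some code</script>text here</div>")

# with_tail=False removes the <script> element but keeps its trailing text.
et.strip_elements(div, "script", with_tail=False)
kept = et.tostring(div, encoding="unicode")
print(kept)  # <div>text here</div>

# The element passed in is never stripped, even if its own tag matches.
root = et.fromstring("<script><script>inner</script>tail</script>")
et.strip_elements(root, "script")  # with_tail defaults to True
stripped = et.tostring(root, encoding="unicode")
print(stripped)  # <script/>
```

In the second case only the inner script element and its tail are dropped; the outer element survives because it is the one passed to strip_elements.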

Answered 2025-04-17 by Python大师

What you are looking for is the remove method. Call it on the parent of the element you want to delete, passing that element as the argument.

import lxml.etree as et

xml="""
<groceries>
  <fruit state="rotten">apple</fruit>
  <fruit state="fresh">pear</fruit>
  <punnet>
    <fruit state="rotten">strawberry</fruit>
    <fruit state="fresh">blueberry</fruit>
  </punnet>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="rotten">mango</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>
"""

tree=et.fromstring(xml)

for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)

print(et.tostring(tree, pretty_print=True).decode())

Result:

<groceries>
  <fruit state="fresh">pear</fruit>
  <punnet>
    <fruit state="fresh">blueberry</fruit>
  </punnet>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>

Answered 2025-04-17 by Python大师

Use the remove method of an XML element:

tree = et.fromstring(xml)

for bad in tree.xpath("//fruit[@state='rotten']"):
    # grab the parent of the element and call remove directly on it
    bad.getparent().remove(bad)

print(et.tostring(tree, pretty_print=True, xml_declaration=True).decode())

Compared with @Acorn's version, mine works even when the elements to remove are not directly under the root node of the XML.
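The difference can be illustrated with a nested element. This is a sketch that assumes the version being compared called remove directly on the root; the sample document is adapted from the groceries example in this thread.

```python
import lxml.etree as et

xml = """
<groceries>
  <fruit state="rotten">apple</fruit>
  <punnet>
    <fruit state="rotten">strawberry</fruit>
    <fruit state="fresh">blueberry</fruit>
  </punnet>
</groceries>
"""

tree = et.fromstring(xml)
nested = tree.xpath("//punnet/fruit[@state='rotten']")[0]

# Calling remove on the root fails: the element is not a direct child of it.
try:
    tree.remove(nested)
    root_remove_failed = False
except ValueError:
    root_remove_failed = True

# Going through each element's own parent works at any depth.
for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)

remaining = [fruit.text for fruit in tree.iter("fruit")]
print(root_remove_failed, remaining)  # True ['blueberry']
```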

Answered 2025-04-17 by Python大师
