如何删除lxm中的元素

2024-05-14 07:59:49 发布

您现在位置:Python中文网/ 问答频道 /正文

我需要根据属性的内容,使用python的lxml完全删除元素。示例:

import lxml.etree as et

xml="""
<groceries>
  <fruit state="rotten">apple</fruit>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="rotten">mango</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>
"""

tree=et.fromstring(xml)

for bad in tree.xpath("//fruit[@state=\'rotten\']"):
  #remove this element from the tree

print et.tostring(tree, pretty_print=True)

我想打印:

<groceries>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>

有没有办法在不存储临时变量并手动打印到该变量的情况下执行此操作,如:

newxml="<groceries>\n"
for elt in tree.xpath('//fruit[@state=\'fresh\']'):
  newxml+=et.tostring(elt)

newxml+="</groceries>"

Tags: intreeforxmllxmletstatepear
3条回答

您正在寻找remove函数。调用树的remove方法并向其传递要删除的子元素。

import lxml.etree as et

xml="""
<groceries>
  <fruit state="rotten">apple</fruit>
  <fruit state="fresh">pear</fruit>
  <punnet>
    <fruit state="rotten">strawberry</fruit>
    <fruit state="fresh">blueberry</fruit>
  </punnet>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="rotten">mango</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>
"""

tree=et.fromstring(xml)

for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)

print et.tostring(tree, pretty_print=True)

结果:

<groceries>
  <fruit state="fresh">pear</fruit>
  <fruit state="fresh">starfruit</fruit>
  <fruit state="fresh">peach</fruit>
</groceries>

我遇到了一种情况:

<div>
    <script>
        some code
    </script>
    text here
</div>

div.remove(script)将删除我无意删除的text here部分。

在回答here之后,我发现etree.strip_elements对我来说是一个更好的解决方案,您可以控制是否使用with_tail=(bool)参数删除后面的文本。

但我仍然不知道这是否可以对标记使用xpath过滤器。把这个放在通知处。

这是医生:

strip_elements(tree_or_element, *tag_names, with_tail=True)

Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their entire subtree, including all their attributes, text content and descendants. It will also remove the tail text of the element unless you explicitly set the with_tail keyword argument option to False.

Tag names can contain wildcards as in _Element.iter.

Note that this will not delete the element (or ElementTree root element) that you passed even if it matches. It will only treat its descendants. If you want to include the root element, check its tag name directly before even calling this function.

Example usage::

   strip_elements(some_element,
       'simpletagname',             # non-namespaced tag
       '{http://some/ns}tagname',   # namespaced tag
       '{http://some/other/ns}*'    # any tag from a namespace
       lxml.etree.Comment           # comments
       )

使用xmlement的^{}方法:

tree=et.fromstring(xml)

for bad in tree.xpath("//fruit[@state=\'rotten\']"):
  bad.getparent().remove(bad)     # here I grab the parent of the element to call the remove directly on it

print et.tostring(tree, pretty_print=True, xml_declaration=True)

如果必须与@Acorn版本进行比较,即使要删除的元素不在xml的根节点下,我的版本也可以工作。

相关问题 更多 >

    热门问题