Why does this element in lxml include the tail?

from lxml import etree

html = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body>
This is some text followed with 2 citations.<span class="footnote">1</span> <span class="footnote">2</span>This is some more text.
</body>
</html>'''

tree = etree.fromstring(html)

# Print every footnote span; by default the output also contains the span's tail text.
for element in tree.findall(".//{*}span"):
    if element.get("class") == 'footnote':
        print(etree.tostring(element, encoding="unicode", pretty_print=True))

2 answers

User

Answer 1 · edited 2024-06-16 18:57:59

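The output includes the text that follows the element because, in lxml's data model, that text belongs to the element: everything between an element's closing tag and the next tag is stored in that element's tail attribute.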

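If you do not want that text to belong to the preceding span, you need to wrap it in an element of its own. Alternatively, you can keep the tree as it is and simply leave the tail out when converting the element back to XML, by passing with_tail=False to etree.tostring().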

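If you want to remove the tail from a specific element, you can also just set that element's tail to ''. Note that this changes the tree itself, so the text disappears from the document, not only from the printed output.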

User

Answer 2 · edited 2024-06-16 18:57:59

Passing with_tail=False leaves the tail text out of the serialized output:

print(etree.tostring(element, encoding="unicode", pretty_print=True, with_tail=False))

See the etree.tostring() documentation.
