Python lxml changing the tag hierarchy? - Q&A - Python中文网


Posted 2024-06-11 19:49:38



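I have a small problem with lxml. I'm converting an XML document into an HTML document. The original XML looks like this (it looks like HTML, but it lives inside an XML document):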

<p>Localization - Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>

When I do this (where item is the string above):

[code snippet missing from this copy of the question]

I get:

<div><p>Localization - Eiffel tower? Paris or Vegas </p><p>Bayes theorem p(A|B)</p></div>

I have no problem with the <div>, but the fact that the "Bayes theorem" paragraph is no longer nested inside the outer paragraph is a problem.

Does anyone know why lxml does this, and how to stop it? Thanks.
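The code from the question did not survive, but the first answer below says it went through lxml's HTML parser. The sketch below assumes the call was lxml.html.fromstring() (an assumption, not confirmed by the question) and reproduces the reported behaviour: the HTML parser closes the first <p> as soon as the second one starts, and because the result has two top-level block elements, lxml hands it back wrapped in a <div>.

```python
import lxml.html
from lxml import etree

# The fragment from the question, stored in `item` as in the question.
item = "<p>Localization - Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>"

# Assumption: the question parsed `item` with lxml.html.fromstring(), which
# uses libxml2's HTML parser. That parser implicitly closes an open <p> when
# a new <p> starts, so the two paragraphs end up as siblings.
root = lxml.html.fromstring(item)

# More than one top-level block element remains, so lxml.html.fromstring()
# returns the fragment inside a <div> container.
print(root.tag)
print(etree.tostring(root, encoding="unicode"))

# An XPath search for a <p> nested inside another <p> now finds nothing.
print(root.xpath("//p//p"))
```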


2 answers

User

Answer #1 · edited 2024-06-11 19:49:38

You're using lxml's HTML parser rather than its XML parser. Try this:

>>> from lxml import etree
>>> item = '<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>'
>>> root = etree.fromstring(item)
>>> etree.tostring(root, pretty_print=True, encoding="unicode")
'<p>Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>\n'
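Since the asker's goal is to turn XML into HTML, a minimal sketch of one way to combine this answer with that goal: parse with the XML parser so the tree keeps the nesting, then write the tree out with lxml's method="html" serialiser. This assumes the nested <p> markup is acceptable in the final output; the serialiser only changes the output syntax and does not re-apply HTML parsing rules, so nothing is split.

```python
from lxml import etree

item = "<p>Localization - Eiffel tower? Paris or Vegas <p>Bayes theorem p(A|B)</p></p>"

# XML has no rule against <p> inside <p>, so the XML parser keeps the
# tree exactly as written.
root = etree.fromstring(item)
inner = root.find("p")  # the nested "Bayes theorem" paragraph

# Serialise the same tree using HTML output syntax; the existing tree
# structure (including the nested <p>) is written out unchanged.
html_out = etree.tostring(root, method="html", encoding="unicode")
print(html_out)
```

Keep in mind that any browser or HTML parser reading this output later will split the paragraphs again, for the reason given in the next answer.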

User

Answer #2 · edited 2024-06-11 19:49:38

lxml's HTML parser does this because it does not produce invalid HTML, and in HTML, <p> elements can't be nested:

The P element represents a paragraph. It cannot contain block-level elements (including P itself).
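To see that the splitting comes from this rule rather than from nesting in general, here is a small sketch comparing a nested <p> with a nested <div>, a block element that may contain other block elements. nesting_survives is a hypothetical helper written only for this illustration; it relies on lxml.html.fromstring() and its libxml2-based parser, whose repair rules approximate, rather than exactly implement, the rule quoted above.

```python
import lxml.html


def nesting_survives(tag: str) -> bool:
    """Hypothetical helper: does <tag>outer <tag>inner</tag></tag> keep its
    nesting after being parsed by lxml's HTML parser?"""
    markup = f"<{tag}>outer <{tag}>inner</{tag}></{tag}>"
    root = lxml.html.fromstring(markup)
    # Search the whole parsed document for a <tag> inside another <tag>.
    return bool(root.xpath(f"//{tag}//{tag}"))


for tag in ("p", "div"):
    print(tag, nesting_survives(tag))
```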
