How to make BeautifulSoup encode and decode the contents of a script tag

from bs4 import BeautifulSoup

if __name__ == '__main__':
    htmlData = '<html> <head> <script type="text/javascript"> console.log("< < not able to write these & also these >> "); </script> </head> <body> <div> start of div </div> </body> </html>'
    # Name the parser explicitly so the result does not depend on which parsers are installed.
    soup = BeautifulSoup(htmlData, 'html.parser')
    # ... using BeautifulSoup ...
    print(soup.prettify())

<html> <head> <script type="text/javascript"> console.log("< < not able to write these & also these >> "); </script> </head> <body> <div> start of div </div> </body> </html>

2 Answers

User

#1 · Edited 2024-04-26 14:38:14

You might want to try lxml:

import lxml.html as LH

if __name__ == '__main__':
    htmlData = '<html> <head> <script type="text/javascript"> console.log("< < not able to write these & also these >> "); </script> </head> <body> <div> start of div </div> </body> </html>'
    doc = LH.fromstring(htmlData)
    # encoding='unicode' returns str rather than bytes, so print() shows readable markup on Python 3.
    print(LH.tostring(doc, pretty_print=True, encoding='unicode'))

This yields the document with the contents of the script tag left unescaped.

User

#2 · Edited 2024-04-26 14:38:14

You can do it like this:

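htmlCodes = (
    ('<', '&lt;'),
    ('>', '&gt;'),
    ('"', '&quot;'),
    ("'", '&#39;'),
    ('&', '&amp;'),  # decode '&amp;' last so '&amp;lt;' becomes '&lt;', not '<'
)

# str.replace returns a new string, so keep the result of each call.
output = soup.prettify()
for char, entity in htmlCodes:
    output = output.replace(entity, char)
print(output)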
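
Note that a replacement table like the one above is applied to the whole serialized document, so entities in ordinary text (for example inside a div) would be decoded as well. Below is a minimal sketch of a narrower approach, assuming only the bodies of script elements should be decoded. It uses the standard library's html.unescape instead of a hand-written table. The helper name unescape_script_bodies and the regular expression are illustrative assumptions, not part of BeautifulSoup, and a regex like this is only adequate for simple, well-formed markup.

```python
import html
import re

# Matches an opening <script ...> tag, its body, and the closing tag.
SCRIPT_RE = re.compile(
    r"(<script\b[^>]*>)(.*?)(</script\s*>)",
    re.IGNORECASE | re.DOTALL,
)


def unescape_script_bodies(markup: str) -> str:
    """Decode HTML entities inside <script> elements only (hypothetical helper)."""

    def _fix(match: re.Match) -> str:
        # Keep the tags untouched and decode only the text between them.
        return match.group(1) + html.unescape(match.group(2)) + match.group(3)

    return SCRIPT_RE.sub(_fix, markup)


if __name__ == "__main__":
    # Stand-in for serialized output in which the script body was entity-escaped.
    escaped = (
        '<script type="text/javascript"> console.log("&lt; &amp; &gt;"); </script>'
        "<div> 1 &lt; 2 </div>"
    )
    print(unescape_script_bodies(escaped))
```

In practice the input would be the string returned by soup.prettify(). Text outside script elements keeps its entities, so the rest of the page remains valid HTML.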
