How to remove HTML tags in Python

[<li style="text-align: left;"> For Female/SC/ST/ PH: NIL</li>, <li style="text-align: left;"> For Others: Rs. 200/-</li>, <li style="text-align: left;"> Candidates can pay either by depositing the money in any Branch of SBI by cash or by using net banking facility of SBI.</li>]

2 answers

User

#1 · edited 2024-06-11 17:36:05

Try this:

from bs4 import BeautifulSoup

# Triple quotes let the string span several lines and contain double quotes.
html = """<li style="text-align: left;">
<span style="line-height: 19px;">
For Female/SC/ST/ PH: <strong>NIL</strong></span></li>,
<li style="text-align: left;">
<span style="line-height: 19px;">For Others:
<strong>Rs. 200/-</strong></span></li>,
<li style="text-align: left;">
Candidates can pay either by depositing the money in any Branch 
of SBI by cash or by using net banking facility of SBI.</li>"""

# Parse the markup, then keep only the text nodes.
soup = BeautifulSoup(html, 'html.parser')
text = soup.get_text()
print(text)
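
The list in the question looks like the result of a find_all() call, so the same idea can be applied per element. The following is only a sketch under that assumption: the surrounding <ul> markup and the names items and texts are made up for illustration, and strip=True is added here to trim the leading space in each item.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page the question's list came from.
html = """<ul>
<li style="text-align: left;"> For Female/SC/ST/ PH: NIL</li>
<li style="text-align: left;"> For Others: Rs. 200/-</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like collection of Tag objects, one per <li>.
items = soup.find_all("li")

# get_text(strip=True) returns each element's text with surrounding whitespace removed.
texts = [li.get_text(strip=True) for li in items]
print(texts)
```

Calling get_text() on each element keeps the items separate, instead of producing one long string for the whole document.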

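User

#2 · edited 2024-06-11 17:36:05

Many HTML parsing libraries can do this, BeautifulSoup among them. Another option (I would still recommend BeautifulSoup; see Saikrishna Rajaraman's answer) is a regular expression with re.sub(), where s is the input string: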

import re

re.sub(r'<.*?>', '', s)

This produces:

For Female/SC/ST/ PH: NIL,

For Others:
Rs. 200/-,

Candidates can pay either by depositing the money in any Branch 
of SBI by cash or by using net banking facility of SBI.

If your HTML happens to be stored in a list, you can do the following (note the conversion to str):

[re.sub(r'<.*?>', '', str(s)) for s in myList]
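
The list comprehension above can also be wrapped in a small helper. This is a rough sketch rather than a robust HTML cleaner: strip_tags, TAG_RE and my_list are hypothetical names, and the re.DOTALL flag and the final .strip() are additions that the answer does not use. Without re.DOTALL, the dot does not match newlines, so a tag broken across two lines would be left in place.

```python
import re

# Non-greedy match from "<" to the next ">"; DOTALL lets a tag span several lines.
TAG_RE = re.compile(r"<.*?>", re.DOTALL)


def strip_tags(value):
    """Remove anything that looks like an HTML tag from str(value)."""
    # str() matters when the items are parser objects rather than plain strings.
    return TAG_RE.sub("", str(value)).strip()


# Hypothetical sample data shaped like the list in the question.
my_list = [
    '<li style="text-align: left;"> For Female/SC/ST/ PH: NIL</li>',
    '<li style="text-align: left;"> For Others: Rs. 200/-</li>',
]

cleaned = [strip_tags(item) for item in my_list]
print(cleaned)
```

A regex like this only removes text between angle brackets. It does not understand comments, scripts or entities, which is one reason the parser-based approach is still the safer choice.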
