Removing all HTML tags with BeautifulSoup4 (Python 3.4)

from bs4 import BeautifulSoup
text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")
content = soup.find_all("td", "ToEx")
content[0].renderContents()

2 answers

User
Answer 1 · edited 2024-04-20 02:04:54

Just print the tag's .text attribute to get its text:
print(content[0].text)
Output:
This is a test ( to see  this works) and I really hope it does
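Putting the question's snippet and this answer together, a minimal self-contained sketch might look like the following. The explicit "html.parser" argument and the variable name plain are assumptions added for illustration; the question lets Beautiful Soup choose a parser on its own.

```python
from bs4 import BeautifulSoup

text = ("<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) "
        "and I really hope it does</td>")

# Name the parser explicitly (an assumption; the question lets bs4 pick one).
soup = BeautifulSoup(text, "html.parser")

# A string passed as the second argument to find_all() filters by CSS class.
content = soup.find_all("td", "ToEx")

# .text concatenates every string inside the tag, dropping the <i> markup.
plain = content[0].text
print(plain)
```

Because the text on either side of the closing i tag carries its own space, the result has two spaces between "see" and "this".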

User
Answer 2 · edited 2024-04-20 02:04:54

I would use get_text(), which is designed for exactly this situation:
from bs4 import BeautifulSoup
text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")
print(soup.get_text())
This should work, as per the documentation.
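If the doubled space left behind by the removed tag is unwanted, get_text() also accepts a separator and a strip flag. The sketch below assumes the "html.parser" backend; the names raw and tidy are only illustrative.

```python
from bs4 import BeautifulSoup

text = ("<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) "
        "and I really hope it does</td>")
soup = BeautifulSoup(text, "html.parser")

# Default call: the strings are concatenated exactly as they appear.
raw = soup.get_text()

# strip=True trims each string first; the separator then joins them.
tidy = soup.get_text(" ", strip=True)

print(repr(raw))
print(repr(tidy))
```

Storing the result in a variable, as done here, is the case this answer has in mind when it calls get_text() easier to work with.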
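I have not seen .text used before; Beautiful Soup 4 also offers .string if you would rather use an attribute, although .string returns None when a tag has more than one child, as the td here does.
The get_text() call outputs:
This is a test ( to see  this works) and I really hope it does
get_text() is the easier option to work with, especially if you want to store the text in a variable.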
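To compare the three accessors mentioned in this thread, here is a small sketch run against the question's markup. It assumes the "html.parser" backend, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

text = ("<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) "
        "and I really hope it does</td>")
soup = BeautifulSoup(text, "html.parser")
td = soup.find("td", "ToEx")

# .text is a property that returns the same thing as get_text().
same = td.text == td.get_text()

# The td has three children (text, <i>, text), so .string is None.
whole = td.string

# The <i> tag has a single string child, so .string returns it.
inner = td.i.string

print(same)         # True
print(whole)        # None
print(repr(inner))  # ' to see '
```

In other words, .string is mainly useful on tags that wrap a single piece of text, while .text and get_text() collect text from the whole subtree.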
