BeautifulSoup 4: remove comment tags and their content

<div class="foo"> cat dog sheep goat  </div>

3 answers

User
Answer 1 · edited 2024-06-01 00:54:20

From this answer, if you are looking for a solution for BeautifulSoup version 3 (see BS3 Docs - Comment):
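import re
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 package name

soup = BeautifulSoup("""Hello! """)
# Find one comment node and take its class so it can be used with isinstance().
comment = soup.find(text=re.compile("if"))
Comment = comment.__class__
# Calling the soup is shorthand for findAll(); keep only Comment nodes.
for element in soup(text=lambda text: isinstance(text, Comment)):
    element.extract()
print(soup.prettify())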

User
Answer 2 · edited 2024-06-01 00:54:20

You can use extract() (solution based on this answer):
PageElement.extract() removes a tag or string from the tree. It returns the tag or string that was extracted.
from bs4 import BeautifulSoup, Comment

data = """<div class="foo"> cat dog sheep goat  </div>"""
soup = BeautifulSoup(data, "html.parser")
div = soup.find('div', class_='foo')
# Calling a tag is shorthand for find_all(); the lambda keeps only Comment nodes.
for element in div(text=lambda text: isinstance(text, Comment)):
    element.extract()
print(soup.prettify())
As a result, your div no longer contains the comment:
<div class="foo"> cat dog sheep goat </div>
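
To check the scoping, here is a minimal sketch of the same idea. The comment bodies in `data` are placeholders made up for this example, and it walks `div.descendants` with an `isinstance` check rather than passing a filter function, which is a swapped-in but equivalent way to find the comments.

```python
from bs4 import BeautifulSoup, Comment

# The comment bodies below are placeholders invented for this example.
data = (
    '<!-- outside -->'
    '<div class="foo">cat dog sheep goat<!-- inside --></div>'
)

soup = BeautifulSoup(data, "html.parser")
div = soup.find("div", class_="foo")

# Snapshot the descendants first so that extract() does not disturb the walk.
for node in list(div.descendants):
    if isinstance(node, Comment):
        node.extract()

result = str(soup)
print(result)
```

Because the search starts from `div` and not from `soup`, the comment inside the div should be gone while the one outside it is left alone.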

User
Answer 3 · edited 2024-06-01 00:54:20

Usually there is no need to modify the bs4 parse tree. If the text is all you want, you can get the div's text directly:

soup.body.div.text
Out[18]: '\ncat dog sheep goat\n\n'
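
A small sketch of that point, assuming the built-in `html.parser` and a placeholder comment body invented for the example: `get_text()` (what `.text` calls) leaves the comment out, and the tree itself is not touched.

```python
from bs4 import BeautifulSoup

# The comment body is a placeholder invented for this example.
data = '<div class="foo">cat dog sheep goat<!-- hidden --></div>'
soup = BeautifulSoup(data, "html.parser")

div = soup.find("div", class_="foo")
text = div.get_text()  # .text is a property that calls get_text()
print(repr(text))

# The tree is unchanged: the comment is still present in the serialized markup.
still_there = "<!-- hidden -->" in str(soup)
print(still_there)
```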

bs4 keeps comments separate from the text. However, if you really do need to modify the parse tree:

from bs4 import Comment

# Copy the children into a list first; extracting while iterating can skip nodes.
for child in list(soup.body.div.children):
    if isinstance(child, Comment):
        child.extract()
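
One caveat worth sketching: `.children` only yields direct children, so a comment nested deeper in the div is not reached by a loop like this. The markup below is made up for illustration, and `html.parser` is assumed.

```python
from bs4 import BeautifulSoup, Comment

# Placeholder markup invented for this example: two adjacent comments directly
# under the div, plus one nested inside a <p>.
data = '<div class="foo">cat<!-- a --><!-- b --><p>dog<!-- nested --></p></div>'
soup = BeautifulSoup(data, "html.parser")
div = soup.find("div", class_="foo")

# list(...) takes a snapshot, so removing nodes cannot make the loop skip one.
for child in list(div.children):
    if isinstance(child, Comment):
        child.extract()

after_children = str(div)
print(after_children)

# Only direct children were checked above; walk the descendants for the rest.
for node in list(div.descendants):
    if isinstance(node, Comment):
        node.extract()

after_descendants = str(div)
print(after_descendants)
```

After the first loop the nested comment should still be there; the second pass over `.descendants` is what clears it.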
