How to find <p> tags that have no child tags inside them using Beautiful Soup

User

Answer 1 · edited 2024-04-25 16:54:01

A solution that returns every p tag that contains no child tags.

import bs4

html = """<p> <img src="any url"/> </p>     <p> hello world </p>"""
soup = bs4.BeautifulSoup(html, "html.parser")

def has_no_tag_children(tag):
    """Return True for a <p> tag whose direct children include no tags."""
    if not isinstance(tag, bs4.element.Tag) or tag.name != "p":
        return False
    # Direct children are either strings (text) or tags; reject if any is a tag.
    return not any(isinstance(child, bs4.element.Tag) for child in tag.children)

kids = soup.find_all(has_no_tag_children)
print(kids)

Output

[<p> hello world </p>]
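The same check can probably be written more compactly. The sketch below is a variation on the answer above, not the answerer's own code: it relies on find(True, recursive=False), which returns the first direct child that is a tag, or None when there is none. The name is_childless_p is a hypothetical helper chosen for illustration.

```python
import bs4

html = """<p> <img src="any url"/> </p>     <p> hello world </p>"""
soup = bs4.BeautifulSoup(html, "html.parser")

def is_childless_p(tag):
    # find(True, recursive=False) looks only at direct children and
    # matches any tag; None means the <p> holds text (or nothing) only.
    return tag.name == "p" and tag.find(True, recursive=False) is None

childless = soup.find_all(is_childless_p)
print(childless)
```

Since find_all only ever passes Tag objects to a filter function, the explicit type check can be dropped here.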

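User

Answer 2 · edited 2024-04-25 16:54:01

This gets all the text that sits directly inside each <p> element, but not text from any child elements of the <p>. recursive must be False, otherwise the search also descends into child elements. I added one more test case: <p><h4>Heading</h4></p>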

from bs4 import BeautifulSoup

html = "<p> <img src='any url'/> </p>   <p><h4>Heading</h4></p>  <p> hello world </p>"

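soup = BeautifulSoup(html, "html.parser")

for element in soup.find_all('p'):
    # recursive=False restricts the search to strings that are direct children of <p>
    print("".join(element.find_all(string=True, recursive=False)))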
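To make the effect of recursive=False easier to see, here is a minimal sketch that collects the direct text of each paragraph into a list and strips the surrounding whitespace. It assumes the html.parser backend, which keeps the <h4> nested inside its <p>; the variable name direct_text is only illustrative.

```python
from bs4 import BeautifulSoup

html = "<p> <img src='any url'/> </p>   <p><h4>Heading</h4></p>  <p> hello world </p>"
soup = BeautifulSoup(html, "html.parser")

# For each <p>, join only the strings that are direct children.
# "Heading" belongs to the <h4>, so it is not picked up.
direct_text = [
    "".join(p.find_all(string=True, recursive=False)).strip()
    for p in soup.find_all("p")
]
print(direct_text)  # prints ['', '', 'hello world']
```

Note that this approach reports text, not structure: the first two paragraphs yield empty strings because their only content is a child tag, so an empty result alone does not tell you whether a paragraph has children.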

User

Answer 3 · edited 2024-04-25 16:54:01

Assuming BeautifulSoup 4.7+, you should be able to do:

import bs4
html="""<p> <img src="any url"/> </p>     <p> hello world </p>"""
soup=bs4.BeautifulSoup(html,"html.parser")

kids=soup.select("p:not(:has(*))")
print(kids)
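As a rough check of the selector, the sketch below runs it against input that also contains a paragraph wrapping a heading. It assumes the html.parser backend and that the soupsieve package (which provides select in 4.7+) is installed; the variable names are illustrative.

```python
import bs4

html = """<p> <img src="any url"/> </p>   <p><h4>Heading</h4></p>  <p> hello world </p>"""
soup = bs4.BeautifulSoup(html, "html.parser")

all_p = soup.select("p")
# :has(*) matches a <p> containing any descendant element;
# :not(...) inverts it, leaving paragraphs with text only.
childless = soup.select("p:not(:has(*))")

print(len(all_p))   # prints 3
print(childless)    # prints [<p> hello world </p>]
```

Text and whitespace do not count as elements for :has(*), so a paragraph that holds only text still matches.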
