BeautifulSoup: find all tags before a stop condition is met

...
  <div class="myc">
    <a class="bbb" href="linkhere_893">
      <span class="myclass">Text893</span>
      <img data-lazy="https://link893.jpg"/>
    </a>
  </div>
  <div class="myc">
    <a class="bbb" href="linkhere_96">
      <span class="myclass">Text96</span>
      <img data-lazy="https://link96.jpg"/>
    </a>
  </div>
</div>
<h4 class="cat-title" id="55">Title text N1 <small> Title text N2.</small></h4>
<div class="list" id="55">
  <div class="myc">
    <a class="bbb" href="linkhere_34">
      <span class="myclass">Text34</span>
      <img data-lazy="https://link34.jpg"/>
    </a>
  </div>
  <div class="myc">
...

3 answers

User

#1 · edited 2024-04-26 06:10:15

You could try something like this:

from bs4 import BeautifulSoup

page = """
<html><body><p>
<span class="myclass">text 1</span>
<span class="myclass">text 2</span>
</p>
<h4 class="cat-title" id="55">
 Title text N1
 <small>
  Title text N2.
 </small>
</h4>

<p>
<span class="myclass">text 3</span>
<span class="myclass">text 4</span>
</p>
</body>
</html>
"""
soup = BeautifulSoup(page, 'html.parser')

# find_all() with no arguments yields every tag in document order
for tag in soup.find_all():
    # Stop at <h4 class="cat-title" id="55"> whose <small> holds the expected text
    if tag.name == 'h4' and 'cat-title' in tag.get('class', []) and tag.get('id') == '55':
        small = tag.find('small')
        if small is not None and small.text.strip() == 'Title text N2.':
            break
    elif tag.name == 'span' and 'myclass' in tag.get('class', []):
        print(tag)

Output:

<span class="myclass">text 1</span>
<span class="myclass">text 2</span>
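
The same loop can be wrapped in a small helper so that the stop condition and the wanted tags are passed in as parameters. This is only a sketch: `tags_before`, `is_stop` and `is_wanted` are hypothetical names chosen for illustration, not part of Beautiful Soup, and the HTML is a shortened stand-in for the real page.

```python
from bs4 import BeautifulSoup


def tags_before(soup, is_stop, is_wanted):
    """Collect tags accepted by is_wanted, in document order, until is_stop matches."""
    found = []
    for tag in soup.find_all(True):  # True matches every tag, in document order
        if is_stop(tag):
            break
        if is_wanted(tag):
            found.append(tag)
    return found


# Shortened sample markup (an assumption, not the real page)
html = """
<p><span class="myclass">text 1</span><span class="myclass">text 2</span></p>
<h4 class="cat-title" id="55">Title text N1 <small>Title text N2.</small></h4>
<p><span class="myclass">text 3</span></p>
"""
soup = BeautifulSoup(html, "html.parser")

spans = tags_before(
    soup,
    is_stop=lambda t: t.name == "h4" and t.get("id") == "55",
    is_wanted=lambda t: t.name == "span" and "myclass" in t.get("class", []),
)
texts = [s.get_text() for s in spans]
print(texts)  # ['text 1', 'text 2']
```

Passing the conditions as callables keeps the traversal in one place, so changing the stop tag does not mean rewriting the loop.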

User

#2 · edited 2024-04-26 06:10:15

Try using find_all_previous():

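import requests
from bs4 import BeautifulSoup

page = requests.get("https://mysite")
soup = BeautifulSoup(page.content, 'html.parser')
# Locate the stop tag; find() returns None if nothing matches
stop_at = soup.find("h4", class_="cat-title", id='55')
# Collect every matching <span> that precedes the stop tag.
# Results come back nearest-first, i.e. in reverse document order.
class_extr = stop_at.find_all_previous("span", class_="myclass")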

If several tags match, this stops at the first <h4 class="cat-title" id="55"> tag, because find() returns only the first match.

Reference: Beautiful Soup Documentation
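
To try this approach without a network request, here is a minimal offline sketch. The inline HTML is a shortened stand-in for the real page, and the reversal step is an addition, useful only if the spans are needed in document order.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (an assumption, not the real page)
html = """
<div><span class="myclass">Text893</span><span class="myclass">Text96</span></div>
<h4 class="cat-title" id="55">Title text N1 <small>Title text N2.</small></h4>
<div><span class="myclass">Text34</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

stop_at = soup.find("h4", class_="cat-title", id="55")
if stop_at is None:
    # No stop tag on the page: nothing can be said to come "before" it
    before = []
else:
    # find_all_previous() walks backwards from the stop tag,
    # so reverse the result to restore document order
    before = list(reversed(stop_at.find_all_previous("span", class_="myclass")))

before_texts = [s.get_text() for s in before]
print(before_texts)  # ['Text893', 'Text96']
```

The `None` check matters on real pages, where calling `find_all_previous()` on a missing stop tag would raise an `AttributeError`.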

User

#3 · edited 2024-04-26 06:10:15

How about this:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://mysite")
# Keep only the markup before the stop text, then parse that part with BeautifulSoup
text = page.text.split('Title text N2.')
soup = BeautifulSoup(text[0], 'html.parser')
class_extr = soup.find_all("span", class_="myclass")
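
The string-splitting idea can also be checked offline. The sketch below assumes the marker text appears only once on the page and uses `str.partition`, a swapped-in alternative to `split` that makes a missing marker easy to detect. The inline HTML is a shortened stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (an assumption, not the real page)
html = """
<div><span class="myclass">Text893</span><span class="myclass">Text96</span></div>
<h4 class="cat-title" id="55">Title text N1 <small> Title text N2.</small></h4>
<div><span class="myclass">Text34</span></div>
"""

# partition() returns (before, marker, after); marker is '' when not found
head, marker, _tail = html.partition("Title text N2.")
if not marker:
    raise ValueError("stop marker not found in page")

# The cut leaves <h4> and <small> unclosed; html.parser tolerates that
soup = BeautifulSoup(head, "html.parser")
head_texts = [s.get_text() for s in soup.find_all("span", class_="myclass")]
print(head_texts)  # ['Text893', 'Text96']
```

This is the most fragile of the three approaches: it depends on the exact marker string rather than on the tag structure, so a small wording change on the page breaks it.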
