How to get the link and title using BeautifulSoup4

Published 2024-05-15 03:57:09


html = """<div class="slick-list"><div class="slick-track" style="width: 1380px; opacity: 1; transform: translate3d(0px, 0px, 0px);"><div data-index="0" class="slick-slide slick-active slick-current" tabindex="-1" aria-hidden="false" style="outline: none; width: 230px;"><div><div data-courseid="567828" class="course-discovery-unit--card-margin--2TVw4 merchandising-course-card--card--2UfMa"><a href="/course/complete-python-bootcamp/" data-purpose="merchandising-course-card-body-567828" target="_self" class="merchandising-course-card--mask--2-b-d"><div class="merchandising-course-card--card-header--89z8L"><img class="merchandising-course-card--course-image--3G7Kh" alt="" width="240" height="135" src="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg" srcset="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg 1x, https://img-a.udemycdn.com/course/480x270/567828_67d0.jpg 2x"></div><div class="merchandising-course-card--card-body--3OpAH"><div><div class="merchandising-course-card--course-title--2Ob4m" data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>"""

I want to extract the link and the title, so that the output looks like this:

title=Complete Python Bootcamp: Go from zero to hero in Python 3
link=/course/complete-python-bootcamp/

Here is my code:

data=soup.findAll("div",{"class":"slick-list"})
print(data)

for link in data:
    for a in link.findAll("a"):
        print(a.title,a.href)

2 answers

from bs4 import BeautifulSoup

html = """<div class="slick-list"><div class="slick-track" style="width: 1380px; opacity: 1; transform: translate3d(0px, 0px, 0px);"><div data-index="0" class="slick-slide slick-active slick-current" tabindex="-1" aria-hidden="false" style="outline: none; width: 230px;"><div><div data-courseid="567828" class="course-discovery-unit--card-margin--2TVw4 merchandising-course-card--card--2UfMa"><a href="/course/complete-python-bootcamp/" data-purpose="merchandising-course-card-body-567828" target="_self" class="merchandising-course-card--mask--2-b-d"><div class="merchandising-course-card--card-header--89z8L"><img class="merchandising-course-card--course-image--3G7Kh" alt="" width="240" height="135" src="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg" srcset="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg 1x, https://img-a.udemycdn.com/course/480x270/567828_67d0.jpg 2x"></div><div class="merchandising-course-card--card-body--3OpAH"><div><div class="merchandising-course-card--course-title--2Ob4m" data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>"""

soup = BeautifulSoup(html, 'html.parser')

print('title='+soup.find("div",{"data-purpose":"course-card-title"}).text)
print('link='+soup.find("a").get('href'))
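
As a side note on why the code in the question prints None: on a BeautifulSoup tag, dotted access such as a.title looks for a child tag with that name, not for an HTML attribute. The sketch below illustrates the difference. The markup in `snippet` is a trimmed-down, hypothetical stand-in for the HTML in the question, not the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup with the same nesting as the question's HTML.
snippet = (
    '<a href="/course/complete-python-bootcamp/">'
    '<div data-purpose="course-card-title">Complete Python Bootcamp</div>'
    '</a>'
)
soup = BeautifulSoup(snippet, "html.parser")
a = soup.find("a")

# Dotted access searches for a child *tag* called "title" or "href".
child_title = a.title  # there is no <title> tag inside <a>, so this is None
child_href = a.href    # there is no <href> tag inside <a>, so this is None

# HTML attributes are read through the dict-like interface instead.
href = a.get("href")   # .get() returns None if the attribute is missing
href_strict = a["href"]  # [] raises KeyError if the attribute is missing

# The title is the text of a nested div, so it has to be located with find().
title = a.find("div", {"data-purpose": "course-card-title"}).get_text(strip=True)

print(child_title, child_href)
print("title=" + title)
print("link=" + href)
```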

I hope this answers your question.

A working solution based on your code (still using findAll):

from bs4 import BeautifulSoup

html = """<div class="slick-list"><div class="slick-track" style="width: 1380px; opacity: 1; transform: translate3d(0px, 0px, 0px);"><div data-index="0" class="slick-slide slick-active slick-current" tabindex="-1" aria-hidden="false" style="outline: none; width: 230px;"><div><div data-courseid="567828" class="course-discovery-unit--card-margin--2TVw4 merchandising-course-card--card--2UfMa"><a href="/course/complete-python-bootcamp/" data-purpose="merchandising-course-card-body-567828" target="_self" class="merchandising-course-card--mask--2-b-d"><div class="merchandising-course-card--card-header--89z8L"><img class="merchandising-course-card--course-image--3G7Kh" alt="" width="240" height="135" src="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg" srcset="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg 1x, https://img-a.udemycdn.com/course/480x270/567828_67d0.jpg 2x"></div><div class="merchandising-course-card--card-body--3OpAH"><div><div class="merchandising-course-card--course-title--2Ob4m" data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>"""

soup = BeautifulSoup(html, 'html.parser')

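# Find the carousel container(s), exactly as in the question's code
data = soup.findAll("div", {"class": "slick-list"})
#print(data)

for div in data:
    for a in div.findAll("a"):
        # div.text is all the text inside the container (here, just the one title);
        # HTML attributes such as href are read with .get(), not with a.href
        print(div.text, a.get('href'))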
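
One caveat with the loop above: div.text is the text of the whole slick-list container, so it only equals the title because the sample contains a single card. If the real page holds several cards, a possible variation is to look up the title inside each link instead. The following is a minimal sketch under that assumption. The two-card markup and the `extract_courses` helper are hypothetical and simplified from the structure in the question, and the second course is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical two-card markup; only the first card comes from the question.
html = """
<div class="slick-list">
  <a href="/course/complete-python-bootcamp/">
    <div data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>
  </a>
  <a href="/course/example-second-course/">
    <div data-purpose="course-card-title">Example Second Course</div>
  </a>
</div>
"""


def extract_courses(soup):
    """Return a list of (title, link) pairs, one per course card link."""
    results = []
    for container in soup.find_all("div", class_="slick-list"):
        # href=True keeps only <a> tags that actually carry an href attribute.
        for a in container.find_all("a", href=True):
            title_div = a.find("div", {"data-purpose": "course-card-title"})
            if title_div is None:
                continue  # skip links that are not course cards
            results.append((title_div.get_text(strip=True), a["href"]))
    return results


soup = BeautifulSoup(html, "html.parser")
courses = extract_courses(soup)
for title, link in courses:
    print("title=" + title)
    print("link=" + link)
```

Pairing each title with the link that contains it keeps the two values aligned even when a card is missing its title.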
