Scraping a website: how do I pick one img URL from a srcset, and then do it nine more times?

Posted 2024-04-25 11:34:04

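I'm trying to scrape **all** of the "now playing" images from the BBC Sounds website. I don't mind which size I get; 400w would probably be a good choice.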

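Below are my current Python script and the relevant HTML excerpt. A variation of this script works well for the "now playing" text, but I haven't been able to get it working for the image URLs, which is what I'm really after. I think that may be because (a) there are too many image URLs to choose from, and (b) the attribute contains whitespace, which the parser presumably doesn't like. Bear in mind that the HTML below is repeated about 10 times, once per station; I've included just one example. Thanks, everyone!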

import requests
from bs4 import BeautifulSoup

url = "https://www.bbc.co.uk/sounds"

r = requests.get(url)

soup = BeautifulSoup(r.content, "lxml")

g_data = soup.find_all("div", {"class": "sc-o-responsive-image__img sc-u-circle"})

# Print the text of the first ten matches (this is the attempt that does not return image URLs)
for item in g_data[:10]:
    print(item.text)

<div class="gel-layout__item sc-o-island"> 
<div class="sc-c-network-item__image sc-o-island" aria-hidden="true"> 
    <div class="sc-c-rsimage sc-o-responsive-image sc-o-responsive-image--1by1 sc-u-circle"> 
<img alt="" class="sc-o-responsive-image__img sc-u-circle" 
    src="https://ichef.bbci.co.uk/images/ic/400x400/p07fzzgr.jpg" srcSet="https://ichef.bbci.co.uk/images/ic/160x160/p07fzzgr.jpg 160w,
    https://ichef.bbci.co.uk/images/ic/192x192/p07fzzgr.jpg 192w,
    https://ichef.bbci.co.uk/images/ic/224x224/p07fzzgr.jpg 224w,
    https://ichef.bbci.co.uk/images/ic/288x288/p07fzzgr.jpg 288w,
    https://ichef.bbci.co.uk/images/ic/368x368/p07fzzgr.jpg 368w,
    https://ichef.bbci.co.uk/images/ic/400x400/p07fzzgr.jpg 400w,
    https://ichef.bbci.co.uk/images/ic/448x448/p07fzzgr.jpg 448w,
    https://ichef.bbci.co.uk/images/ic/496x496/p07fzzgr.jpg 496w,
    https://ichef.bbci.co.uk/images/ic/512x512/p07fzzgr.jpg 512w,
    https://ichef.bbci.co.uk/images/ic/576x576/p07fzzgr.jpg 576w,
    https://ichef.bbci.co.uk/images/ic/624x624/p07fzzgr.jpg 624w" 
    sizes="(max-width: 400px) 34vw,(max-width: 600px) 25vw,17vw"/>

1 answer
User
Answer 1 · posted 2024-04-25 11:34:04
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.bbc.co.uk/sounds")
soup = BeautifulSoup(r.text, 'html.parser')

# The class sits on the <img> tag itself, not on a <div>; its src already holds the 400x400 URL
for item in soup.find_all("img", {'class': 'sc-o-responsive-image__img sc-u-circle'}):
    print(item.get("src"))

Output:

https://ichef.bbci.co.uk/images/ic/400x400/p05mpj80.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07dg040.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07zml97.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p0428n3t.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p01lyv4b.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p06yphh0.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p05v4t1c.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p06z9zzc.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p06x0hxb.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p06n253f.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p060m6jj.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07l4fjw.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p03710d6.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07nn0dw.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07nn0dw.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p078qrgm.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07sq0gr.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p07sq0gr.jpg
https://ichef.bbci.co.uk/images/ic/400x400/p03crmyc.jpg
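
If you would rather pick a specific size out of the srcset than rely on src, a minimal sketch could look like the following. The helper name pick_from_srcset and its wanted_width parameter are made up for illustration, and splitting on commas assumes the URLs themselves contain no commas, which holds for the excerpt in the question but not for every srcset.

```python
# Shortened copy of the srcset value from the HTML excerpt in the question.
SAMPLE_SRCSET = """https://ichef.bbci.co.uk/images/ic/160x160/p07fzzgr.jpg 160w,
    https://ichef.bbci.co.uk/images/ic/192x192/p07fzzgr.jpg 192w,
    https://ichef.bbci.co.uk/images/ic/224x224/p07fzzgr.jpg 224w,
    https://ichef.bbci.co.uk/images/ic/400x400/p07fzzgr.jpg 400w,
    https://ichef.bbci.co.uk/images/ic/624x624/p07fzzgr.jpg 624w"""


def pick_from_srcset(srcset, wanted_width=400):
    """Return the URL whose width descriptor is closest to wanted_width.

    Hypothetical helper: returns None when no "<url> <N>w" entry is found.
    """
    candidates = []
    for entry in srcset.split(","):
        # str.split() with no argument also swallows the newlines and indentation
        parts = entry.split()
        if len(parts) == 2 and parts[1].endswith("w") and parts[1][:-1].isdigit():
            candidates.append((int(parts[1][:-1]), parts[0]))
    if not candidates:
        return None
    # An exact match has distance 0, so it always wins when present
    _, url = min(candidates, key=lambda c: abs(c[0] - wanted_width))
    return url


print(pick_from_srcset(SAMPLE_SRCSET))
```

In the loop from the answer you would call it as pick_from_srcset(item.get("srcset", "")). Note that html.parser lower-cases attribute names, so the srcSet in the page source is read as srcset, and the whitespace inside the attribute is not a problem for the parser.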
