How do I scrape all the images from a multi-page website?

Posted on 2024-04-25 17:46:48


I need to scrape all the images from every page of the URL given in the code, but so far I can only do it manually, one page at a time, up to the last page (the 100th page).

This is the code for scraping a single page; each time I replace the page number by hand and rerun it:

Is there a way to turn the page number into a variable and run a loop until it hits a 404 error (meaning there are no more pages)?

from bs4 import BeautifulSoup
import requests as rq

r2 = rq.get("https://www.gettyimages.in/photos/aishwarya-rai?family=editorial&page=1&phrase=aishwarya%20rai&sort=mostpopular")

soup2 = BeautifulSoup(r2.text, "html.parser")

links = []

# The grid of result thumbnails served from media.gettyimages.com
x = soup2.select('img[src^="https://media.gettyimages.com/photos/"]')

for img in x:
    links.append(img['src'])

# Download every collected image into the aishwarya_rai/ folder
for index, img_link in enumerate(links):
    img_data = rq.get(img_link).content
    with open("aishwarya_rai/" + str(index + 2) + '.jpg', 'wb') as f:
        f.write(img_data)

The page numbers run from 1 to 100.

I need some extra code that turns the page value into a variable and loops it up to 100.


1 Answer

Posted on 2024-04-25 17:46:48

Use the str.format() function and pass in the page variable:

from bs4 import BeautifulSoup
import requests as rq

# Leave the page number as a {} placeholder for str.format() to fill in
url = "https://www.gettyimages.in/photos/aishwarya-rai?family=editorial&page={}&phrase=aishwarya%20rai&sort=mostpopular"

links = []
for page in range(1, 101):  # pages 1 through 100
    print(url.format(page))
    r2 = rq.get(url.format(page))
    soup2 = BeautifulSoup(r2.text, "html.parser")
    x = soup2.select('img[src^="https://media.gettyimages.com/photos/"]')
    for img in x:
        links.append(img['src'])

print(links)
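
The loop above hard-codes the 1-to-100 page range. To stop automatically when the site runs out of pages, as the question asks, here is a minimal sketch; the stop conditions (a non-200 status code, or a page with no matching image tags) are assumptions about how gettyimages.in behaves past the last page, not verified behaviour:

from bs4 import BeautifulSoup
import requests as rq
import itertools

url = "https://www.gettyimages.in/photos/aishwarya-rai?family=editorial&page={}&phrase=aishwarya%20rai&sort=mostpopular"

links = []
for page in itertools.count(1):  # 1, 2, 3, ... until a break below
    r2 = rq.get(url.format(page))
    if r2.status_code != 200:  # assumed stop condition: e.g. a 404 past the last page
        break
    soup2 = BeautifulSoup(r2.text, "html.parser")
    x = soup2.select('img[src^="https://media.gettyimages.com/photos/"]')
    if not x:  # assumed stop condition: the page loads but contains no results
        break
    links.extend(img['src'] for img in x)

# Reuse the download loop from the question on the collected links
for index, img_link in enumerate(links):
    img_data = rq.get(img_link).content
    with open("aishwarya_rai/" + str(index) + '.jpg', 'wb') as f:
        f.write(img_data)

The empty-result check is the safer of the two conditions, since many sites return a normal 200 page with no results instead of a 404 once the page number is too large.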
