How do I scrape all the images from a multi-page website?

Posted on 2024-04-25 17:46:48


I need to scrape all the images from every page of the URL given in the code, but so far I can only do it manually, one page at a time, up to the last page (the 100th page).

This is the code for scraping a single page; each time I replace the page number by hand and rerun it:

Is there a way to turn the page number into a variable and run a loop until it hits a 404 error (meaning there are no more pages)?

from bs4 import BeautifulSoup
import requests as rq

r2 = rq.get("https://www.gettyimages.in/photos/aishwarya-rai?family=editorial&page=1&phrase=aishwarya%20rai&sort=mostpopular")

soup2 = BeautifulSoup(r2.text, "html.parser")

links = []

# The grid of result thumbnails served from media.gettyimages.com
x = soup2.select('img[src^="https://media.gettyimages.com/photos/"]')

for img in x:
    links.append(img['src'])

# Download every collected image into the aishwarya_rai/ folder
for index, img_link in enumerate(links):
    img_data = rq.get(img_link).content
    with open("aishwarya_rai/" + str(index + 2) + '.jpg', 'wb') as f:
        f.write(img_data)

The page numbers run from 1 to 100.

I need some extra code that turns the page value into a variable and loops it up to 100.


1 Answer

Posted on 2024-04-25 17:46:48

Use the str.format() function and pass in the page variable:

from bs4 import BeautifulSoup
import requests as rq

# Leave the page number as a {} placeholder for str.format() to fill in
url = "https://www.gettyimages.in/photos/aishwarya-rai?family=editorial&page={}&phrase=aishwarya%20rai&sort=mostpopular"

links = []
for page in range(1, 101):  # pages 1 through 100
    print(url.format(page))
    r2 = rq.get(url.format(page))
    soup2 = BeautifulSoup(r2.text, "html.parser")
    x = soup2.select('img[src^="https://media.gettyimages.com/photos/"]')
    for img in x:
        links.append(img['src'])

print(links)
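
The loop above hard-codes the 1-to-100 page range. To stop automatically when the site runs out of pages, as the question asks, here is a minimal sketch; the stop conditions (a non-200 status code, or a page with no matching image tags) are assumptions about how gettyimages.in behaves past the last page, not verified behaviour:

from bs4 import BeautifulSoup
import requests as rq
import itertools

url = "https://www.gettyimages.in/photos/aishwarya-rai?family=editorial&page={}&phrase=aishwarya%20rai&sort=mostpopular"

links = []
for page in itertools.count(1):  # 1, 2, 3, ... until a break below
    r2 = rq.get(url.format(page))
    if r2.status_code != 200:  # assumed stop condition: e.g. a 404 past the last page
        break
    soup2 = BeautifulSoup(r2.text, "html.parser")
    x = soup2.select('img[src^="https://media.gettyimages.com/photos/"]')
    if not x:  # assumed stop condition: the page loads but contains no results
        break
    links.extend(img['src'] for img in x)

# Reuse the download loop from the question on the collected links
for index, img_link in enumerate(links):
    img_data = rq.get(img_link).content
    with open("aishwarya_rai/" + str(index) + '.jpg', 'wb') as f:
        f.write(img_data)

The empty-result check is the safer of the two conditions, since many sites return a normal 200 page with no results instead of a 404 once the page number is too large.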
