How can I scrape all the data at once from a website that only loads more results after scrolling?

Posted 2024-04-19 09:09:15

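I am trying to scrape college names and addresses from the website https://www.collegenp.com/2-science-colleges/, but I only get the data for the first 11 colleges in the list and none of the others. I have tried everything I know, but nothing has worked.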

My code:

from selenium import webdriver
import bs4
from bs4 import BeautifulSoup
import requests
import pandas as pd
from time import sleep

driver=webdriver.Chrome('C:/Users/acer/Downloads/chromedriver.exe')
driver.get('https://www.collegenp.com/2-science-colleges/')

driver.refresh()
sleep(20)

page=requests.get("https://www.collegenp.com/2-science-colleges/")

college = []
location=[]

soup= BeautifulSoup(page.content,'html.parser')

for a in soup.find_all('div',attrs={'class':'media'}):
  name=a.find('h3',attrs={'class':'college-name'})
  college.append(name.text)
  loc=a.find('span',attrs={'class':'college-address'})
  location.append(loc.text)

df=pd.DataFrame({'College name':college,'Locations':location})
df.to_csv('hell.csv',index=False,encoding='utf-8')

Is there a way to scrape all of the data?


1 Answer
User
#1 · Posted 2024-04-19 09:09:15

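You can use this code to get the information from the subsequent pages. It sends the same kind of POST request the page appears to make when it loads more results (note the X-Requested-With header) and steps the "count" offset by 10 for each page, so no browser or scrolling is needed: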

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.collegenp.com/2-science-colleges/"

headers = {"X-Requested-With": "XMLHttpRequest"}
data = {"state": "on", "action": "filter", "count": "0"}

all_data = []
for page in range(0, 5):  # <  increase number of pages here
    print("Getting page {}..".format(page))

    data["count"] = page * 10
    soup = BeautifulSoup(
        requests.post(url, data=data, headers=headers).content,
        "html.parser",
    )

    for c in soup.select(".college-name"):
        all_data.append(
            {
                "College name": c.get_text(strip=True),
                "Location": c.find_next(class_="college-address").get_text(
                    strip=True
                ),
            }
        )

df = pd.DataFrame(all_data)
print(df)
df.to_csv("data.csv", index=False)

Prints:

                                         College name                  Location
0                     Caspian Valley College,Lalitpur      Kumaripati, Lalitpur
1      Advance Academy and Republica College,Lalitpur      Kumaripati, Lalitpur
2              Araniko International Academy,Lalitpur       Satdobato, Lalitpur
3   Bagiswori Secondary School, Taulachhen, Bhakta...    Chyamhasing, Bhaktapur
4              Bajra Barahi Secondary School,Lalitpur       Chapagaon, Lalitpur
5              Bhanubhakta Memorial College,Kathmandu       Lazimpat, Kathmandu
6                  Damak Model Secondary School,Jhapa              Damak, Jhapa
7                         Damak Multiple Campus,Jhapa              Damak, Jhapa
8                           Einstein Academy,Lalitpur       Thasikhel, Lalitpur
9                   Hari Khetan Multiple Campus,Parsa            Birganj, Parsa
10                       Kankai Adarsha Campus,Morang         Birtamode, Morang
11          Lumbini Adarsh Degree College,Nawalparasi     Kawasoti, Nawalparasi
12            Madhyabindu Multiple Campus,Nawalparasi     Kawasoti, Nawalparasi
13                Marshyangdi Multiple Campus,Lamjung       Besishahar, Lamjung

...

It also saves data.csv.

[Screenshot: data.csv opened in LibreOffice]
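The loop in the answer uses a fixed number of pages (`range(0, 5)`), which you have to raise by hand. As a rough sketch, the same idea can be wrapped so that it keeps requesting the next offset until a page contributes nothing new. The names `iter_all_rows`, `fetch_rows`, `step` and `max_pages` are hypothetical helpers introduced here, not part of the answer. It is also an assumption that the site returns an empty or repeated result once the offset runs past the last college, so check this against the real responses before relying on it.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def iter_all_rows(
    fetch_rows: Callable[[int], List[Row]],
    step: int = 10,
    max_pages: int = 200,
) -> Iterator[Row]:
    """Yield rows page by page until a page adds nothing new.

    fetch_rows(offset) is a hypothetical callable that returns the rows
    for one page, e.g. by wrapping the requests.post(...) call and the
    BeautifulSoup parsing from the answer, with data["count"] = offset.
    """
    seen = set()
    for page in range(max_pages):  # hard upper bound as a safety net
        added = 0
        for row in fetch_rows(page * step):
            key = (row["College name"], row["Location"])
            if key in seen:
                # Skip rows already collected (overlapping or repeated pages).
                continue
            seen.add(key)
            added += 1
            yield row
        if added == 0:
            # Empty page, or only repeats: assume the list is exhausted.
            break


if __name__ == "__main__":
    # Offline demonstration with made-up rows instead of real requests.
    fake = [
        {"College name": "College {}".format(i), "Location": "City {}".format(i)}
        for i in range(23)
    ]

    def fake_fetch(offset: int) -> List[Row]:
        return fake[offset:offset + 10]

    print(len(list(iter_all_rows(fake_fetch))))
```

To use it against the site, `fetch_rows` would build the list of `{"College name": ..., "Location": ...}` dictionaries exactly as the inner loop of the answer does, and `pd.DataFrame(list(iter_all_rows(fetch_rows)))` would then replace `pd.DataFrame(all_data)`. The duplicate check is there because a server that ignores an out-of-range offset might keep returning the same page, which would otherwise loop until `max_pages`.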
