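Selenium pagination problem
I have just started using Selenium and want to scrape data from 2Gis, which is similar to Google Maps, but I am stuck on clicking through to the next page. What is wrong with the code below, and in particular, why can't Selenium find the next-page button by its XPath?
Here is my code: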
import csv
import time

import requests
from bs4 import BeautifulSoup
from opencage.geocoder import OpenCageGeocode
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

key = "......"
geocoder = OpenCageGeocode(key)


def get_data():
    headers = {
        'User-Agent': "",  # value left blank; headers is not used below
    }
    s = Service('/usr/local/bin/chromedriver')
    driver = webdriver.Chrome(service=s)
    driver.get("https://2gis.kz/almaty/search/vape%20shop/page/1")

    # Open the CSV once and reuse the same writer for the header and every row
    with open('vape_data.csv', 'w', encoding="utf-8", newline="") as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow(['Name', 'Location', "Longitude", "Latitude"])

        while True:
            time.sleep(2)  # Wait for page to load
            page_source = driver.page_source
            soup = BeautifulSoup(page_source, "lxml")
            supermarkets = soup.find_all("div", class_="_1kf6gff")

            for item in supermarkets:
                name = item.find("span", class_="_1al0wlf").text.strip()
                distc = item.find("span", class_="_1w9o2igt").text.strip()
                # get_coordinates is the geocoding helper (not shown here)
                latitude, longitude = get_coordinates(distc, geocoder)
                print(f"{name} and {distc}")
                csv_writer.writerow([name, distc, longitude, latitude])
                time.sleep(0.5)  # Respectful scraping pause

            # Try to find and click the Next button
            try:
                next_button = driver.find_element(By.XPATH, 'x_path')  # Update this XPATH
                next_button.click()
            except Exception as e:
                print("No more pages or an error occurred.", str(e))
                break  # Exit the loop if Next button not found or error occurs

    print("Finished scraping.")
    driver.quit()  # Close the browser


def main():
    get_data()


if __name__ == "__main__":
    main()
It scrapes the shops on the first page and then stops, hitting this line:
print("No more pages or an error occurred.", str(e))
Map link: https://2gis.kz/almaty/search/vape%20shop
Also, you can ignore the geocoding-related parts.
1 Answer
Here is an XPath that works for the next-page button.
I changed find_element to find_elements and took the last match with [-1]:
driver.find_elements(By.XPATH, "//div[./div/a[contains(@href, 'page')]]//*[local-name() = 'svg']")[-1]
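To show how that locator might slot into the loop from the question, here is a minimal sketch. The helper name click_next_page and the two constants are made up for illustration; the XPath is the one above. It takes the driver the question already creates, and passes the locator strategy as the plain string "xpath", which is the value of Selenium's By.XPATH. It relies on find_elements returning an empty list, rather than raising, when nothing matches.

```python
# XPath from the answer: the arrow icons in the pager. The last match is
# assumed to be the "next page" arrow.
NEXT_ARROWS_XPATH = (
    "//div[./div/a[contains(@href, 'page')]]//*[local-name() = 'svg']"
)
BY_XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH


def click_next_page(driver):
    """Click the last pager arrow; return False if no arrow was found."""
    arrows = driver.find_elements(BY_XPATH, NEXT_ARROWS_XPATH)
    if not arrows:
        # find_elements returns [] instead of raising when nothing matches
        return False
    arrows[-1].click()
    return True
```

In the question's while loop, the try/except around find_element could then become "if not click_next_page(driver): break". I have not checked how 2Gis renders the pager on the last page; if the arrow is still present there, the loop needs another stop condition, such as the page yielding no new results.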