Scraping a Sony camera page: cannot access the Specifications tab
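The URL is: https://electronics.sony.com/imaging/interchangeable-lens-cameras/full-frame/p/ilce7rm4-b
I need to scrape the information in the Specifications tab for each camera. That information only loads after clicking Specifications and then clicking "See More". How can I load this data with a Selenium script in Python?
I tried: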
def specs_see_more(driver):
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH,
                                        '//*[@id="PDPSpecificationsLink"]')))
    time.sleep(3)
    see_more_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH,
                                    '//*[@id="PDPSpecificationsLink"]')))
    driver.execute_script("arguments[0].scrollIntoView(true);", see_more_button)
    WebDriverWait(driver, 10).until(
        EC.visibility_of(see_more_button))
    see_more_button.click()
    spec_page_source = driver.page_source
    driver_soup = BeautifulSoup(spec_page_source, "html.parser")
    print(driver_soup)
I click the Specifications button first and then try to open "See More", but it does not work.
1 Answer
see_more_button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH,
                                '//*[@id="PDPSpecificationsLink"]')))
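In the line above, the XPath expression used to locate the See More button is wrong: it targets the element with id PDPSpecificationsLink, which is the Specifications tab itself, so the See More button is never clicked.
The following working code clicks the See More button inside the Specifications tab: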
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://electronics.sony.com/imaging/interchangeable-lens-cameras/full-frame/p/ilce7rm4-b")
wait = WebDriverWait(driver, 10)
# Click the 'Specifications' tab once it is clickable
wait.until(EC.element_to_be_clickable((By.ID, "PDPSpecificationsLink"))).click()
# Click the second 'See More' button on the page (the one in the Specifications tab)
wait.until(EC.element_to_be_clickable((By.XPATH, "(//button[contains(text(),'See More')])[2]"))).click()
time.sleep(5)
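To make clearer what the locator (//button[contains(text(),'See More')])[2] selects, here is a rough browser-free sketch using only Python's standard library. The markup in SAMPLE_HTML, the ButtonCollector class, and the nth_button_containing helper are all hypothetical and only for illustration; the real Sony page structure is assumed, not verified. The idea is that every button whose text contains "See More" is gathered in document order, and the 1-based index then picks the second match.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real page structure is an assumption here.
SAMPLE_HTML = """
<div id="overview"><button>See More</button></div>
<div id="specifications"><button>See More</button></div>
<div id="footer"><button>Contact</button></div>
"""


class ButtonCollector(HTMLParser):
    """Collect (enclosing div id, button text) for every <button>, in document order."""

    def __init__(self):
        super().__init__()
        self.buttons = []
        self._div_id = None
        self._in_button = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Remember the id of the most recently opened <div>
            self._div_id = dict(attrs).get("id")
        elif tag == "button":
            self._in_button = True
            self._text = []

    def handle_data(self, data):
        if self._in_button:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "button":
            self.buttons.append((self._div_id, "".join(self._text)))
            self._in_button = False


def nth_button_containing(html, needle, n):
    """Roughly mimic (//button[contains(text(), needle)])[n]; n is 1-based, as in XPath."""
    parser = ButtonCollector()
    parser.feed(html)
    parser.close()
    matches = [b for b in parser.buttons if needle in b[1]]
    return matches[n - 1]


print(nth_button_containing(SAMPLE_HTML, "See More", 2))
```

In this sample the first match sits in the overview block and the second in the specifications block, which is why the index [2] matters. If the live page adds or removes a "See More" button, the index would need to change, so a locator scoped to the Specifications container would likely be more robust.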