Unable to click the XPath address after two iterations
I am currently using Python with the Selenium library for web automation, mainly to extract data from Amazon. My code can successfully simulate mouse input, find the search box, and list all the products available on the page. However, I am having trouble with a loop that visits each product on the page one by one, relying on each product's XPath address. The loop sometimes runs fine, but other times it fails. I would appreciate some help troubleshooting and fixing this. Any advice or insight is greatly appreciated.
from selenium import webdriver
from time import sleep
from selenium.webdriver.common.by import By
import time
import datetime
import requests
from bs4 import BeautifulSoup
import pandas as pd
path='C://chromedriver.exe'
catalog=["mouse"]
columns_for_amazon=[catalog[0],"Brand Name", "Product Name","Price","Star","NumberOfReviews","Comments","Date_Collected","Time_Collected"]
#exist_pf=pd.read_csv("C:\\Users\\USER\\Desktop\\software engineering\\proje klasörü\\main.csv")
new_df=pd.DataFrame(columns=columns_for_amazon)
options = webdriver.ChromeOptions()
options.add_experimental_option("useAutomationExtension", False)
options.add_experimental_option("excludeSwitches",["enable-automation"])
browser=webdriver.Chrome(options=options)
browser.maximize_window()
browser.get("https://www.amazon.com/")
#this is for Amazon's security check page
"""
sleep(5)
enter_button=browser.find_element(By.XPATH,'/html/body/div/div[1]/div[3]/div/div/form/div[2]/div/span/span/button')
enter_button.click()
"""
#get input elements
input_search=browser.find_element(By.XPATH,'//*[@id="twotabsearchtextbox"]')
search_button=browser.find_element(By.XPATH,'//*[@id="nav-search-submit-button"]')
#send input to webpage
input_search.send_keys(catalog[0])
search_button.click()
for i in range(2,10):
    #*****************************************************************
    #this is the problem I am encountering
    try:
        log=browser.find_element(By.XPATH,f'//*[@id="search"]/div[1]/div[1]/div/span[1]/div[1]/div[{i}]/div/div/div/div/span/div/div/div/div[2]/div/div/div[1]/h2/a/span')
        print(i)
    except:
        print(i,"numbered iteration is NOT finished")
        continue
    sleep(4)
    #*****************************************************************
    log.click()
    #scraping of all valuable information
    page=requests.get(browser.current_url)
    soup1=BeautifulSoup(page.content,"html.parser") #takes every html code of page
    soup2=BeautifulSoup(soup1.prettify(),"html.parser")
    try:
        title = soup2.find(id='bylineInfo').get_text()
    except:
        continue
    title=title.strip()
    product=soup2.find(id='productTitle').get_text()
    product=product.strip()
    star=soup2.find(class_='a-popover-trigger a-declarative').get_text()
    star=star.strip()[0:3]
    price=soup2.find(class_='a-offscreen').get_text()
    price=price.strip()
    no_reviews=soup2.find(id='acrCustomerReviewText').get_text()
    no_reviews=no_reviews.strip()
    comments=soup2.find(class_='a-section review-views celwidget').get_text()
    comments_list=list()
    for i in comments.split("\n"):
        comments_list.append(i.strip())
    if comments_list[-1]=="":
        comments_list.pop()
    dtime= datetime.datetime.now()
    log_date= dtime.strftime("%x")
    log_time=dtime.strftime("%X")
    log_list=[catalog[0],title,product,price,star,no_reviews,comments_list,log_date,log_time]
    lenght=len(new_df)
    new_df.loc[lenght]=log_list
    print(i,"numbered iteration is finished")
    browser.back()
new_df.to_csv('C:\\Users\\USER\\Desktop\\software engineering\\proje klasörü\\main_mouse_otomatize.csv')
This is my output:
[.... ERROR:cert_issuer_source_aia.cc(35)] Error parsing cert retrieved from AIA (as DER):
ERROR: Couldn't read tbsCertificate as SEQUENCE
ERROR: Failed parsing Certificate
numbered iteration is finished
3
numbered iteration is finished
4 numbered iteration is NOT finished
5 numbered iteration is NOT finished
6 numbered iteration is NOT finished
7 numbered iteration is NOT finished
8 numbered iteration is NOT finished
9 numbered iteration is NOT finished
1 Answer
-2
It sounds like you are having trouble clicking the XPath address in your code, specifically after it has run twice. This can happen for several reasons, such as the XPath changing dynamically, a problem in the loop or iteration logic, or the element becoming unavailable or unclickable after the first two iterations.
To troubleshoot this, you can try the following steps:
Check the XPath: make sure the XPath you are using actually points to the element you want to click. Dynamic changes in the page structure can make an XPath stop matching after a few iterations.
Check element availability: confirm that the element you want to click is still present and clickable on each iteration. Use appropriate waits or conditions so the element is fully loaded and ready before you interact with it.
Debug the loop logic: check that your loop walks through exactly the elements you intend, without skipping or repeating any of them.
Handle dynamic content: if the page content is dynamic, consider a more robust way to locate elements, such as CSS selectors or other attributes, instead of a long absolute XPath (see the longer sketch at the end of this answer).
Check console errors: look at the browser console for errors or warnings that might hint at why the element stops being clickable after two iterations.
Use explicit waits: make the code wait until the element is clickable before trying to click it, especially when the page content loads dynamically (a minimal sketch follows this list).
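As a minimal sketch of the explicit-wait idea, assuming the browser object and the loop index i from the question's code, and reusing the question's own XPath (the XPath itself may still be the real problem; this only replaces the fixed sleep(4)):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the product link to become clickable
# instead of pausing with a fixed sleep(4).
product_xpath = (
    f'//*[@id="search"]/div[1]/div[1]/div/span[1]/div[1]/div[{i}]'
    '/div/div/div/div/span/div/div/div/div[2]/div/div/div[1]/h2/a/span'
)
wait = WebDriverWait(browser, 10)
log = wait.until(EC.element_to_be_clickable((By.XPATH, product_xpath)))
log.click()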
By working through these aspects of your code, you should be able to find and fix the problem of the XPath address not being clickable after two iterations.
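If the long XPath keeps failing, here is a rough sketch of points 2 and 4 combined: restructure the loop around a CSS selector and re-locate the result links after every browser.back(). The selector div[data-component-type="s-search-result"] h2 a is only an assumption about Amazon's current result markup and may need adjusting, and visit_each_result is a made-up helper name; the point is the pattern of re-finding elements after navigation so that stale references are never clicked.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumed selector for the result title links; adjust to the markup you see.
RESULT_LINKS = 'div[data-component-type="s-search-result"] h2 a'

def visit_each_result(browser, max_items=8):
    wait = WebDriverWait(browser, 10)
    for index in range(max_items):
        # Re-find the links on every pass: after browser.back() the elements
        # found on the previous pass are stale and can no longer be clicked.
        links = wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, RESULT_LINKS))
        )
        if index >= len(links):
            break  # fewer results on this page than expected
        links[index].click()
        # Wait for the product page to load before scraping it.
        wait.until(EC.presence_of_element_located((By.ID, "productTitle")))
        # ... scrape title, price, reviews, etc. here ...
        browser.back()

Called as visit_each_result(browser) right after search_button.click(), this would replace the range(2, 10) loop; the BeautifulSoup scraping could stay largely as is, though it may be more reliable to parse browser.page_source than to re-download the URL with requests outside the browser session.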