Cannot click the XPath address after two iterations

Question

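I'm using Python and the Selenium library for web automation, mainly to extract data from Amazon. My code successfully locates the search box, enters the query ("mouse"), and lists all the products on the results page. However, I'm having trouble with a loop that visits each product on the page one by one, relying on each product's XPath address. The loop sometimes works and sometimes fails. I'd appreciate help troubleshooting and fixing this; any suggestions or insights are welcome.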
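One likely weak point is the absolute XPath with a positional index (the `div[{i}]` part of the path used in the loop): if Amazon inserts a sponsored or banner block, or renders the results page slightly differently after `browser.back()`, that index no longer points at a product. A commonly used alternative is to select every result link with a single relative XPath, collect the `href` values up front, and then open each product with `browser.get(url)` instead of clicking and navigating back. The minimal sketch below demonstrates only the XPath part, offline, using the standard library's `xml.etree.ElementTree` on a made-up snippet; the `data-component-type="s-search-result"` attribute is an assumption about Amazon's markup and may not hold on the live site.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified stand-in for a search results page. The
# data-component-type="s-search-result" attribute is an ASSUMPTION about
# Amazon's markup and may change at any time.
SAMPLE_HTML = """
<div id="search">
  <div data-component-type="s-search-result">
    <div><h2><a href="/product-a">Mouse A</a></h2></div>
  </div>
  <div class="ad-banner"><span>Sponsored block</span></div>
  <div data-component-type="s-search-result">
    <div><div><h2><a href="/product-b">Mouse B</a></h2></div></div>
  </div>
</div>
"""

# One relative XPath that matches every result title link, regardless of
# nesting depth or of how many unrelated blocks sit between the results.
RESULT_LINKS = ".//div[@data-component-type='s-search-result']//h2/a"


def collect_product_links(html: str) -> list[str]:
    """Return the href of every search-result title link, in page order."""
    root = ET.fromstring(html.strip())
    return [a.get("href") for a in root.findall(RESULT_LINKS)]


if __name__ == "__main__":
    for href in collect_product_links(SAMPLE_HTML):
        print(href)
```

With Selenium, the same selector would be used roughly as `links = browser.find_elements(By.XPATH, "//div[@data-component-type='s-search-result']//h2/a")` followed by `urls = [a.get_attribute("href") for a in links]`, again assuming that attribute is present on the live page. Because the URLs are collected before any navigation, the loop no longer depends on the results page looking the same after each `browser.back()`.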

from selenium import webdriver
from time import sleep
from selenium.webdriver.common.by import By
import time
import datetime
import requests
from bs4 import BeautifulSoup
import pandas as pd

path='C://chromedriver.exe'
catalog=["mouse"]


columns_for_amazon=[catalog[0],"Brand Name", "Product Name","Price","Star","NumberOfReviews","Comments","Date_Collected","Time_Collected"]
#exist_pf=pd.read_csv("C:\\Users\\USER\\Desktop\\software engineering\\proje klasörü\\main.csv")
new_df=pd.DataFrame(columns=columns_for_amazon)

options = webdriver.ChromeOptions()
options.add_experimental_option("useAutomationExtension", False)
options.add_experimental_option("excludeSwitches",["enable-automation"])

browser=webdriver.Chrome(options=options)
browser.maximize_window()
browser.get("https://www.amazon.com/")

# this is for Amazon's security check page
"""
sleep(5)
enter_button=browser.find_element(By.XPATH,'/html/body/div/div[1]/div[3]/div/div/form/div[2]/div/span/span/button')
enter_button.click()
"""

#get input elements
input_search=browser.find_element(By.XPATH,'//*[@id="twotabsearchtextbox"]')
search_button=browser.find_element(By.XPATH,'//*[@id="nav-search-submit-button"]')

#send input to webpage
input_search.send_keys(catalog[0])
search_button.click()


for i in range(2,10):
    #*****************************************************************
    # this is where I run into the problem
    try:
        log=browser.find_element(By.XPATH,f'//*[@id="search"]/div[1]/div[1]/div/span[1]/div[1]/div[{i}]/div/div/div/div/span/div/div/div/div[2]/div/div/div[1]/h2/a/span')
        print(i)
    except:
        print(i,"numbered iteration is NOT finished")
        continue 
    sleep(4)
    #*****************************************************************    
    log.click()
    # scrape all the relevant information
    page=requests.get(browser.current_url)
    soup1=BeautifulSoup(page.content,"html.parser") # parses the full HTML of the page
    soup2=BeautifulSoup(soup1.prettify(),"html.parser")
    try:
        title = soup2.find(id='bylineInfo').get_text()
    except:
        continue 
    title=title.strip()

    product=soup2.find(id='productTitle').get_text()
    product=product.strip()


    star=soup2.find(class_='a-popover-trigger a-declarative').get_text()
    star=star.strip()[0:3]

    price=soup2.find(class_='a-offscreen').get_text()
    price=price.strip()

    no_reviews=soup2.find(id='acrCustomerReviewText').get_text()
    no_reviews=no_reviews.strip()

    comments=soup2.find(class_='a-section review-views celwidget').get_text()
    comments_list=list()
    for i in comments.split("\n"):
        comments_list.append(i.strip())
        if comments_list[-1]=="":
            comments_list.pop()

    dtime= datetime.datetime.now()
    log_date= dtime.strftime("%x")
    log_time=dtime.strftime("%X")
    log_list=[catalog[0],title,product,price,star,no_reviews,comments_list,log_date,log_time]

    length=len(new_df)
    new_df.loc[length]=log_list

    print(i,"numbered iteration is finished")
    browser.back()


new_df.to_csv('C:\\Users\\USER\\Desktop\\software engineering\\proje klasörü\\main_mouse_otomatize.csv')
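Separately, the comment-parsing loop `for i in comments.split("\n")` reuses the name `i`. This does not change which products are visited, because `for i in range(2,10)` reassigns `i` at the start of every pass, but by the time `print(i,"numbered iteration is finished")` runs, `i` holds the last line of the comments text instead of the product index; when that text ends with a newline, the message prints with no number at all. The sketch below reproduces this with a made-up comments string and shows a fix that moves the parsing into a helper (`split_comments`, a hypothetical name) with its own loop variable.

```python
def split_comments(comments: str) -> list[str]:
    """Split review text into stripped, non-empty lines.

    The comprehension uses its own variable (`line`), so a loop counter
    such as the product index `i` in the caller is left untouched.
    """
    return [line.strip() for line in comments.split("\n") if line.strip()]


def run_demo() -> list[str]:
    """Collect the progress messages produced by the original and fixed loops."""
    comments = "Great mouse\n\n  Works well  \n"  # made-up sample text
    messages = []

    # Original pattern: the inner loop rebinds `i`, so after it ends `i` is
    # the last line of `comments` (an empty string here), not the index.
    for i in range(2, 4):
        comments_list = []
        for i in comments.split("\n"):
            comments_list.append(i.strip())
            if comments_list[-1] == "":
                comments_list.pop()
        messages.append(f"{i} numbered iteration is finished")

    # Fixed pattern: the helper never touches `i`.
    for i in range(2, 4):
        comments_list = split_comments(comments)
        messages.append(f"{i} numbered iteration is finished")

    return messages


if __name__ == "__main__":
    for message in run_demo():
        print(message)
```

The helper keeps the same behaviour as the original inner loop (strip each line, drop empty ones) while making the product index safe to use for logging.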

This is my output:

[.... ERROR:cert_issuer_source_aia.cc(35)] Error parsing cert retrieved from AIA (as DER):
ERROR: Couldn't read tbsCertificate as SEQUENCE
ERROR: Failed parsing Certificate

 numbered iteration is finished
3
 numbered iteration is finished
4 numbered iteration is NOT finished
5 numbered iteration is NOT finished
6 numbered iteration is NOT finished
7 numbered iteration is NOT finished
8 numbered iteration is NOT finished
9 numbered iteration is NOT finished
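In the posted loop, `sleep(4)` only runs after `find_element` succeeds. Once a lookup fails, for example because the results page is still reloading after `browser.back()`, `continue` skips the pause, so the remaining iterations can all run within a fraction of a second; that would be consistent with iterations 4 to 9 failing in a row. Selenium provides explicit waits for this, e.g. `WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))` with `from selenium.webdriver.support.ui import WebDriverWait` and `from selenium.webdriver.support import expected_conditions as EC`. The browser-free sketch below only illustrates the polling idea behind such a wait; `wait_until` and `make_flaky_finder` are hypothetical names, not part of Selenium.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(find: Callable[[], Optional[T]], timeout: float = 10.0,
               interval: float = 0.5) -> T:
    """Call `find` repeatedly until it returns a non-None value.

    Any exception raised by `find` (for example a NoSuchElementException
    while the page is still loading) is treated as "not ready yet".
    Raises TimeoutError if nothing is found within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last_error: Optional[Exception] = None
    while True:
        try:
            result = find()
            if result is not None:
                return result
        except Exception as exc:  # keep polling, but remember why it failed
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not found within {timeout} s") from last_error
        time.sleep(interval)


def make_flaky_finder(failures: int) -> Callable[[], str]:
    """Hypothetical stand-in for browser.find_element: fails `failures` times."""
    calls = {"n": 0}

    def find() -> str:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise LookupError("page still loading")
        return "product link"

    return find


if __name__ == "__main__":
    print(wait_until(make_flaky_finder(3), timeout=2.0, interval=0.1))
```

In the scraping loop this might look like `log = wait_until(lambda: browser.find_element(By.XPATH, xpath), timeout=10)`, placed before the click so that the wait also covers the reload after `browser.back()`. In real Selenium code, `WebDriverWait` is the more idiomatic choice; the helper above just makes the retry-until-timeout logic visible.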

