Getting to the next page with Selenium

Posted 2024-04-19 15:40:12


When I navigate to the link below and click through the pagination at the bottom of the page: https://shop.nordstrom.com/c/sale-mens-clothing?origin=topnav&breadcrumb=Home%2FSale%2FMen%2FClothing&sort=Boosted

I can only scrape the first 4 pages or so, and then my script stops.

I have tried the XPath, CSS selector, and WebDriverWait options.

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

pages_remaining = True
page = 2   # starts at page 2 since page 1 is already scraped by the first loop

while pages_remaining:

    # scrape code

    try:
        # wait until the numbered pagination link is clickable, then click it
        wait = WebDriverWait(browser, 20)
        wait.until(EC.element_to_be_clickable((By.LINK_TEXT, str(page)))).click()

        print(browser.current_url)
        page += 1

    except TimeoutException:
        pages_remaining = False

Current console output:

 https://shop.nordstrom.com/c/sale-mens-designer-clothing-accessories-shoes?breadcrumb=Home%2FSale%2FMen%2FDesigner&page=2&sort=Boosted

 https://shop.nordstrom.com/c/sale-mens-designer-clothing-accessories-shoes?breadcrumb=Home%2FSale%2FMen%2FDesigner&page=3&sort=Boosted

 https://shop.nordstrom.com/c/sale-mens-designer-clothing-accessories-shoes?breadcrumb=Home%2FSale%2FMen%2FDesigner&page=4&sort=Boosted

2 Answers

This is a fairly simple solution, since I am not too familiar with Selenium.

Try creating a new variable with the page numbers. As you can see, the URL changes when you go to the next page, so you just need to manipulate the given URL. See the code example below.

# Assuming get() below is requests.get
from requests import get

# Define variable pages first
pages = [str(i) for i in range(1, 53)]  # 53 'cuz you have 52 pages

for page in pages:
    response = get("https://shop.nordstrom.com/c/sale-mens-clothing?origin=topnav&breadcrumb=Home%2FSale%2FMen%2FClothing&page=" + page + "&sort=Boosted")
    # Rest of your code

This snippet should do the job for the remaining pages. Hope this helps, even though it may not be exactly what you were looking for.

If you have any questions, post them below. ;)

Cheers.
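As a side note on the snippet in the answer above: the same paged URLs can also be built with urllib.parse.urlencode instead of string concatenation, which keeps the query parameters readable. This is only a stylistic sketch of the same idea, not part of the original answer:

from urllib.parse import urlencode

base = "https://shop.nordstrom.com/c/sale-mens-clothing"
params = {
    "origin": "topnav",
    "breadcrumb": "Home/Sale/Men/Clothing",  # urlencode percent-encodes the slashes
    "sort": "Boosted",
}

for page in range(1, 53):  # 52 pages, as in the snippet above
    # produces e.g. ...&breadcrumb=Home%2FSale%2FMen%2FClothing&sort=Boosted&page=1
    url = base + "?" + urlencode({**params, "page": page})
    print(url)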

You can loop through the page numbers by changing the URL until no more results are shown:

from bs4 import BeautifulSoup
from selenium import webdriver

base_url = "https://m.shop.nordstrom.com/c/sale-mens-clothing?origin=topnav&breadcrumb=Home%2FSale%2FMen%2FClothing&page={}&sort=Boosted"

driver = webdriver.Chrome()

page = 1
soup = BeautifulSoup("", "html.parser")

# Will loop until there are no more results
while "Looks like we don’t have exactly what you’re looking for." not in soup.text:
    print(base_url.format(page))
    # Go to page
    driver.get(base_url.format(page))
    soup = BeautifulSoup(driver.page_source, "html.parser")

    ### your extracting code

    page += 1
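To flesh out the "### your extracting code" placeholder above, the extraction step could look roughly like the sketch below. The article/h3 selectors are hypothetical placeholders (the real product markup has to be inspected on the page), and the explicit wait is there only because a JavaScript-rendered page may not have the products in page_source immediately after driver.get:

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def scrape_page(driver):
    # Wait for at least one product card to render before parsing the DOM.
    # "article" is a made-up selector; inspect the real page markup.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "article"))
    )
    soup = BeautifulSoup(driver.page_source, "html.parser")
    # Placeholder extraction: collect the heading text of each product card.
    items = [h.get_text(strip=True) for h in soup.select("article h3")]
    return soup, items

Calling soup, items = scrape_page(driver) inside the loop, in place of the soup = BeautifulSoup(...) line, keeps the soup-based stop condition working while also returning the scraped items.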
