Can I update a URL variable inside a loop in Python with BeautifulSoup so I don't have to enter each new URL manually?

0 votes
0 answers
46 views
Asked 2025-04-12 04:56

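I set up a variable, url, that holds the page I scrape with BeautifulSoup. Can I update this url variable inside a loop so that I don't have to enter a new URL by hand each time?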

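I tried writing a loop that extracts a new URL from the page at the original URL and assigns it back to the original variable. The code runs without errors, but the original URL is not replaced by the newly scraped one. I'm hoping a BeautifulSoup expert can help :)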

Here is my code:

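import urllib.parse

import requests
from bs4 import BeautifulSoup

# Assumed values: the code below uses base_url and headers, but the post does not show their definitions.
base_url = "https://www.academy.com"
headers = {"User-Agent": "Mozilla/5.0"}  # placeholder; substitute your own headers

url = "https://www.academy.com/c/academy-clearance &facet=%27facet_Product%20Type%27:%27Shoes%27"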
while url:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Extract necessary links
    links = set()
    for link in soup.find_all('a'):
        href = link.get('href')
        if href and href.startswith("/p/"):
            full_link = urllib.parse.urljoin(base_url, href)  # Construct absolute URL
            links.add(full_link)
        
    # Output the links
    for link in links:
        print(link)

    # Find the link to the next page
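    next_page_element = soup.find('a', {'data-auid': 'gotoNextPage'})
    next_page_link = next_page_element.get('href') if next_page_element else None
    if next_page_link:
        next_page_url = urllib.parse.urljoin(base_url, next_page_link)  # Construct absolute URL
        if next_page_url == url:
            # The "next" link points back at the current page; stop to avoid an endless loop
            print("Next page link matches the current URL; stopping.")
            break
        print("Next Page Link:", next_page_url)
        url = next_page_url  # The next loop iteration fetches this URL
    else:
        print("No next page link found.")
        break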

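My goal is to scrape a site, extract the product details, and find the link to the next page at the end of each loop iteration. I want to use that next-page link to update my original url variable.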
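
For illustration, here is a minimal sketch of one way the "reassign url from the next-page link" loop could be structured so that it can be checked without touching the network. The names crawl_pages, fetch, find_next_href, bs4_next_href and max_pages are hypothetical and not part of my code above; only the 'data-auid': 'gotoNextPage' selector comes from it. The sketch assumes the next-page link is present in the HTML that the server returns. If the site builds its pagination with JavaScript, the selector would likely find nothing and the loop would stop after the first page.

```python
from urllib.parse import urljoin


def crawl_pages(start_url, fetch, find_next_href, max_pages=50):
    """Yield (url, html) for each page, following next-page links.

    fetch(url) returns the page HTML as a string.
    find_next_href(html) returns the next link's href, or None if there is none.
    """
    url = start_url
    visited = set()
    # Stop when there is no next URL, when a URL repeats, or at the page limit.
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        html = fetch(url)
        yield url, html
        href = find_next_href(html)
        # Resolve a relative href against the current page; None ends the loop.
        url = urljoin(url, href) if href else None


def bs4_next_href(html):
    """Next-page finder using the selector from the question (needs beautifulsoup4)."""
    # Imported here so the loop above has no third-party dependency.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    element = soup.find("a", {"data-auid": "gotoNextPage"})
    return element.get("href") if element else None
```

With requests, this might be called as `for page_url, html in crawl_pages(url, lambda u: requests.get(u, headers=headers).text, bs4_next_href): ...`. Keeping the fetching and link-finding steps as arguments makes it possible to print what each step returns and see whether the next-page element is actually found.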
