Scraping eBay sold items with Selenium returns []

Posted 2024-06-09 16:22:51


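I have almost no experience with web scraping and could not solve this with BeautifulSoup, so I am trying selenium (installed today). I want to scrape sold items from eBay. The page I am trying to scrape is: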

https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720

Here is my code, where I load the html and then open the same page in selenium:

    ebay_url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720'

    html = requests.get(ebay_url)
    #print(html.text)

    driver = wd.Chrome(executable_path=r'/Users/mburley/Downloads/chromedriver')
    driver.get(ebay_url)

This correctly opens a new Chrome session at the right url. I am trying to get the title, price, and sold date, and then load them into a csv file. Here is my code:

    # Find all div tags and set equal to main_data
    all_items = driver.find_elements_by_class_name("s-item__info clearfix")[1:]
    #print(main_data)

    # Loop over main_data to extract div classes for title, price, and date
    for item in all_items:
        date = item.find_element_by_xpath("//span[contains(@class, 'POSITIVE']").text.strip()
        title = item.find_element_by_xpath("//h3[contains(@class, 's-item__title s-item__title--has-tags']").text.strip()
        price = item.find_element_by_xpath("//span[contains(@class, 's-item__price']").text.strip()

        print('title:', title)
        print('price:', price)
        print('date:', date)
        print('---')
        data.append( [title, price, date] )

It only returns []. I thought eBay might be blocking my IP, but the html loads and looks correct. Hopefully someone can help! Thanks


1 answer
User
#1 · Posted 2024-06-09 16:22:51

You can use the code below to get the details. You can also use pandas to store the data in a csv file.

Code:

ebay_url = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw=oakley+sunglasses&_sacat=0&Brand=Oakley&rt=nc&LH_Sold=1&LH_Complete=1&_ipg=200&_oaa=1&_fsrp=1&_dcat=79720'

html = requests.get(ebay_url)
# print(html.text)

driver = wd.Chrome(executable_path=r'/Users/mburley/Downloads/chromedriver')
driver.maximize_window()
driver.implicitly_wait(30)
driver.get(ebay_url)


wait = WebDriverWait(driver, 20)
sold_date = []
title = []
price = []
i = 1
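# Each sold listing shows its sale date in a span with class POSITIVE inside the title tag block.
# find_element(By.XPATH, ...) is used throughout; the find_element_by_* helpers were removed in Selenium 4.3.
for item in driver.find_elements(By.XPATH, "//div[contains(@class,'title tagblock')]/span[@class='POSITIVE']"):
    sold_date.append(item.text)
    # The XPaths below search the whole document, so the trailing index picks the i-th match (XPath indexes start at 1).
    title.append(driver.find_element(By.XPATH, f"(//div[contains(@class,'title tagblock')]/span[@class='POSITIVE']/ancestor::div[contains(@class,'tag')]/following-sibling::a/h3)[{i}]").text)
    price.append(driver.find_element(By.XPATH, f"(//div[contains(@class,'title tagblock')]/span[@class='POSITIVE']/ancestor::div[contains(@class,'tag')]/following-sibling::div[contains(@class,'details')]/descendant::span[@class='POSITIVE'])[{i}]").text)
    i = i + 1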

print(sold_date)
print(title)
print(price)

data = {
         'Sold_date': sold_date,
         'title': title,
         'price': price
        }
df = pd.DataFrame.from_dict(data)
# index=False leaves the DataFrame row index out of the csv file
df.to_csv('out.csv', index=False)

Imports:

import pandas as pd
import requests
from selenium import webdriver as wd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
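
A likely reason the code in the question returns [] is that find_elements_by_class_name expects a single class name, while "s-item__info clearfix" is two classes separated by a space, so the lookup matches nothing and the loop body never runs. One common workaround is a CSS selector that chains both classes. The helper below is a minimal sketch of that conversion; class_names_to_css is a hypothetical name, not part of Selenium, and the commented-out driver call assumes a live browser session.

```python
def class_names_to_css(class_attr: str) -> str:
    """Turn a space-separated class attribute into one chained CSS selector."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" matches elements that carry both class a and class b
    return "".join("." + name for name in names)


selector = class_names_to_css("s-item__info clearfix")
print(selector)  # .s-item__info.clearfix

# With a live driver (not run here), the selector would be used like this:
# all_items = driver.find_elements(By.CSS_SELECTOR, selector)
```

Separately, an XPath that starts with // searches the whole page even when called on an item, so per-item lookups usually start with .// instead.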

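If pandas is not installed, the three lists built above (sold_date, title, price) can probably be written with the standard-library csv module instead. This is only a sketch: rows_to_csv is a hypothetical helper, the sample values are made up and are not real eBay data, and the length check is there because parallel lists silently drift apart when one lookup fails.

```python
import csv
import io


def rows_to_csv(sold_date, title, price, out):
    """Write three parallel lists as csv rows, refusing mismatched lengths."""
    if not (len(sold_date) == len(title) == len(price)):
        raise ValueError("sold_date, title and price must have the same length")
    writer = csv.writer(out)
    writer.writerow(["Sold_date", "title", "price"])
    # zip pairs the i-th date, title and price into one row
    writer.writerows(zip(sold_date, title, price))


# Hypothetical sample values, not real eBay data.
buffer = io.StringIO()
rows_to_csv(
    ["Sold  Jun 8, 2024"],
    ["Example sunglasses listing"],
    ["$45.00"],
    buffer,
)
print(buffer.getvalue())
```

To write a real file, pass open('out.csv', 'w', newline='') in place of the StringIO buffer.
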