How do I scrape all product information from a website?

1 Answer

User

#1 · Posted 2024-04-19 20:12:45

I suggest using an XPath query to collect the product links and then saving the URLs:

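import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Chrome opens a visible window by default, so no headless flag is set.
options = Options()
driver = webdriver.Chrome(options=options)

page = 0
url = 'https://www.adidas.de/manner-schuhe-sneakers?start={}'.format(page)
driver.get(url)
time.sleep(4)  # crude wait for the product grid to render
# find_elements_by_xpath is gone from recent Selenium 4 releases;
# find_elements(By.XPATH, ...) works in both old and new versions.
elements = driver.find_elements(By.XPATH, '//*[@data-auto-id="glass-hockeycard-link"]')
all_shoes = [e.get_attribute("href") for e in elements]
driver.quit()

print("Found {} shoes.".format(len(all_shoes)))
for shoe in all_shoes:
    # The second-to-last path segment is the model slug.
    print('{:<50} {}'.format(shoe.split("/")[-2], shoe))

# Write one URL per line, each terminated by a newline.
with open("shoes_page_{}.txt".format(page), "w") as f:
    f.writelines(url + '\n' for url in all_shoes)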

Output:

Found 48 shoes.
superstar-schuh                                    https://www.adidas.de/superstar-schuh/FW2293.html
zx-2k-boost-schuh                                  https://www.adidas.de/zx-2k-boost-schuh/FV9996.html
nmd_r1-v2-schuh                                    https://www.adidas.de/nmd_r1-v2-schuh/FY6862.html
zx-2k-boost-schuh                                  https://www.adidas.de/zx-2k-boost-schuh/FV9993.html
nmd_r1-v2-schuh                                    https://www.adidas.de/nmd_r1-v2-schuh/FV9022.html
ultraboost-schuh                                   https://www.adidas.de/ultraboost-schuh/BB6168.html
superstar-schuh                                    https://www.adidas.de/superstar-schuh/EG4958.html
ozweego-schuh                                      https://www.adidas.de/ozweego-schuh/FV9667.html
nite-jogger-schuh                                  https://www.adidas.de/nite-jogger-schuh/FV1267.html
continental-80-schuh                               https://www.adidas.de/continental-80-schuh/G27706.html
stan-smith-schuh                                   https://www.adidas.de/stan-smith-schuh/M20325.html
gazelle-schuh                                      https://www.adidas.de/gazelle-schuh/BB5478.html
zx-2k-boost-schuh                                  https://www.adidas.de/zx-2k-boost-schuh/FY2001.html
zx-2k-flux-schuh                                   https://www.adidas.de/zx-2k-flux-schuh/FX2044.html
ultraboost-schuh                                   https://www.adidas.de/ultraboost-schuh/F36641.html
nmd_r1-schuh                                       https://www.adidas.de/nmd_r1-schuh/FV8727.html
zx-500-schuh                                       https://www.adidas.de/zx-500-schuh/FW2811.html
zx-2k-boost-schuh                                  https://www.adidas.de/zx-2k-boost-schuh/FX8835.html
zx-700-hd-schuh                                    https://www.adidas.de/zx-700-hd-schuh/FY0995.html
nmd_r1-v2-schuh                                    https://www.adidas.de/nmd_r1-v2-schuh/FY5913.html
3mc-vulc-schuh                                     https://www.adidas.de/3mc-vulc-schuh/B22705.html
zx-2k-boost-schuh                                  https://www.adidas.de/zx-2k-boost-schuh/FV9997.html
ultraboost-20-laufschuh                            https://www.adidas.de/ultraboost-20-laufschuh/FV8329.html
zx-2k-boost-schuh                                  https://www.adidas.de/zx-2k-boost-schuh/FV8453.html
ozweego-schuh                                      https://www.adidas.de/ozweego-schuh/FV9655.html
continental-80-vegan-schuh                         https://www.adidas.de/continental-80-vegan-schuh/FW2336.html
nmd_r1-schuh                                       https://www.adidas.de/nmd_r1-schuh/D96635.html
ultraboost-winter.rdy-laufschuh                    https://www.adidas.de/ultraboost-winter.rdy-laufschuh/EG9801.html
stan-smith-schuh                                   https://www.adidas.de/stan-smith-schuh/FU9609.html
superstar-schuh                                    https://www.adidas.de/superstar-schuh/EG4959.html
carrera-low-pride-schuh                            https://www.adidas.de/carrera-low-pride-schuh/FY9019.html
zx-flux-schuh                                      https://www.adidas.de/zx-flux-schuh/S32279.html
supercourt-schuh                                   https://www.adidas.de/supercourt-schuh/EE6037.html
zx-700-hd-schuh                                    https://www.adidas.de/zx-700-hd-schuh/FY1102.html
terrex-ax3-beta-schuh                              https://www.adidas.de/terrex-ax3-beta-schuh/G26523.html
terrex-swift-r2-mid-gore-tex-wanderschuh           https://www.adidas.de/terrex-swift-r2-mid-gore-tex-wanderschuh/CM7500.html
zx-2k-boost-schuh                                  https://www.adidas.de/zx-2k-boost-schuh/FX8836.html
zx-500-schuh                                       https://www.adidas.de/zx-500-schuh/FW2812.html
zx-2k-flux-schuh                                   https://www.adidas.de/zx-2k-flux-schuh/FV9977.html
swift-run-rf-schuh                                 https://www.adidas.de/swift-run-rf-schuh/FV5358.html
zx-2k-boost-schuh                                  https://www.adidas.de/zx-2k-boost-schuh/FX8834.html
zx-500-schuh                                       https://www.adidas.de/zx-500-schuh/FW4410.html
zx-2k-boost-schuh                                  https://www.adidas.de/zx-2k-boost-schuh/FV9999.html
ultraboost-20-laufschuh                            https://www.adidas.de/ultraboost-20-laufschuh/FV8359.html
sabalo-schuh                                       https://www.adidas.de/sabalo-schuh/FV0689.html
zx-700-hd-schuh                                    https://www.adidas.de/zx-700-hd-schuh/FY0996.html
terrex-skychaser-lt-gtx-schuh                      https://www.adidas.de/terrex-skychaser-lt-gtx-schuh/F36099.html
terrex-free-hiker-cold.rdy-wanderschuh             https://www.adidas.de/terrex-free-hiker-cold.rdy-wanderschuh/FU7217.html
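
The listing above takes the model slug from `shoe.split("/")[-2]`, which works as long as the URL carries no query string or trailing slash. As a minimal sketch, assuming product URLs keep the `/<slug>/<ID>.html` shape seen in the output, the standard library can split them a little more defensively. `split_product_url` is a hypothetical helper name, not part of the original answer, and the query string in the example call is made up.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def split_product_url(url):
    """Return (slug, product_id) for a URL shaped like /<slug>/<ID>.html.

    Hypothetical helper: assumes the URL layout shown in the output above.
    """
    # urlsplit keeps the query string and fragment out of the path component.
    path = PurePosixPath(urlsplit(url.strip()).path)
    # parent.name is the slug directory; stem is the file name without ".html".
    return path.parent.name, path.stem


slug, product_id = split_product_url(
    "https://www.adidas.de/superstar-schuh/FW2293.html?example=1\n"
)
print('{:<50} {}'.format(slug, product_id))
```

Slugs that contain a dot, such as `ultraboost-winter.rdy-laufschuh`, are returned whole because only the file name has its suffix removed.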

Why save the URLs to a file? So that you don't have to scrape the listing page every time; you can use the saved URLs to query the API instead:

import requests

headers = {
    "accept": "*/*",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
    "content-type": "application/json",
    "referer": "https://www.adidas.de/en/men-trainers-shoes",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:81.0) Gecko/20100101 Firefox/81.0",
}


# Read the saved URLs, dropping trailing newlines and blank lines.
with open("shoes_page_0.txt") as f:
    shoes = [line.strip() for line in f if line.strip()]

print('{:<20} {:<50} {:<10}'.format('Shoe Model', 'Color', 'Price'))
for shoe in shoes:
    # The product ID is the file name without the ".html" suffix.
    id_ = shoe.split("/")[-1].replace(".html", "")
    shoe_data = requests.get(f"https://www.adidas.de/api/search/product/{id_}?sitePath=en", headers=headers).json()
    print('{:<20} {:<50} {:<10}'.format(shoe_data['modelId'], shoe_data['color'], shoe_data['price']))

Output:

Shoe Model           Color                                              Price     
DVF77                Cloud White / Cloud White / Core Black             97.43     
KYJ02                Cloud White / Solar Red / Blue                     136.42    
KYK47                Core Black / Core Black / Cardboard                136.42    
KYJ02                Core Black / Core Black / Shock Pink               136.42    
KYK47                Cloud White / Core Black / Cloud White             136.42    
DWG43                Cloud White / Cloud White / Cloud White            155.92    
DVF77                Cloud White / Core Black / Cloud White             97.43     
EFK26                Off White / Off White / Signal Pink                116.93    
BTO93                Cloud White / Cloud White / Cloud White            126.68    
DRA67                Cloud White / Scarlet / Collegiate Navy            97.43     
ION05                Core White / Dark Blue / Dark Blue                 92.56     
IAZ12                Collegiate Navy / White / Gold Metallic            92.56     
KYJ02                Linen / Core Black / Orange                        136.42    
KYJ11                Cloud White / Core Black / Blue                    97.43     
DWG43                Core Black / Core Black / Active Red               155.92    
BSV73                Cloud White / Core Black / Cloud White             136.42    
KYX38                Grey Four / Grey Six / Grey Three                  97.43     
and so on...
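
The lookup loop above indexes the JSON directly (`shoe_data['price']` and so on), so a response that lacks one of those keys would stop the run with a `KeyError`. A hedged sketch of a more tolerant row formatter follows. `format_row` and the `"n/a"` placeholder are illustrative choices rather than anything from the original answer, and the only field names used are the three the original code reads.

```python
def format_row(shoe_data):
    """Format one API response dict as a table row, tolerating missing keys."""
    fields = []
    for key in ("modelId", "color", "price"):
        value = shoe_data.get(key)
        # Missing or null values are shown as "n/a" instead of raising.
        fields.append("n/a" if value is None else str(value))
    return '{:<20} {:<50} {:<10}'.format(*fields)


print('{:<20} {:<50} {:<10}'.format('Shoe Model', 'Color', 'Price'))
print(format_row({
    "modelId": "DVF77",
    "color": "Cloud White / Cloud White / Core Black",
    "price": 97.43,
}))
print(format_row({"modelId": "DVF77"}))  # color and price missing
```

In the loop, `print(format_row(shoe_data))` would take the place of the final `print` call.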

Edit:

To get all the shoes (as of now), you can try the following, then run the code that fetches the details through the API:

import os
import shutil
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
driver = webdriver.Chrome(options=options)

# The listing shows 48 products per page; 848 was the upper bound when this was written.
for page in range(0, 848, 48):
    url = 'https://www.adidas.de/manner-schuhe-sneakers?start={}'.format(page)
    driver.get(url)
    time.sleep(4)  # crude wait for the product grid to render
    elements = driver.find_elements(By.XPATH, '//*[@data-auto-id="glass-hockeycard-link"]')
    all_shoes = [e.get_attribute("href") for e in elements]

    # Terminate every URL with a newline so the merged file keeps one URL per line.
    with open("shoes_page_{}.txt".format(page), "w") as f:
        f.writelines(url + '\n' for url in all_shoes)

    for shoe in all_shoes:
        print('{:<50} {}'.format(shoe.split("/")[-2], shoe))

driver.quit()

# Merge the per-page files into one file, deleting each after it is copied.
with open("all_adidas_shoes.txt", "w") as file_to_merge_to:
    for file_to_read_from in [f"shoes_page_{p}.txt" for p in range(0, 848, 48)]:
        with open(file_to_read_from) as file:
            shutil.copyfileobj(file, file_to_merge_to)
        os.remove(file_to_read_from)

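This produces a single file, all_adidas_shoes.txt, containing the URLs of all the shoes. Because the per-page files are deleted after the merge, point the API script at that file instead of shoes_page_0.txt.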
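
Two details of the paging loop may be worth pulling out: the hard-coded `848`, and the chance that a URL appears on more than one page. The following is a minimal sketch under those assumptions. `page_offsets` and `unique_urls` are hypothetical helper names, and the total item count still has to be read off the site by hand.

```python
def page_offsets(total_items, page_size=48):
    """Return the `start` values for a listing paged `page_size` items at a time."""
    return list(range(0, total_items, page_size))


def unique_urls(lines):
    """Strip whitespace, drop blank lines, and de-duplicate while keeping order."""
    seen = set()
    result = []
    for line in lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result


offsets = page_offsets(848)
print(len(offsets), offsets[-1])  # 18 pages, last start value is 816

sample = [
    "https://www.adidas.de/superstar-schuh/FW2293.html\n",
    "https://www.adidas.de/gazelle-schuh/BB5478.html\n",
    "https://www.adidas.de/superstar-schuh/FW2293.html\n",
    "\n",
]
print(unique_urls(sample))
```

Passing the lines of the merged URL file through `unique_urls` before querying the API would avoid requesting the same product twice.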
