Python Selenium: a hard-to-scrape website

Posted 2024-05-14 05:41:38


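The website is: https://www.jao.eu/auctions#/

The page has an "OUT AREA" dropdown list (I can see it offers many options…)

I need to get the complete list of the items contained in that dropdown [AT, BDL-GB, BDL-NL, BE…]

Can you help me? This is what I have so far: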

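from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Assumption: the original snippet does not show how the driver was created
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)
driver.get('https://www.jao.eu/auctions#/')

# Wait for the dropdown control to become visible, then click it to open the list
first = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.css-1739xgv-control')))
first.click()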

                                                                          
second = wait.until(......

2 Answers

Try using the requests module to fetch the required list of items from that site:

import requests

link = 'https://www.jao.eu/api/v1/auction/calls/getcorridors'

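with requests.Session() as s:
    # Send a browser-like User-Agent with every request made through this session
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    # The endpoint is called with POST and an empty JSON body
    res = s.post(link, json={})
    res.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
    # Each element of the JSON response carries the dropdown entry under "value"
    items = [item['value'] for item in res.json()]
    print(items)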

The output looks like this (truncated):

'IT-CH', 'HU-SK', 'ES-PT', 'FR-IT', 'SK-CZ', 'NL-DK', 'IT-FR', 'HU-HR'

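Logging your browser's network traffic shows that the page makes several requests to a REST API. One of the endpoints is getcorridors; its response is JSON and contains all the values in the dropdown list. All you need to do is imitate that HTTP POST request. No Selenium required: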

def get_corridors():
    import requests
    from operator import itemgetter

    url = "https://www.jao.eu/api/v1/auction/calls/getcorridors"

    headers = {
        "Accept": "application/json",
        "Accept-Encoding": "gzip, deflate",
        "Content-Type": "application/json",
        "User-Agent": "Mozilla/5.0"
    }

    response = requests.post(url, headers=headers, json={})
    response.raise_for_status()

    return list(map(itemgetter("value"), response.json()))
    

def main():

    for corridor in get_corridors():
        print(corridor)
    
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

IT-CH
HU-SK
ES-PT
FR-IT
SK-CZ
NL-DK
IT-FR
HU-HR
FR-ES
IT-GR
CZ-AT
DK-NL
SI-AT
CH-DE
...
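
The values come back in the API's own order and may not match the alphabetical order the dropdown seems to use. As a minimal sketch, assuming the response really is a list of objects with a "value" key (the shape both answers rely on), the parsing step could be split out so that it removes duplicates, sorts, and tolerates malformed entries. The name extract_values and the sample payload are hypothetical, introduced only for illustration:

```python
from typing import Any, Iterable, List


def extract_values(payload: Iterable[Any]) -> List[str]:
    """Return the unique "value" strings from a getcorridors-style payload, sorted."""
    values = set()
    for item in payload:
        # Skip entries that are not dicts or that lack a string "value" field
        if isinstance(item, dict) and isinstance(item.get("value"), str):
            values.add(item["value"])
    return sorted(values)


# Hypothetical sample payload, not real API data
sample = [{"value": "IT-CH"}, {"value": "HU-SK"}, {"value": "IT-CH"}, {"label": "no value"}]
print(extract_values(sample))  # prints ['HU-SK', 'IT-CH']
```

With either answer above, this would be called as extract_values(res.json()) or extract_values(response.json()) in place of the list comprehension or the map call.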
