Python requests login: I get error 403, but the request looks correct

import requests
import pickle
import json

session = requests.session()
headers1 = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
r = session.get('https://www.zalando.it/', headers=headers1)
cookies = r.cookies

url = 'https://www.zalando.it/api/reef/login'
payload = {'username': "email@email.it", 'password': "password", 'wnaMode': "shop"}
headers = {
    'x-xsrf-token': cookies['frsx'],
    #'_abck': str(cookies['_abck']),
    'usercentrics_enabled': 'true',
    'Connection': 'keep-alive',
    'Content-Type': 'application/json; charset=utf-8',
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
    'origin': 'https://www.zalando.it',
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Credentials': 'true',
    'Access-Control-Allow-Methods': 'GET,PUT,POST,DELETE,OPTIONS',
    'Access-Control-Allow-Headers': 'Origin,X-Requested-With,Content-Type,Accept,content-type,application/json',
    'sec-fetch-mode': 'no-cors',
    'sec-fetch-site': 'same-origin',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7',
    'dpr': '1.3125',
    'referer': 'https://www.zalando.it/uomo-home/',
    'viewport-width': '1464'
}
x = session.post(url, data=json.dumps(payload), headers=headers, cookies=cookies)
print(x)       # error 403
print(x.text)  # page that shows the 403 error

2 answers

Answer 1 · edited 2024-04-25 13:33:08

Well, it looks to me like this site is protected by Akamai (specifically, it looks like Akamai Bot Manager).

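When you get the 403 response, do you see Server: AkamaiGHost in the response headers of /api/reef/login? Also, look at the requests sent during a legitimate browser session: there are many requests to /static/{some unique ID}, some of which carry sensor_data that includes your user agent along with some other "gibberish".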

The behavior described above seems to match this:

The BMP SDK collects behavioral data while the user is interacting with the application. This behavioral data, also known as sensor data, includes the device characteristics, device orientation, accelerometer data, touch events, etc. (Reference: BMP SDK)

In addition, this answer confirms that some of the cookies set by this site actually belong to Akamai Bot Manager.
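Both hints above (the Server header on the 403 and the Bot Manager cookies) can be checked from the question's own script. This is a rough heuristic sketch, not a reliable detector: akamai_indicators is a hypothetical helper, and the cookie names other than _abck (which already appears, commented out, in the question's code) are an assumption based on cookies commonly reported for Akamai Bot Manager, not an official list.

```python
from typing import Iterable, List, Mapping

# Assumption: cookie names commonly reported as set by Akamai Bot Manager.
# This is not an official or exhaustive list.
SUSPECTED_BOT_MANAGER_COOKIES = frozenset({"_abck", "bm_sz", "ak_bmsc", "bm_sv"})


def akamai_indicators(
    status_code: int,
    response_headers: Mapping[str, str],
    cookie_names: Iterable[str],
) -> List[str]:
    """List the Akamai hints a failed login shows (hypothetical helper)."""
    indicators = []
    # HTTP header names are case-insensitive, so compare them lower-cased.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    server = lowered.get("server", "").strip().lower()
    if status_code == 403 and server == "akamaighost":
        indicators.append("403 served by AkamaiGHost")
    # Report any cookie the site set whose name matches the suspected list.
    for name in sorted(set(cookie_names) & SUSPECTED_BOT_MANAGER_COOKIES):
        indicators.append(f"cookie {name}")
    return indicators
```

For example, after the question's session.post(...) call, print(akamai_indicators(x.status_code, x.headers, session.cookies.keys())) lists whichever hints are present. A non-empty result only suggests that Akamai is in the path; it does not by itself prove why the login was rejected.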

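Well, I'm not sure there is an easy way around it. After all, it is a product developed specifically for this purpose: blocking web-scraping bots like yours.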

Answer 2 · edited 2024-04-25 13:33:08

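The initial request needs to look like a real browser request; after that, the headers need to be modified so that the requests look like XHR (Ajax) requests. In addition, some response headers have to be added to future requests to the server, as do some cookies, such as the client ID and the XSRF token.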

Here is some sample code that currently works:

import requests

# first load the home page
home_page_link = "https://www.zalando.it/"
login_api_schema = "https://www.zalando.it/api/reef/login/schema"
login_api_post = "https://www.zalando.it/api/reef/login"

headers = {
    'Host': 'www.zalando.it',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Connection': 'close',
    'Upgrade-Insecure-Requests': '1'
}


if __name__ == '__main__':

    with requests.Session() as s:
        s.headers.update(headers)

        r = s.get(home_page_link)

        # fetch these cookies: frsx, Zalando-Client-Id
        cookie_dict = s.cookies.get_dict()
        # update the headers
        # remove this header for the xhr requests
        del s.headers['Upgrade-Insecure-Requests']
        # these 2 are taken from some response cookies
        s.headers['x-xsrf-token'] = cookie_dict['frsx']
        s.headers['x-zalando-client-id'] = cookie_dict['Zalando-Client-Id']
        # I didn't check where these two come from;
        # I saw them in the browser's requests and added them manually
        s.headers['x-zalando-render-page-uri'] = '/'
        s.headers['x-zalando-request-uri'] = '/'
        # this is sent as a response header and is needed to 
        # track future requests/responses
        s.headers['x-flow-id'] = r.headers['X-Flow-Id']
        # only accept json data from xhr requests
        s.headers['Accept'] = 'application/json'

        # the browser sends this request when the login button is clicked;
        # I didn't test whether logging in works without it
        r = s.get(login_api_schema)

        # add an origin header
        s.headers['Origin'] = 'https://www.zalando.it'
        # finally, log in; this should return a 201 response that sets a cookie
        login_data = {"username":"email@email.it","password":"password","wnaMode":"modal"}
        r = s.post(login_api_post, json=login_data)
        print(r.status_code)
        print(r.headers)
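The switch from browser-style to XHR-style headers described at the start of this answer can also be pulled out of the session code into a small function that is easy to check offline. This is a sketch, not part of the original answer: build_xhr_headers is a hypothetical name, the header and cookie names are the ones the script above uses, and it assumes the home page sets both the frsx and Zalando-Client-Id cookies and returns an X-Flow-Id header.

```python
from typing import Dict, Mapping


def build_xhr_headers(
    browser_headers: Mapping[str, str],
    cookies: Mapping[str, str],
    response_headers: Mapping[str, str],
) -> Dict[str, str]:
    """Derive XHR-style headers from browser-style ones (hypothetical helper)."""
    headers = dict(browser_headers)  # copy, so the caller's mapping is untouched
    # The answer removes this header for the XHR requests.
    headers.pop("Upgrade-Insecure-Requests", None)
    # Fail with a clear message instead of a bare KeyError.
    missing = [n for n in ("frsx", "Zalando-Client-Id") if n not in cookies]
    if missing:
        raise ValueError(f"home page did not set cookies: {missing}")
    # Header names are case-insensitive, so look X-Flow-Id up lower-cased.
    lowered = {k.lower(): v for k, v in response_headers.items()}
    if "x-flow-id" not in lowered:
        raise ValueError("home page response had no X-Flow-Id header")
    headers.update({
        "x-xsrf-token": cookies["frsx"],
        "x-zalando-client-id": cookies["Zalando-Client-Id"],
        "x-zalando-render-page-uri": "/",
        "x-zalando-request-uri": "/",
        "x-flow-id": lowered["x-flow-id"],
        "Accept": "application/json",
    })
    return headers
```

In the script above, this would replace the block of s.headers[...] assignments: xhr = build_xhr_headers(s.headers, s.cookies.get_dict(), r.headers), then s.headers.clear() and s.headers.update(xhr). Keeping the derivation in a pure function means a missing cookie or header fails with a readable message rather than a KeyError deep inside the session code.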
