Python requests login: I get a 403 error, but the request looks correct

Posted 2024-04-25 13:33:08


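I'm trying to log in to www.zalando.it with the requests library, but every time I POST the login data I get a 403 error. I compared my request with Zalando's login call in the browser's Network tab, and they look the same. The credentials below are just dummy data; you can create a test account to try it.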

Here is the code for the login function:

import requests
import pickle
import json

session = requests.session()
headers1 = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'}
r = session.get('https://www.zalando.it/', headers = headers1)
cookies = r.cookies

url = 'https://www.zalando.it/api/reef/login'   
payload = {'username': "email@email.it", 'password': "password", 'wnaMode': "shop"}
headers = {
    'x-xsrf-token': cookies['frsx'],
    #'_abck': str(cookies['_abck']),
    'usercentrics_enabled' : 'true',
    'Connection': 'keep-alive',
    'Content-Type':'application/json; charset=utf-8',
    'User-Agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36",
    'origin':'https://www.zalando.it',
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Credentials': 'true',
    'Access-Control-Allow-Methods': 'GET,PUT,POST,DELETE,OPTIONS',
    'Access-Control-Allow-Headers': 'Origin,X-Requested-With,Content-Type,Accept,content-type,application/json',
    'sec-fetch-mode': 'no-cors',
    'sec-fetch-site': 'same-origin',
    'accept': '*/*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7',
    'dpr': '1.3125',
    'referer': 'https://www.zalando.it/uomo-home/',
    'viewport-width': '1464'
    }
x = session.post(url, data = json.dumps(payload), headers = headers, cookies = cookies)
print(x) #error 403
print(x.text) #page that show 403
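A side note on the request headers above, not a fix for the 403: the `Access-Control-Allow-*` entries are CORS *response* headers that a server sends back to a browser. Adding them to a client request does not change what the server allows. The minimal sketch below removes them from a header dict; the helper name `strip_cors_response_headers` and the set of header names are choices made for this illustration.

```python
from __future__ import annotations

# CORS headers that only have meaning on a server *response*.
# Sending them from the client has no effect on the server's decision.
CORS_RESPONSE_ONLY = {
    "access-control-allow-origin",
    "access-control-allow-credentials",
    "access-control-allow-methods",
    "access-control-allow-headers",
    "access-control-expose-headers",
    "access-control-max-age",
}


def strip_cors_response_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` without CORS response-only entries.

    Header names are compared case-insensitively, as HTTP requires.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in CORS_RESPONSE_ONLY
    }


if __name__ == "__main__":
    request_headers = {
        "Content-Type": "application/json; charset=utf-8",
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Methods": "GET,PUT,POST,DELETE,OPTIONS",
        "origin": "https://www.zalando.it",
    }
    print(sorted(strip_cors_response_headers(request_headers)))
    # ['Content-Type', 'origin']
```

This is harmless cleanup only; as the answers below explain, the 403 most likely comes from bot protection, which header tidying will not change. Similarly, `session.post(url, json=payload)` serializes the payload and sets `Content-Type: application/json` for you, so the manual `json.dumps` is not needed.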

2 answers

Well, it looks to me like this site is protected by Akamai (specifically, it looks like Akamai Bot Manager).

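When you get the 403 response, do you see `Server: AkamaiGHost` in the response headers of /api/reef/login? Also, look at the requests sent during a legitimate browser session: there are many requests to /static/{some unique ID}, and some of them carry sensor_data, which includes your user agent and some other "gibberish".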

The behavior described above seems to match this:

The BMP SDK collects behavioral data while the user is interacting with the application. This behavioral data, also known as sensor data, includes the device characteristics, device orientation, accelerometer data, touch events, etc. Reference: BMP SDK

In addition, this answer confirms that some of the cookies set by this site actually belong to Akamai Bot Manager.

Well, I'm not sure there is an easy way around it. After all, it is a product built for exactly this purpose: blocking web-scraping bots like yours.
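The `Server` header check suggested in this answer can be scripted. The sketch below is a heuristic only, assuming that a 403 whose `Server` header contains `AkamaiGHost` was produced by an Akamai edge server; it does not prove that Bot Manager is the reason for the block. The function name `looks_like_akamai_block` is hypothetical.

```python
from __future__ import annotations

from typing import Mapping


def looks_like_akamai_block(status_code: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: True for a 403 whose Server header mentions AkamaiGHost.

    This only hints that an Akamai edge server produced the response;
    it does not prove that Bot Manager caused the block.
    """
    if status_code != 403:
        return False
    # HTTP header names are case-insensitive, so match "server" in any case.
    server = next(
        (value for name, value in headers.items() if name.lower() == "server"),
        "",
    )
    return "akamaighost" in server.lower()


if __name__ == "__main__":
    print(looks_like_akamai_block(403, {"Server": "AkamaiGHost"}))  # True
    print(looks_like_akamai_block(403, {"server": "nginx"}))        # False
    print(looks_like_akamai_block(200, {"Server": "AkamaiGHost"}))  # False
```

With the question's code, `looks_like_akamai_block(x.status_code, x.headers)` should work, since a requests `Response.headers` object is a case-insensitive mapping that supports `.items()`.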

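For the initial request, it needs to look like a real browser request; after that, the headers need to be changed to look like an XHR (Ajax) request. In addition, some response headers, as well as cookies such as the client ID and the XSRF token, need to be added to future requests to the server.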
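The XSRF part resembles the common double-submit pattern: the server sets a cookie, and the client echoes its value back in a request header. The cookie name `frsx` and header name `x-xsrf-token` are taken from the code in this thread; the function name and the sample cookie strings are hypothetical. This stdlib sketch only shows the mechanism, since a requests `Session` already parses cookies for you.

```python
from __future__ import annotations

from http.cookies import SimpleCookie


def xsrf_header_from_set_cookie(
    set_cookie: str,
    cookie_name: str = "frsx",
    header_name: str = "x-xsrf-token",
) -> dict[str, str]:
    """Parse a Set-Cookie value and echo one cookie back as a request header.

    Returns an empty dict when the cookie is not present.
    """
    jar = SimpleCookie()
    jar.load(set_cookie)          # parses "name=value; Attr=..." strings
    morsel = jar.get(cookie_name)  # SimpleCookie is a dict of Morsel objects
    return {header_name: morsel.value} if morsel is not None else {}


if __name__ == "__main__":
    print(xsrf_header_from_set_cookie("frsx=abc123; Path=/"))
    # {'x-xsrf-token': 'abc123'}
```

With requests, the equivalent one-liner is `s.headers['x-xsrf-token'] = s.cookies.get('frsx')`. Echoing the token alone will not get past bot detection if the site uses it.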

Here is some sample code that currently works:

import requests

# first load the home page
home_page_link = "https://www.zalando.it/"
login_api_schema = "https://www.zalando.it/api/reef/login/schema"
login_api_post = "https://www.zalando.it/api/reef/login"

headers = {
    'Host': 'www.zalando.it',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Connection' : 'close',
    'Upgrade-Insecure-Requests': '1'
}


if __name__ == '__main__':

    with requests.Session() as s:
        s.headers.update(headers)

        r = s.get(home_page_link)

        # fetch these cookies: frsx, Zalando-Client-Id
        cookie_dict = s.cookies.get_dict()
        # update the headers
        # remove this header for the xhr requests
        del s.headers['Upgrade-Insecure-Requests']
        # these 2 are taken from some response cookies
        s.headers['x-xsrf-token'] = cookie_dict['frsx']
        s.headers['x-zalando-client-id'] = cookie_dict['Zalando-Client-Id']
        # I didn't pay attention to where these came from;
        # I just saw them and added them manually
        s.headers['x-zalando-render-page-uri'] = '/'
        s.headers['x-zalando-request-uri'] = '/'
        # this is sent as a response header and is needed to 
        # track future requests/responses
        s.headers['x-flow-id'] = r.headers['X-Flow-Id']
        # only accept json data from xhr requests
        s.headers['Accept'] = 'application/json'

        # this request is sent when the login button is clicked;
        # I didn't test without it
        r = s.get(login_api_schema)

        # add an origin header
        s.headers['Origin'] = 'https://www.zalando.it'
        # finally log in, this should return a 201 response with a cookie
        login_data = {"username":"email@email.it","password":"password","wnaMode":"modal"}
        r = s.post(login_api_post, json=login_data)
        print(r.status_code)
        print(r.headers)
