Downloading a zip file with Python gives HTTP 403

from fake_useragent import UserAgent
import requests

ua_str = UserAgent().chrome
formattedUrl = 'https://www1.nseindia.com/content/historical/EQUITIES/2021/JAN/cm01JAN2021bhav.csv.zip'
requestedFile = requests.get(formattedUrl, headers={"User-Agent": ua_str})
print(requestedFile.status_code)  # prints 403

import shutil
import urllib.request
import zipfile

url = formattedUrl
file_name = 'cm01JAN2021bhav.csv.zip'
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
with zipfile.ZipFile(file_name) as zf:
    zf.extractall()
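Note that `urllib.request.urlopen(url)` in the second snippet sends no browser-like headers at all, so it is even more likely to be refused than the `requests` call. A minimal stdlib sketch, assuming the server only checks the `User-Agent` (it may also want cookies, as the answer suggests): attach the header through a `urllib.request.Request` and read the archive from memory instead of a temporary file. `build_request` and `read_zip_members` are hypothetical helper names.

```python
import io
import urllib.request
import zipfile


def build_request(url, user_agent):
    """Wrap the URL in a Request that sends a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def read_zip_members(data):
    """Open a zip archive held in memory and return {member name: bytes}."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Usage (performs a network request, so it is left commented out):
# req = build_request(formattedUrl, ua_str)
# with urllib.request.urlopen(req) as response:
#     members = read_zip_members(response.read())
```

Keeping the bytes in memory avoids leaving a half-written `.zip` on disk when the server answers with an error page instead of the file.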

1 answer

Anonymous user

#1 · Posted on 2024-06-16 19:00:42

The code below solved a similar problem. I think you need to hit two extra URLs first and use the cookies they return.
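The same idea (visit the two extra URLs, keep whatever cookies they set, replay them on the download) can also be sketched with the standard library the question's second snippet already uses. This is a minimal sketch, assuming the server sets its cookies on those first two responses; `build_cookie_opener` is a hypothetical helper, and the URLs are the ones from the answer's code.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(headers):
    """Return (opener, jar): the opener stores cookies from every response
    in jar and sends them back on every later request it makes."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # addheaders is a list of (name, value) pairs sent with every request
    opener.addheaders = list(headers.items())
    return opener, jar


# Usage (performs network requests, so it is left commented out):
# opener, jar = build_cookie_opener({"User-Agent": "Mozilla/5.0 ..."})
# opener.open("https://www1.nseindia.com/products/content/equities/equities/archieve_eq.htm").read()
# opener.open("https://www1.nseindia.com/ArchieveSearch?h_filetype=eqbhav&date=05-01-2021&section=EQ").read()
# with opener.open("https://www1.nseindia.com/content/historical/EQUITIES/2021/JAN/cm05JAN2021bhav.csv.zip") as resp:
#     data = resp.read()
```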

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.63',
           'Upgrade-Insecure-Requests': '1',
           'Accept-Encoding': 'gzip, deflate, br',
           'Accept-Language': 'en-US,en;q=0.9',
           'Connection': 'keep-alive',
           'Host': 'www1.nseindia.com',
           'Referer': 'https://www1.nseindia.com/products/content/equities/equities/archieve_eq.htm',
           'Sec-Fetch-Dest': 'empty',
           'Sec-Fetch-Mode': 'cors',
           'Sec-Fetch-Site': 'same-origin'}

# A Session stores the cookies set by each response and sends them on every
# later request, so the zip download carries them too
with requests.Session() as session:
    session.headers.update(headers)
    # 1st extra URL: the archive page
    resp1 = session.get('https://www1.nseindia.com/products/content/equities/equities/archieve_eq.htm')
    print(resp1)
    # 2nd extra URL: the search for the wanted date
    resp2 = session.get('https://www1.nseindia.com/ArchieveSearch?h_filetype=eqbhav&date=05-01-2021&section=EQ')
    print(resp2)
    # The file itself, streamed to disk in 1024-byte blocks
    resp3 = session.get('https://www1.nseindia.com/content/historical/EQUITIES/2021/JAN/cm05JAN2021bhav.csv.zip', stream=True)
    print(resp3)
    resp3.raise_for_status()  # stop here on 403 instead of saving an error page
    with open('c:/temp/test.zip', 'wb') as handle:
        for block in resp3.iter_content(1024):
            handle.write(block)
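If the server still refuses, the bytes that end up in `test.zip` are likely an HTML error page rather than an archive, and the problem only surfaces later when `zipfile.ZipFile` raises `BadZipFile`. A small sketch that checks the content before writing it; `looks_like_zip` and `save_zip` are hypothetical helper names, and `zipfile.is_zipfile` accepts a file-like object such as `io.BytesIO`.

```python
import io
import zipfile


def looks_like_zip(data):
    """True if the bytes form a readable zip archive (an HTML error page will not)."""
    return zipfile.is_zipfile(io.BytesIO(data))


def save_zip(data, path):
    """Write the archive to disk only after confirming it is a real zip."""
    if not looks_like_zip(data):
        raise ValueError("response body is not a zip archive (%d bytes)" % len(data))
    with open(path, "wb") as handle:
        handle.write(data)


# Usage with a non-streamed response:
# save_zip(resp3.content, 'c:/temp/test.zip')
```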
