How to download a file from a URL with Python: requests gets redirected to an error page
I'm trying to download a file with Python:
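Downloading it in my browser works fine, but when I try it in Python I get redirected to an error page. The response contains the HTML of the error page (Errors/ErrorPage.aspx) instead of the data of the zip file I want.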
Here's what I tried:
import requests
url = 'https://www.dallascad.org/ViewPDFs.aspx?type=3&id=\\DCAD.ORG\WEB\WEBDATA\WEBFORMS\DATA%20PRODUCTS\DCAD2024_CURRENT.ZIP'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36'
}
r = requests.get(url, allow_redirects=True, headers=headers, timeout=None)
print(f"URL: {r.url}")
print(f"Status Code: {r.status_code}")
for i, h in enumerate(r.history):
    print(f"History[{i}] URL: {h.url}")
    print(f"History[{i}] Status: {h.status_code}")
    print(f"History[{i}] Headers: {h.headers}")
Output:
URL: https://www.dallascad.org/Errors/ErrorPage.aspx?aspxerrorpath=/ViewPDFs.aspx
Status Code: 200
History[0] URL: https://www.dallascad.org/ViewPDFs.aspx?type=3&id=%5CDCAD.ORG%5CWEB%5CWEBDATA%5CWEBFORMS%5CDATA%20PRODUCTS%5CDCAD2024_CURRENT.ZIP
History[0] Status: 302
History[0] Headers: {'Cache-Control': 'private', 'Content-Type': 'text/html; charset=utf-8', 'Location': '/Errors/ErrorPage.aspx?aspxerrorpath=/ViewPDFs.aspx', 'Server': 'Microsoft-IIS/8.5', 'Content-Disposition': 'attachment;filename=DCAD2024_CURRENT.ZIP', 'X-AspNet-Version': '4.0.30319', 'X-Powered-By': 'ASP.NET', 'Date': 'Tue, 26 Mar 2024 14:35:36 GMT', 'Content-Length': '168'}
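One way to see what the server actually received is to decode the query string of the History[0] URL above. A minimal sketch using only the standard library; the URL is copied from the output, and nothing is requested over the network:

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from History[0] in the output above (no network request is made).
history_url = (
    "https://www.dallascad.org/ViewPDFs.aspx?type=3&id="
    "%5CDCAD.ORG%5CWEB%5CWEBDATA%5CWEBFORMS%5CDATA%20PRODUCTS%5CDCAD2024_CURRENT.ZIP"
)

# parse_qs percent-decodes each value (%5C -> backslash, %20 -> space).
params = parse_qs(urlsplit(history_url).query)
received_id = params["id"][0]

# Count how many backslashes the decoded id starts with.
leading = len(received_id) - len(received_id.lstrip("\\"))

print(received_id)  # \DCAD.ORG\WEB\WEBDATA\WEBFORMS\DATA PRODUCTS\DCAD2024_CURRENT.ZIP
print(leading)      # 1
```

The decoded id starts with a single backslash, whereas the id written in the code starts with two.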
1 Answer
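The id parameter contains a lot of backslashes, so you need to make the URL a raw string. In a normal string literal the leading "\\" is an escape sequence for a single backslash, so the server receives "\DCAD.ORG\..." instead of "\\DCAD.ORG\...". You can see this in your History[0] URL, which starts the id with only one %5C.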
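A small standalone sketch of the escaping, using only the "\\DCAD.ORG" prefix of the id rather than the full URL. The other backslashes in the original URL (\W, \D) survive only because they are not recognised escape sequences; Python keeps them as typed but warns about them (a SyntaxWarning since Python 3.12). urllib.parse.quote is used here just to show the percent-encoded form; requests does its own URL preparation, although the History[0] URL suggests it also encodes a backslash as %5C.

```python
from urllib.parse import quote

# Normal string literal: "\\" is an escape sequence that yields ONE backslash.
normal = "\\DCAD.ORG"
# Raw string literal: backslashes are kept exactly as typed, so this has TWO.
raw = r"\\DCAD.ORG"
# Doubling every backslash in a normal literal gives the same value as the raw one.
doubled = "\\\\DCAD.ORG"

print(len(normal), len(raw))  # 9 10
print(raw == doubled)         # True

# Percent-encoding shows what ends up in the query string: one %5C vs two.
print(quote(normal))  # %5CDCAD.ORG
print(quote(raw))     # %5C%5CDCAD.ORG
```

Writing every backslash twice in a normal string works too, but the raw string keeps the URL readable.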
The site also doesn't need any custom headers (such as the User-Agent you set), so this is enough:
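import requests

# Raw string: every backslash in the id parameter, including the leading "\\", is kept as typed
url = r"https://www.dallascad.org/ViewPDFs.aspx?type=3&id=\\DCAD.ORG\WEB\WEBDATA\WEBFORMS\DATA%20PRODUCTS\DCAD2024_CURRENT.ZIP"

# stream=True fetches the body in pieces instead of loading the whole ZIP into memory
with requests.get(url, stream=True) as response:
    response.raise_for_status()  # raise requests.HTTPError for 4xx/5xx status codes
    with open("DCAD2024_CURRENT.ZIP", "wb") as output:
        for chunk in response.iter_content(4096):  # 4096-byte chunks
            output.write(chunk)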
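A saved file can still turn out to be an HTML error page: in the question the error page came back with status 200, so raise_for_status() would not have caught it. A minimal sketch that writes the chunks and then checks the result with zipfile.is_zipfile from the standard library. The helper name save_and_check_zip is hypothetical, and the demo feeds it fake in-memory chunks instead of a real response:

```python
import io
import os
import tempfile
import zipfile
from typing import Iterable


def save_and_check_zip(chunks: Iterable[bytes], path: str) -> bool:
    """Write byte chunks to path, then report whether the file is a real ZIP.

    Returns False if the content was something else, e.g. an HTML error page.
    """
    with open(path, "wb") as output:
        for chunk in chunks:
            if chunk:  # skip any empty chunks
                output.write(chunk)
    return zipfile.is_zipfile(path)


# Offline demo: build a small ZIP in memory to stand in for the real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.txt", "placeholder contents")
zip_bytes = buffer.getvalue()

# Stand-in for the HTML body of an error page.
html_bytes = b"<html><body>Placeholder error page, not a zip file</body></html>"

with tempfile.TemporaryDirectory() as tmp:
    # Split the ZIP bytes into two chunks, the way iter_content would.
    zip_ok = save_and_check_zip([zip_bytes[:10], zip_bytes[10:]], os.path.join(tmp, "good.zip"))
    html_ok = save_and_check_zip([html_bytes], os.path.join(tmp, "bad.zip"))

print(zip_ok, html_ok)  # True False
```

With the code above, you would call save_and_check_zip(response.iter_content(4096), "DCAD2024_CURRENT.ZIP") inside the with requests.get(...) block. Inspecting response.history, as in the question's debugging code, is another way to notice that a redirect happened.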