urllib.error.HTTPError: HTTP Error 403: Forbidden with multiple headers defined

Posted 2024-04-27 17:28:15


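I am trying to write a Python 3 script that iterates through a list of mods hosted on a shared website and downloads the latest version of each. I am already stuck at the first step: going to the site and fetching the list of mod versions. I tried using urllib, but it throws a 403: Forbidden error.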

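I think this may be due to some kind of anti-scraping rejection by the server, and I read that you can get around it by defining the headers. I ran Wireshark while using my browser and was able to identify the headers it sends: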

Host: ocsp.pki.goog\r\n
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0\r\n
Accept: */*\r\n
Accept-Language: en-US,en;q=0.5\r\n
Accept-Encoding: gzip, deflate\r\n
Content-Type: application/ocsp-request\r\n
Content-Length: 83\r\n
Connection: keep-alive\r\n
\r\n

I believe I defined the headers correctly, but I had to comment out two entries because they produced a 400 error:

from urllib.request import Request, urlopen

count = 0
mods = ['mod1', 'mod2', ...] #this has been created to complete the URL and has been tested to work

#iterate through all mods and download latest version
while mods:
    url = 'https://Domain/'+mods[count]
    #change the header to the browser I was using at the time of writing the script
    req = Request(url)
    #req.add_header('Host', 'ocsp.pki.goog\\r\\n') #this reports 400 bad request
    req.add_header('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0\\r\\n')
    req.add_header('Accept', '*/*\\r\\n')
    req.add_header('Accept-Language', 'en-US,en;q=0.5\\r\\n')
    req.add_header('Accept-Encoding', 'gzip, deflate\\r\\n')
    req.add_header('Content-Type', 'application/ocsp-request\\r\\n')
    #req.add_header('Content-Length', '83\\r\\n') #this reports 400 bad request
    req.add_header('Connection', 'keep-alive\\r\\n')
    html = urlopen(req).read().decode('utf-8')

This still throws a 403: Forbidden error:

Traceback (most recent call last):
  File "SCRIPT.py", line 19, in <module>
    html = urlopen(req).read().decode('utf-8')
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
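
The traceback only reports the status line. A minimal sketch of how the error object itself could be inspected, since the headers the server returns sometimes hint at why the request was refused. The helper names `describe_http_error` and `fetch` are hypothetical and not part of the original script:

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def describe_http_error(err):
    """Summarize an HTTPError: status code, reason phrase, response headers."""
    headers = dict(err.headers.items()) if err.headers is not None else {}
    return {"code": err.code, "reason": err.reason, "headers": headers}


def fetch(url, headers=None):
    """Fetch url as text; on an HTTP error, print a summary and re-raise."""
    req = Request(url, headers=headers or {})
    try:
        with urlopen(req) as resp:
            return resp.read().decode("utf-8")
    except HTTPError as err:
        # The summary shows what the server sent back along with the 403.
        print(describe_http_error(err))
        raise
```

Calling `fetch('https://Domain/' + mods[0])` in place of the bare `urlopen(req)` call would print the server's response headers before the exception propagates.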

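I am not sure what I am doing wrong. I assume there is something wrong with the way I define the header values, but I cannot tell what it is. Any help would be greatly appreciated.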
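
A few things in the script may be worth checking. The `\r\n` at the end of each captured line is the line terminator as displayed by Wireshark, not part of the header value, so it probably should not be included in the strings passed to `add_header`. The `Host: ocsp.pki.goog` and `Content-Type: application/ocsp-request` entries also appear to belong to an OCSP certificate check rather than to the request for the mod page, so they may not be the right headers to copy. A minimal sketch under those assumptions follows; `BROWSER_HEADERS`, `build_request`, and `mod_urls` are hypothetical names, `https://Domain/` is the placeholder from the script above, and whether this is enough to avoid the 403 depends on the server:

```python
from urllib.request import Request

# Values copied from the capture above, without the trailing "\r\n".
# Host and Content-Length are left out so urllib can fill them in itself,
# and Accept-Encoding is left out because urllib does not decompress
# gzip/deflate responses automatically.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) "
                  "Gecko/20100101 Firefox/85.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Build a Request carrying the browser-like headers."""
    return Request(url, headers=dict(headers))


def mod_urls(base_url, mods):
    """Return one URL per mod; a for-style iteration that always terminates."""
    return [base_url + mod for mod in mods]
```

Each URL from `mod_urls('https://Domain/', mods)` can then be passed through `build_request` and on to `urlopen`. Iterating over the list this way also avoids the `while mods:` loop, which never ends because `mods` is never emptied and `count` is never incremented.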

