urllib request gives a 404 error but works fine in the browser

Posted 2024-04-26 21:17:37


When I run this:

import urllib.request

urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")

I get the following error:

Traceback (most recent call last):
  File "scraper.py", line 26, in <module>
    urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
  File "/usr/lib/python3.6/urllib/request.py", line 248, in urlretrieve 
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

But the link works fine in my browser. Why does it work in the browser but not through urllib? The same code works with other images from the same site.


2 Answers

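Try changing the user agent. Note that urlretrieve does not accept a headers argument, so set the header on a Request object and open that instead:

req = urllib.request.Request("https://i.redd.it/53tfh959wnv41.jpg", headers={"User-Agent": "put custom user agent here"})
data = urllib.request.urlopen(req).read()  # write these bytes to "photo.jpg"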
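A slightly fuller sketch of the user-agent suggestion, for anyone who wants the file written to disk in one step. urlretrieve() has no headers parameter, so this version attaches the header to a Request and streams the response with urlopen(). The helper name download_with_user_agent and the 10-second timeout are assumptions made for illustration, not part of the original answer.

```python
import shutil
import urllib.request


def download_with_user_agent(url, filename, user_agent):
    """Save the body of `url` to `filename`, sending a custom User-Agent.

    Hypothetical helper: the name and the timeout value are illustrative.
    """
    # urlretrieve() cannot send extra headers, so put them on a Request.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp, open(filename, "wb") as fp:
        shutil.copyfileobj(resp, fp)  # stream the body to disk in chunks
    return filename
```

Called as download_with_user_agent("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg", "put custom user agent here"), this still raises urllib.error.HTTPError if the server answers with a 404, so a different user agent only helps when the server was rejecting the default one.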

The request returns this image:

[image]

If you check the developer console, the response status is 404: [image]

So what you are seeing is imgur's custom 404 "page" (which is an image).
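The same check can be done from Python instead of the developer console. The sketch below reports the status code and Content-Type of a response whether or not urllib raises; a 404 paired with an image content type would match the behaviour described above. The function name probe, the default user-agent string, and the timeout are assumptions for illustration.

```python
import urllib.error
import urllib.request


def probe(url, user_agent="example-probe/0.1"):
    """Return (status_code, content_type) for `url`, even on HTTP errors.

    Hypothetical helper: name, default user agent, and timeout are illustrative.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type")
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code and the response headers.
        status, content_type = err.code, err.headers.get("Content-Type")
        err.close()
        return status, content_type
```

A browser renders whatever body comes back regardless of the status code, whereas urllib raises on any 4xx or 5xx response, which is why the two appear to disagree.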

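Edit:

So urlretrieve fails because of the 404 status code. If you want to use the content of the response anyway (even though the status code is 404), you can do the following: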

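import urllib.error
import urllib.request

try:
    urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
except urllib.error.HTTPError as e:
    # HTTPError also acts as a response object, so its body can still be read
    with open("error_photo.jpg", 'wb') as fp:
        fp.write(e.read())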
