How do I handle urllib timeouts in Python 3?

Asked 2025-04-17 09:42

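First of all, my problem is very similar to this one. I want urllib.request.urlopen() to raise an exception that I can handle when it times out.

Doesn't a timeout count as a kind of URLError?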

try:
    response = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
except (HTTPError, URLError) as error:
    logging.error(
        'Data of %s not retrieved because %s\nURL: %s', name, error, url)
else:
    logging.info('Access successful.')

Error message:

    resp = urllib.request.urlopen(req, timeout=10).read().decode('utf-8')
  File "/usr/lib/python3.2/urllib/request.py", line 138, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.2/urllib/request.py", line 369, in open
    response = self._open(req, data)
  File "/usr/lib/python3.2/urllib/request.py", line 387, in _open
    '_open', req)
  File "/usr/lib/python3.2/urllib/request.py", line 347, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.2/urllib/request.py", line 1156, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.2/urllib/request.py", line 1141, in do_open
    r = h.getresponse()
  File "/usr/lib/python3.2/http/client.py", line 1046, in getresponse
    response.begin()
  File "/usr/lib/python3.2/http/client.py", line 346, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.2/http/client.py", line 308, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.2/socket.py", line 276, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

Python 3 made a major change by merging the urllib and urllib2 modules into urllib. Could that be the cause of this problem?

3 Answers

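What is a "timeout"? Loosely, it means "the server did not respond in time, usually because it is overloaded, and the request is worth retrying."

HTTP status 504 "Gateway Timeout" is a timeout in this sense. It is delivered through HTTPError.

HTTP status 429 "Too Many Requests" also counts as a timeout under this definition. It too is delivered through HTTPError.

So what else counts as a timeout? Do we include a timeout while the DNS resolver looks up the domain name? A timeout while sending data? A timeout while waiting for data to come back?

I don't know how to audit the urllib source to be sure that every case I consider a timeout gets caught. In a language without checked exceptions, I don't know how to go about it. My hunch is that errors connecting to DNS might come back as socket.timeout, while errors connecting to the remote server might come back as URLError(socket.timeout). This is only a guess, but it might explain the earlier observations.

So I coded very defensively. (1) I handle the HTTP status codes that indicate a timeout. (2) There are reports that some timeouts arrive as a socket.timeout exception and others as URLError(socket.timeout), so I catch both. (3) I also check for HTTPError(socket.timeout), just in case.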

import socket
import sys
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


def fetch_with_retry(url: str, cache: str) -> bytes:  # wrapper name is illustrative
    """Download url, save it to the cache file, and retry on timeouts."""
    while True:
        reason: Optional[str] = None
        try:
            with urllib.request.urlopen(url) as response:
                content = response.read()
                with open(cache, "wb") as file:
                    file.write(content)
                return content
        except urllib.error.HTTPError as e:
            # HTTPError subclasses URLError, so it has to be caught first.
            if e.code in (429, 504):  # 429 = too many requests, 504 = gateway timeout
                reason = f'{e.code} {e.reason}'
            elif isinstance(e.reason, socket.timeout):
                reason = f'HTTPError socket.timeout {e.reason} - {e}'
            else:
                raise
        except urllib.error.URLError as e:
            if isinstance(e.reason, socket.timeout):
                reason = f'URLError socket.timeout {e.reason} - {e}'
            else:
                raise
        except socket.timeout as e:
            reason = f'socket.timeout {e}'
        # Any other exception propagates to the caller unchanged.
        netloc = urllib.parse.urlsplit(url).netloc  # e.g. nominatim.openstreetmap.org
        print(f'*** {netloc} {reason}; will retry', file=sys.stderr)
        time.sleep(5)
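
The three timeout checks in the loop above could also be gathered into one predicate, which makes them easier to test without a network. This is only a sketch: `is_timeout_error` and `RETRYABLE_STATUS` are hypothetical names, not part of urllib, and the set of status codes is just the two used above.

```python
import socket
import urllib.error

# Hypothetical constant: the two status codes treated as timeouts above.
RETRYABLE_STATUS = {429, 504}


def is_timeout_error(exc: BaseException) -> bool:
    """Return True if exc matches one of the timeout cases handled above."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code in RETRYABLE_STATUS or isinstance(exc.reason, socket.timeout)
    # A timeout wrapped by urllib carries the original exception in .reason.
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, socket.timeout)
    # A timeout raised bare, as in the traceback in the question.
    return isinstance(exc, socket.timeout)
```

With a helper like this, the retry loop would need only one `except Exception as e:` clause that re-raises whenever `is_timeout_error(e)` is false.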

Answered 2025-04-17 by Python大师

The previous answer does not handle timeout errors correctly. Timeouts surface as URLError, so to catch them specifically we need to write:

import logging
import socket
import urllib.request
from urllib.error import HTTPError, URLError

try:
    response = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
except HTTPError as error:
    # HTTPError subclasses URLError, so it is caught first.
    logging.error('Data not retrieved because %s\nURL: %s', error, url)
except URLError as error:
    # A wrapped timeout is carried in the reason attribute.
    if isinstance(error.reason, socket.timeout):
        logging.error('socket timed out - URL %s', url)
    else:
        logging.error('some other error happened')
else:
    logging.info('Access successful.')

Note that a ValueError can also be raised separately, for example when the URL is invalid. Like HTTPError, it is not related to timeouts.
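
As a small illustration of the ValueError case, a URL with no scheme is rejected while the request is being built, before any connection is attempted. The sketch below uses a hypothetical helper named `describe_bad_url`; it is not part of urllib.

```python
import urllib.request


def describe_bad_url(url: str) -> str:
    """Try to open url and report the ValueError raised for a malformed URL."""
    try:
        with urllib.request.urlopen(url, timeout=10):
            return "opened"
    except ValueError as error:
        # Raised during URL validation, so it has nothing to do with timeouts.
        return f"invalid URL: {error}"


# "www.example.com" has no scheme such as http://, so urllib rejects it.
print(describe_bad_url("www.example.com"))
```

Because the failure happens during validation, it can be handled in its own `except ValueError` clause without touching the timeout logic.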

Answered 2025-04-17 by Python大师

Catch the different exceptions explicitly and check the reason carried by the URLError (thanks to Régis B. and Daniel Andrzejewski):

import logging
import urllib.request
from socket import timeout
from urllib.error import HTTPError, URLError

try:
    response = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
except HTTPError as error:
    # HTTPError subclasses URLError, so it is caught first.
    logging.error('HTTP Error: Data of %s not retrieved because %s\nURL: %s', name, error, url)
except URLError as error:
    if isinstance(error.reason, timeout):
        logging.error('Timeout Error: Data of %s not retrieved because %s\nURL: %s', name, error, url)
    else:
        logging.error('URL Error: Data of %s not retrieved because %s\nURL: %s', name, error, url)
else:
    logging.info('Access successful.')

Note, regarding recent comments: the original post refers to Python 3.2, where you need to catch timeouts explicitly with socket.timeout. For example:

    # Warning - python 3.2 code
    from socket import timeout
    
    try:
        response = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
    except timeout:
        logging.error('socket timed out - URL %s', url)
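
For code that has to cope with both behaviours, one option is to catch the wrapped and the bare timeout side by side. The sketch below assumes a hypothetical helper `fetch_text` with an `opener` parameter that exists only so the error paths can be exercised without a network. Since Python 3.10, `socket.timeout` is an alias of the built-in `TimeoutError`, so the last clause also catches that.

```python
import logging
import socket
import urllib.request
from urllib.error import HTTPError, URLError


def fetch_text(url, timeout=10, opener=urllib.request.urlopen):
    """Return (text, None) on success or (None, label) on a handled failure.

    fetch_text and opener are illustrative names, not part of urllib.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode('utf-8'), None
    except HTTPError as error:
        # Must come before URLError, which is its base class.
        logging.error('HTTP error %s - URL %s', error.code, url)
        return None, 'http'
    except URLError as error:
        if isinstance(error.reason, socket.timeout):
            logging.error('timed out (wrapped in URLError) - URL %s', url)
            return None, 'timeout'
        logging.error('URL error %s - URL %s', error.reason, url)
        return None, 'url'
    except socket.timeout:
        # Raised bare, as in the traceback in the question.
        logging.error('timed out (bare socket.timeout) - URL %s', url)
        return None, 'timeout'
```

Returning a label rather than only logging lets the caller decide whether a retry is worthwhile.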

Answered 2025-04-17 by Python大师
