Python requests hangs/freezes

Posted 2024-06-07 08:27:44


I'm using the requests library to fetch a large number of web pages from a site. Here's the relevant code:

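import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

response = requests.Session()  # note: `response` holds the Session at first
retries = Retry(total=5, backoff_factor=.1)
response.mount('http://', HTTPAdapter(max_retries=retries))
response = response.get(url)  # ...and is then rebound to the Response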

After a while it just hangs/freezes while fetching a page (never on the same page). Here is the traceback when I interrupt it:

File "/Users/Student/Hockey/Scrape/html_pbp.py", line 21, in get_pbp
  response = r.read().decode('utf-8')
File "/anaconda/lib/python3.6/http/client.py", line 456, in read
  return self._readall_chunked()
File "/anaconda/lib/python3.6/http/client.py", line 566, in _readall_chunked
  value.append(self._safe_read(chunk_left))
File "/anaconda/lib/python3.6/http/client.py", line 612, in _safe_read
  chunk = self.fp.read(min(amt, MAXAMOUNT))
File "/anaconda/lib/python3.6/socket.py", line 586, in readinto
  return self._sock.recv_into(b)
KeyboardInterrupt

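Does anyone know what causes this? Or (more importantly) does anyone know a way to abort the request if it takes longer than a certain amount of time, so that I can try again?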


2 answers

It looks like setting a (read) timeout would help you.

Something like:

response = response.get(url, timeout=5)

(This sets both the connect timeout and the read timeout to 5 seconds.)
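With a timeout set, a stalled request raises an exception instead of blocking forever, so you can catch it and try again. Below is a minimal sketch of such a retry loop; `get_with_retries`, `RETRYABLE`, the attempt count and the `(3.05, 10)` timeout are hypothetical choices for illustration, not part of requests. It also catches `ConnectionError` because, as far as I know, requests reports a read timeout that happens while the response body is being read as a `ConnectionError` rather than a `ReadTimeout`.

```python
import time

import requests

# Exceptions treated as "stalled, try again". A read timeout hit while the body
# is being read is reported by requests as ConnectionError, not ReadTimeout.
RETRYABLE = (requests.exceptions.Timeout, requests.exceptions.ConnectionError)


def get_with_retries(session, url, attempts=3, timeout=(3.05, 10), pause=1.0):
    """Hypothetical helper: GET `url`, retrying up to `attempts` times on timeouts.

    `timeout` is a (connect, read) tuple in seconds; a single number such as
    `timeout=5` sets both to the same value.
    """
    for attempt in range(1, attempts + 1):
        try:
            # Without stream=True, get() also downloads the whole body, so a
            # stall in the middle of the body is caught here as well.
            return session.get(url, timeout=timeout)
        except RETRYABLE:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(pause * attempt)  # simple linear back-off between tries
```

Pass the `Session` object from the question as `session`. The `Retry` already mounted on that session should, once a timeout is given, also retry timeouts that occur before the response headers arrive; as I understand it, it does not cover stalls while reading the body, which is the case in the traceback above and what an outer loop like this handles.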

Unfortunately, requests sets neither a connect timeout nor a read timeout by default, even though the docs recommend setting one:

Most requests to external servers should have a timeout attached, in case the server is not responding in a timely manner. By default, requests do not time out unless a timeout value is set explicitly. Without a timeout, your code may hang for minutes or more.

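For completeness: the connect timeout is the number of seconds requests waits for your client to establish a connection to the remote machine, and the read timeout is the number of seconds the client waits between bytes sent by the server (in practice, usually the wait for the first byte of the response).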
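Because the read timeout applies to each gap between bytes, it does not cap the total time of a download: a server that keeps trickling data slowly never trips it. If you need a hard upper bound, one option is to stream the body and check a wall-clock deadline yourself. This is a sketch under that assumption; `get_with_deadline` and its parameters are hypothetical names, and the deadline is only checked between chunks, so a single stalled read can still take up to the read timeout.

```python
import time


def get_with_deadline(session, url, deadline=30.0, timeout=(3.05, 10), chunk_size=8192):
    """Hypothetical helper: fetch `url` with a requests.Session, giving up after
    `deadline` seconds in total.

    The read timeout still guards each individual socket read; the deadline is
    only checked between chunks.
    """
    end = time.monotonic() + deadline
    # stream=True makes get() return once the headers arrive; the body is read below.
    with session.get(url, timeout=timeout, stream=True) as response:
        chunks = []
        for chunk in response.iter_content(chunk_size=chunk_size):
            chunks.append(chunk)
            if time.monotonic() > end:
                # Leaving the with-block closes the response and its connection.
                raise TimeoutError(f"{url} took longer than {deadline} s in total")
    return b"".join(chunks)
```

The raised `TimeoutError` is Python's built-in one, so a retry loop around this helper would need to catch it alongside the requests exceptions.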

To set the timeout globally instead of specifying it on every request, do the following:


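import os

import requests
from requests.adapters import TimeoutSauce

# Default timeout in seconds, overridable via the REQUESTS_TIMEOUT_SECONDS env var.
REQUESTS_TIMEOUT_SECONDS = float(os.getenv("REQUESTS_TIMEOUT_SECONDS", 5))


class CustomTimeout(TimeoutSauce):
    """Fill in a default connect/read timeout when the caller passes none."""

    def __init__(self, *args, **kwargs):
        if kwargs.get("connect") is None:
            kwargs["connect"] = REQUESTS_TIMEOUT_SECONDS
        if kwargs.get("read") is None:
            kwargs["read"] = REQUESTS_TIMEOUT_SECONDS
        super().__init__(*args, **kwargs)


# Set it globally, instead of specifying ``timeout=..`` kwarg on each call.
# This relies on HTTPAdapter.send() looking up the module-level name
# TimeoutSauce at call time, which is an implementation detail of requests.
requests.adapters.TimeoutSauce = CustomTimeout


sess = requests.Session()
sess.get(...)
sess.post(...)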
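Patching `requests.adapters.TimeoutSauce` affects every session in the process and depends on requests internals. A commonly used alternative is to supply the default through a transport adapter, which also fits the question's existing `HTTPAdapter`/`Retry` setup. A minimal sketch; `TimeoutHTTPAdapter`, `DEFAULT_TIMEOUT` and `make_session` are hypothetical names:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

DEFAULT_TIMEOUT = (3.05, 10)  # hypothetical default (connect, read) in seconds


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when the caller passes none."""

    def __init__(self, *args, timeout=DEFAULT_TIMEOUT, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send() forwards a `timeout` kwarg, which is None when unset.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)


def make_session(timeout=DEFAULT_TIMEOUT, retries=None):
    """Build a Session whose HTTP and HTTPS requests use the default timeout."""
    session = requests.Session()
    adapter = TimeoutHTTPAdapter(
        timeout=timeout, max_retries=retries if retries is not None else 0
    )
    # Mount for both schemes so HTTPS requests get the timeout too.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


# Same retry policy as in the question, now with a default timeout.
session = make_session(retries=Retry(total=5, backoff_factor=0.1))
```

An explicit `timeout=` on an individual call still takes precedence. Note that with a `Retry` attached, a read timeout that keeps recurring before the headers arrive is retried by urllib3, and once the retries run out requests typically reports it as a `ConnectionError` (wrapping `MaxRetryError`) rather than a `ReadTimeout`, so catch both.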
