Python requests Session fails to read later responses after reading a response body larger than 50 MB

1 vote
1 answer
3653 views
Asked 2025-04-18 16:56

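I am using a Session object from Python's requests library to call some REST APIs. When the first request reads a large body (more than 50 MB), subsequent HTTP requests on the same Session object fail. If I don't use a Session object, all requests work fine. The code is explained below.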

import requests       # version 2.3.0  # python version 2.7

headers = {"Authorization":"Bearer sometoken"}

sess = requests.Session()
sess.verify = False
host = "https://somehost/endpoint/"
res = sess.get(url = host+'obj1/28/content', headers = headers)
print res  # this request succeeds with a 200 response status code

url = host + 'obj2/1/content'
res = sess.get(url = url, headers=headers)  # the process hangs here indefinitely; I have to kill it to exit
print "content ", res.content  # this line is never executed

Stack trace after killing the process:

  File "/opt/lib/python2.7/site-packages/requests/sessions.py", line 556, in send
    r = adapter.send(request, **kwargs)
  File "/opt/lib/python2.7/site-packages/requests/adapters.py", line 391, in send
    r.content
  File "/opt/lib/python2.7/site-packages/requests/models.py", line 690, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/opt/lib/python2.7/site-packages/requests/models.py", line 628, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/opt/lib/python2.7/site-packages/requests/packages/urllib3/response.py", line 240, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/opt/lib/python2.7/site-packages/requests/packages/urllib3/response.py", line 187, in read
    data = self._fp.read(amt)
  File "/opt/lib/python2.7/httplib.py", line 567, in read
    s = self.fp.read(amt)
  File "/opt/lib/python2.7/httplib.py", line 1313, in read
    return s + self._file.read(amt - len(s))
  File "/opt/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/opt/lib/python2.7/ssl.py", line 242, in recv
    return self.read(buflen)
  File "/opt/lib/python2.7/ssl.py", line 161, in read
    return self._sslobj.read(len)

Without a Session object, however, the same HTTP requests both work fine.

print requests.get( host+'obj1/28/content', headers = headers, verify = False)
print requests.get( host+'obj2/1/content', headers = headers, verify = False)

1 Answer

2

From the requests documentation:

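Excellent news: thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!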

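Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object.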

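It sounds like the large request is holding on to that connection, or, as abarnert said, the server may be at fault. Try setting stream=False, or access the content of the first res object, so that requests knows it can release that connection.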
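As a rough illustration of that advice, the sketch below fully reads the first response before issuing the second request on the same Session, and passes a timeout so that a stalled read raises an exception instead of blocking forever. The local `DemoHandler` server, the `fetch_two` helper, and the 1 KiB payload are assumptions made only so the example runs without the real API. It is written for Python 3 and a current requests release, not the 2.7 / 2.3.0 combination from the question.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real API: every GET returns 1 KiB."""

    protocol_version = "HTTP/1.1"  # allow keep-alive so the connection can be reused

    def do_GET(self):
        body = b"x" * 1024
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_two(base_url, timeout=30):
    """Fetch two resources on one Session, draining the first before the second."""
    with requests.Session() as sess:
        sess.trust_env = False  # ignore proxy environment variables for this local demo
        first = sess.get(base_url + "/obj1", stream=True, timeout=timeout)
        # Reading the body to the end is what lets the connection return to the pool.
        first_size = sum(len(chunk) for chunk in first.iter_content(chunk_size=8192))
        first.close()
        second = sess.get(base_url + "/obj2", timeout=timeout)
        return first_size, len(second.content)


server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    sizes = fetch_two("http://127.0.0.1:%d" % server.server_port)
finally:
    server.shutdown()
    server.server_close()
```

Against the real endpoint, the same pattern would apply to `host + 'obj1/28/content'` and `host + 'obj2/1/content'`. Whether it resolves the hang depends on whether the connection or the server is actually at fault.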

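Edit: this looks like the problem. When you call requests.get, you explicitly pass verify = False. Note that verify defaults to True in requests, so that argument is not redundant: it is what turns off certificate verification for those calls.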

However, your hang occurs in adapter.send(request, **kwargs), so it looks like the HTTPAdapter object is where things go wrong. The signature of adapter.send is:

 send(request, stream=False, timeout=None, verify=True, cert=None, proxies=None)

By default, verify=True.

This sounds like a bug in requests, but my guess is that the verify parameter is not being passed down from the Session. The signature of sess.request is:

request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=None, allow_redirects=True, proxies=None, hooks=None, stream=None, verify=None, cert=None)

Here verify=None rather than False, so this may mean it is being overridden somewhere.
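For context on the None default: in requests, a per-request value of None is normally treated as "not specified" and falls back to the Session-level attribute. The sketch below is a simplified model of that fallback for scalar settings such as verify. `resolve_setting` is a hypothetical helper written for illustration, not part of the requests API, and the real merge logic also handles dictionaries and environment variables.

```python
def resolve_setting(request_value, session_value):
    """Simplified model: a per-request None means 'use the Session value'."""
    if request_value is None:
        return session_value
    return request_value


# sess.verify = False and no per-request verify: the Session value should win.
from_session = resolve_setting(None, False)

# An explicit per-request value takes precedence over the Session value.
explicit = resolve_setting(True, False)
```

If the installed version follows this model, sess.verify = False should already reach the adapter. Passing verify=False on the call itself is a cheap way to rule the merge step out.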

Try setting verify=False explicitly in sess.get.
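A minimal sketch of that suggestion follows. `fetch_content` and the 30-second timeout are assumptions added for illustration. The helper only relies on the session object having a `get` method that accepts `headers`, `verify`, and `timeout`, which `requests.Session` does.

```python
def fetch_content(sess, url, headers=None, timeout=30):
    """GET url on the given session with verification disabled for this one call."""
    # verify=False is passed per request rather than relying on sess.verify.
    # The timeout turns an indefinite hang into an exception.
    res = sess.get(url, headers=headers, verify=False, timeout=timeout)
    res.raise_for_status()
    return res.content
```

With the names from the question, this would be called as `fetch_content(sess, host + 'obj2/1/content', headers=headers)`.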
