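Python requests session fails to read responses after reading a response body larger than 50 MB
While accessing some REST APIs with Python's requests library, I use a requests Session object. I ran into a problem: when the first request reads a large body (more than 50 MB), subsequent HTTP requests on the same session object fail. If I don't use a session object, all requests work fine. The code is below.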
import requests # version 2.3.0 # python version 2.7
headers = {"Authorization":"Bearer sometoken"}
sess = requests.Session()
sess.verify = False
host = "https://somehost/endpoint/"
res = sess.get(url = host+'obj1/28/content', headers = headers)
print res # this result received successfully with 200 response status code
url = host + 'obj2/1/content'
res = sess.get(url = url, headers=headers) # the process hangs here indefinitely; I have to kill it to exit.
print "content ", res.content # this line never gets executed...
Stack trace after killing the process:
File "/opt/lib/python2.7/site-packages/requests/sessions.py", line 556, in send
r = adapter.send(request, **kwargs)
File "/opt/lib/python2.7/site-packages/requests/adapters.py", line 391, in send
r.content
File "/opt/lib/python2.7/site-packages/requests/models.py", line 690, in content
self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
File "/opt/lib/python2.7/site-packages/requests/models.py", line 628, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "/opt/lib/python2.7/site-packages/requests/packages/urllib3/response.py", line 240, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "/opt/lib/python2.7/site-packages/requests/packages/urllib3/response.py", line 187, in read
data = self._fp.read(amt)
File "/opt/lib/python2.7/httplib.py", line 567, in read
s = self.fp.read(amt)
File "/opt/lib/python2.7/httplib.py", line 1313, in read
return s + self._file.read(amt - len(s))
File "/opt/lib/python2.7/socket.py", line 380, in read
data = self._sock.recv(left)
File "/opt/lib/python2.7/ssl.py", line 242, in recv
return self.read(buflen)
File "/opt/lib/python2.7/ssl.py", line 161, in read
return self._sslobj.read(len)
But without the session object, the same HTTP requests work fine:
print requests.get( host+'obj1/28/content', headers = headers, verify = False)
print requests.get( host+'obj2/1/content', headers = headers, verify = False)
1 Answer
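From the requests documentation:
Excellent news: thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!
Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object.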
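It sounds like the large request is holding on to that connection or, as abarnert said, the server may be at fault. Try setting stream=False, or access the content of the first res object, so that requests knows it can release that connection.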
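The suggestion above can be sketched as a small wrapper. This is only an illustration: `get_and_read` is a hypothetical helper name, not part of requests, and it accepts any object with a requests-style `get` method. It does not rule out a server-side problem.

```python
def get_and_read(session, url, **kwargs):
    """GET `url` on `session` and read the whole body before returning.

    Hypothetical helper for illustration; not part of the requests API.
    """
    # Ask for a non-streamed response unless the caller chose otherwise.
    kwargs.setdefault("stream", False)
    response = session.get(url, **kwargs)
    # Accessing .content forces the body to be read in full, which is the
    # condition the documentation gives for releasing the connection.
    _ = response.content
    return response
```

With the names from the question, the first call would become `res = get_and_read(sess, host + 'obj1/28/content', headers=headers)`, followed by the second request on the same `sess`.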
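Edit: This looks like it may be the problem. When you call requests.get, you explicitly pass verify = False. That is not redundant: verify defaults to True in requests, so the explicit argument is what turns certificate verification off in those calls.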
However, your hang occurs in adapter.send(request, **kwargs), so it looks like the HTTPAdapter object is where things go wrong. The signature of adapter.send is:
send(request, stream=False, timeout=None, verify=True, cert=None, proxies=None)
By default, verify=True.
That would be a bug in requests, but my guess is that the verify setting is not being passed down from the Session. The signature of sess.request is:
request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=None, allow_redirects=True, proxies=None, hooks=None, stream=None, verify=None, cert=None)
Here verify=None rather than False, so it may be getting overridden somewhere along the way.
Try setting verify=False explicitly in the sess.get call.