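I have been having trouble scraping a website since it switched from http to https, and I am not sure how to solve the problem. The site I am trying to fetch is https://www.boldsystems.org. Two days ago it was still served over plain http, and my scraper worked fine.
Example code: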
import requests
requests.get('https://www.boldsystems.org')
The error I get back:
Traceback (most recent call last):
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\contrib\pyopenssl.py", line 488, in wrap_socket
cnx.do_handshake()
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\OpenSSL\SSL.py", line 1934, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\OpenSSL\SSL.py", line 1671, in _raise_ssl_error
_raise_current_error()
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\OpenSSL\_util.py", line 54, in exception_from_error_queue
raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 677, in urlopen
chunked=chunked,
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 976, in _validate_conn
conn.connect()
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 370, in connect
ssl_context=context,
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\ssl_.py", line 377, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\contrib\pyopenssl.py", line 494, in wrap_socket
raise ssl.SSLError("bad handshake: %r" % e)
ssl.SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 725, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='boldsystems.org', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='boldsystems.org', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
I found some answers that suggest disabling verification, for example:
requests.get('https://boldsystems.org', verify = False)
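But I think that is a bad approach, because SSL verification exists for a reason.
I have already updated certifi, requests and urllib3. I also tried saving the site's SSL certificate to a .pem file and passing it to the request function, but I am not really sure what that is supposed to do, and it did not help.
I can reproduce the problem on Windows and Ubuntu and on different machines, so I think the problem lies somewhere with the website I am trying to request.
I would really appreciate a solution to my problem, or an explanation of what is going on here.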
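I am not entirely sure why this happens, but I had to manually add the certificate information to certifi's cacert.pem file to get it working.
I followed the steps given here: Unable to get local issuer certificate when using requests in python
After that, the request returned a 200 response.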
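The manual step described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper name `append_cert_to_bundle` and the file name `intermediate.pem` are made up for illustration, and it assumes the missing certificate has already been exported in PEM format.

```python
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def append_cert_to_bundle(cert_path, bundle_path):
    """Append the PEM certificate in cert_path to the CA bundle in bundle_path.

    Returns True if the bundle was modified, or False if the certificate
    text was already present. (Hypothetical helper, not part of certifi.)
    """
    cert_text = Path(cert_path).read_text(encoding="utf-8").strip()
    if PEM_BEGIN not in cert_text:
        raise ValueError(f"{cert_path} does not look like a PEM certificate")

    bundle = Path(bundle_path)
    if cert_text in bundle.read_text(encoding="utf-8"):
        return False  # already added, so do not duplicate it

    # Append rather than rewrite, so the existing trusted roots stay untouched.
    with bundle.open("a", encoding="utf-8") as fh:
        fh.write("\n" + cert_text + "\n")
    return True
```

With certifi installed, the bundle path would come from `certifi.where()`, e.g. `append_cert_to_bundle("intermediate.pem", certifi.where())`. Since cacert.pem ships inside the certifi package, the change will probably be lost when certifi is upgraded or reinstalled.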
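I tested boldsystems.org on my own system (Ubuntu 18.04, Python 3.6.9) and got the same result. Regular browsers work fine, though. The free ssltest tool from SSL Labs reports "This server's certificate chain is incomplete…"
An incomplete certificate chain just means that the server is not sending the intermediate certificates in the chain. Unlike Python, browsers may have the whole chain cached, which is why they work fine.
The solution is to give requests a certificate bundle to verify against, so that it can evaluate the whole chain. Ugly, but it should work. You need to download all the certificates in the chain, concatenate them, and present them to requests. This is explained at https://blogs.gnome.org/danni/2015/11/26/using-an-ssl-intermediate-as-your-ca-cert-with-python-requests/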
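A minimal sketch of the "download, concatenate, present" steps, assuming the chain certificates have already been saved as separate PEM files. The helper names `build_ca_bundle` and `fetch_with_bundle` and the file names in the usage note are illustrative only; the one library feature relied on is the `verify` parameter of requests, which accepts a path to a CA bundle file.

```python
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def build_ca_bundle(cert_paths, bundle_path):
    """Concatenate PEM certificate files into one bundle file.

    Returns the number of files written into the bundle.
    (Hypothetical helper for illustration.)
    """
    blocks = []
    for path in cert_paths:
        text = Path(path).read_text(encoding="utf-8").strip()
        if PEM_BEGIN not in text or PEM_END not in text:
            raise ValueError(f"{path} does not look like a PEM certificate")
        blocks.append(text)
    # One certificate after another, separated by newlines.
    Path(bundle_path).write_text("\n".join(blocks) + "\n", encoding="utf-8")
    return len(blocks)


def fetch_with_bundle(url, bundle_path):
    """Request url, verifying the server against the custom bundle."""
    import requests  # imported here so build_ca_bundle works without requests

    return requests.get(url, verify=str(bundle_path), timeout=30)
```

For example, `build_ca_bundle(["intermediate.pem", "root.pem"], "bold_chain.pem")` followed by `fetch_with_bundle("https://www.boldsystems.org", "bold_chain.pem")`. Unlike `verify=False`, this keeps certificate verification switched on; it only supplies the intermediate certificate that the server fails to send.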