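Behavior of Python's urllib.request.urlopen when the network connection drops
I'm having trouble with Python's urllib when the network connection is unstable: if there is no network connection the first time I call urllib.request.urlopen, I can't fetch anything afterwards, even once the connection is back.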
> python
>>> import urllib.request
>>> urllib.request.urlopen("http://www.google.com")
<http.client.HTTPResponse object at 0x7f6f54681438>
#Now disable internet connection:
> sudo ip link set enp4s0 down
>>> urllib.request.urlopen("http://www.google.com")
Traceback (most recent call last):
  File "/usr/lib/python3.4/urllib/request.py", line 1189, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/usr/lib/python3.4/http/client.py", line 1090, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.4/http/client.py", line 1128, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.4/http/client.py", line 1086, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.4/http/client.py", line 924, in _send_output
    self.send(msg)
  File "/usr/lib/python3.4/http/client.py", line 859, in send
    self.connect()
  File "/usr/lib/python3.4/http/client.py", line 836, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python3.4/socket.py", line 491, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/lib/python3.4/socket.py", line 530, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 455, in open
    response = self._open(req, data)
  File "/usr/lib/python3.4/urllib/request.py", line 473, in _open
    '_open', req)
  File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 1215, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.4/urllib/request.py", line 1192, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
#Reenable internet connection:
> sudo ip link set enp4s0 up #and wait a bit
>>> urllib.request.urlopen("http://www.google.com")
<http.client.HTTPResponse object at 0x7f6f5468c898>
So far everything works as expected. Now do exactly the same thing, but without calling urlopen before the connection goes down:
> python
>>> import urllib.request
# do not call urlopen before internet is down...
#Now disable internet connection:
> sudo ip link set enp4s0 down
>>> urllib.request.urlopen("http://www.google.com")
[exactly the same error message as above]
#Reenable internet connection:
> sudo ip link set enp4s0 up #and wait a bit
#Ensure internet connection is up
> ip link show enp4s0 up
2: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP [...]
>>> urllib.request.urlopen("http://www.google.com")
[exactly the same error message as above]
#What's the problem? The internet connection IS up
#However:
> host www.google.com
www.google.com has address 173.194.69.104
[...]
>>> urllib.request.urlopen("http://173.194.69.104")
<http.client.HTTPResponse object at 0x7f3116a72e48>
So I guess this has something to do with DNS caching?
Finally, some information about my system:
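One way to confirm that the failure really is in name resolution, rather than in the HTTP connection itself, is to inspect the exception that URLError wraps (the traceback above shows a socket.gaierror underneath) and to repeat the getaddrinfo() lookup directly. A small sketch; the helper names is_dns_failure and can_resolve are hypothetical, chosen for illustration:

```python
import socket
import urllib.error


def is_dns_failure(exc: BaseException) -> bool:
    """Return True if exc is a name-resolution error, possibly wrapped in URLError."""
    # urlopen() wraps the underlying OSError in URLError; the original is in .reason.
    if isinstance(exc, urllib.error.URLError):
        exc = exc.reason
    return isinstance(exc, socket.gaierror)


def can_resolve(host: str) -> bool:
    """Repeat the getaddrinfo() lookup that the traceback above ends in."""
    try:
        socket.getaddrinfo(host, 80, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    return True
```

If can_resolve("www.google.com") keeps returning False inside the long-running Python process while `host www.google.com` succeeds in a fresh shell, the stale state lives inside that process, which fits the DNS-caching suspicion.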
> python --version
Python 3.4.1
> uname -a
Linux charon 3.15.3-1-ARCH #1 SMP PREEMPT Tue Jul 1 07:32:45 CEST 2014 x86_64 GNU/Linux
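Sorry for the somewhat odd formatting. I interleaved 'normal' shell commands (starting with '>') and Python commands (starting with '>>>') to make the order of the commands clearer (they obviously ran in different terminals).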
Answer
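You're hitting a well-known glibc issue. One could argue about whether it's a misuse of glibc or a bug in glibc itself. glibc's stub resolver reads /etc/resolv.conf the first time a process looks up a name and keeps using that configuration until res_init() is called. If that first lookup happens while the link is down (when the network manager has typically rewritten resolv.conf without a usable nameserver), the process stays stuck with that configuration, which is why the IP address works but the host name does not. res_init is not part of POSIX; it's a BSD-derived interface, so it's hard to get exactly right across platforms.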
There doesn't seem to be a Python bug report about this yet, so you may want to file one.
As a workaround, you could call res_init yourself via ctypes, but I'm not sure offhand exactly how to do that.
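A minimal sketch of that ctypes workaround, assuming a Linux system with glibc. It locates the C library with ctypes.util.find_library, tries both the __res_init and res_init symbol names (older glibc headers #define res_init to __res_init, so the exported name may differ between versions), and retries urlopen once after reloading. The names reload_resolver_config and urlopen_with_resolver_reload are hypothetical, and the single-retry policy is an illustrative choice rather than something prescribed above:

```python
import ctypes
import ctypes.util
import socket
import urllib.error
import urllib.request


def reload_resolver_config() -> bool:
    """Ask the C library to re-read /etc/resolv.conf via res_init().

    Returns True if a res_init symbol was found and returned 0, else False.
    """
    # Assumption: a glibc-based Linux; fall back to the usual soname if lookup fails.
    libc_path = ctypes.util.find_library("c") or "libc.so.6"
    try:
        libc = ctypes.CDLL(libc_path)
    except OSError:
        return False
    # Older glibc exports the function as __res_init; try that name first.
    for name in ("__res_init", "res_init"):
        func = getattr(libc, name, None)
        if func is not None:
            func.argtypes = []
            func.restype = ctypes.c_int
            return func() == 0
    return False


def urlopen_with_resolver_reload(url, timeout=10):
    """urlopen() that retries once after reloading the resolver on a DNS error."""
    try:
        return urllib.request.urlopen(url, timeout=timeout)
    except urllib.error.URLError as exc:
        # Only name-resolution failures (socket.gaierror) warrant a reload.
        if not isinstance(exc.reason, socket.gaierror):
            raise
        if not reload_resolver_config():
            raise
    # Resolver state was reloaded; try the same request one more time.
    return urllib.request.urlopen(url, timeout=timeout)
```

Calling reload_resolver_config() unconditionally before each request would also work but re-reads the file every time; retrying only on a gaierror keeps the normal path unchanged. Newer glibc releases (2.26 onward) are documented to reload /etc/resolv.conf automatically when it changes, so this mainly matters for older systems like the one in the question.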