urllib2.Request: check whether a URL is reachable
I have some code that checks whether a URL is valid. All I need is a 200 response, so I wrote a script. It works fine, but it is way too slow (:
import urllib2
import string

def my_range(start, end, step):
    while start <= end:
        yield start
        start += step

url = 'http://exemple.com/test/'
y = 1
for x in my_range(1, 5, 1):
    y = y + 1
    url += str(y)
    print url
    req = urllib2.Request(url)
    try:
        resp = urllib2.urlopen(req)
    except urllib2.URLError, e:
        if e.code == 404:
            print "404"
        else:
            print "not 404"
    else:
        print "200"
        body = resp.read()
    # reset the base URL for the next iteration
    url = 'http://exemple.com/test/'
In this example, assume my local server has the following directories; these are the results I get:
http://exemple.com/test/2
200
http://exemple.com/test/3
200
http://exemple.com/test/4
404
http://exemple.com/test/5
404
http://exemple.com/test/6
404
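Presumably much of the time goes into resp.read(), which downloads each page's whole body just to confirm the page exists. A minimal sketch of a HEAD-only check (assuming the server answers HEAD requests; the HeadRequest class name is made up for this sketch) could look like this:

import urllib2

class HeadRequest(urllib2.Request):
    # send HEAD instead of GET, so only the status line and headers come back
    def get_method(self):
        return 'HEAD'

try:
    resp = urllib2.urlopen(HeadRequest('http://exemple.com/test/2'))
    print resp.getcode()    # 200 when the path exists
except urllib2.HTTPError, e:
    print e.code            # 404, 500, ...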
So I went looking for a way to make it faster and found this code:
import urllib2

request = urllib2.Request('http://www.google.com/')
response = urllib2.urlopen(request)
if response.getcode() == 200:
    print "200"
It does seem faster, but when I test it with a URL that returns a 404 (http://www.google.com/111), it gives me this:
Traceback (most recent call last):
  File "C:\Python27\res.py", line 3, in <module>
    response = urllib2.urlopen(request)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
Does anyone have any ideas? Any help is much appreciated :)
1 Answer
HTTPError is an exception, so you can handle this situation with a try/except:
import urllib2

request = urllib2.Request('http://www.google.com/')
try:
    response = urllib2.urlopen(request)
    # do stuff..
except urllib2.HTTPError:  # 404, 500, etc..
    pass
You can also add an extra except clause to handle urllib2.URLError, which covers some other (non-HTTP) problems such as timeouts.
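Putting both except clauses together with getcode(), a minimal sketch for the original loop might look like this (the exemple.com URLs and the 5-second timeout are just placeholders):

import urllib2

for i in range(2, 7):
    url = 'http://exemple.com/test/%d' % i
    try:
        response = urllib2.urlopen(urllib2.Request(url), timeout=5)
        print url, response.getcode()   # 200 for reachable pages
    except urllib2.HTTPError, e:        # the server answered, but with an error status
        print url, e.code               # e.g. 404
    except urllib2.URLError, e:         # no HTTP response at all: DNS failure, timeout, ...
        print url, e.reason

Because HTTPError is a subclass of URLError, the HTTPError clause has to come first; otherwise the URLError clause would swallow the 404s as well.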