urllib2.URLError: reading server response codes (Python)
I have a list of URLs. I want to check the server response code of each one to find broken links. I can identify server errors (500) and broken links (404), but as soon as the code hits an address that isn't a website (e.g. "notawebsite_broken.com"), it fails. I've searched everywhere but haven't found a solution... I hope someone can help.
Here is my code:
import urllib2

# List of URLs. The third URL is not a website
urls = ["http://www.google.com", "http://www.ebay.com/broken-link",
        "http://notawebsite_broken"]
# Empty list to store the output
response_codes = []
# Run "for" loop: get server response code and save results to response_codes
for url in urls:
    try:
        connection = urllib2.urlopen(url)
        response_codes.append(connection.getcode())
        connection.close()
        print url, ' - ', connection.getcode()
    except urllib2.HTTPError, e:
        response_codes.append(e.getcode())
        print url, ' - ', e.getcode()
print response_codes
The output of this code is...
http://www.google.com - 200
http://www.ebay.com/broken-link - 404
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    connection = urllib2.urlopen(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
Does anyone know how to fix this, or can point me in the right direction?
3 Answers
1
When urllib2.urlopen() cannot connect to the server at all, or cannot resolve the host's IP address, it raises a URLError rather than an HTTPError. You need to handle both urllib2.URLError and urllib2.HTTPError to cover these cases.
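A minimal sketch of that pattern (shown here with Python 3's urllib.request/urllib.error, where urllib2 was merged; in Python 2 the same structure applies with urllib2.urlopen, urllib2.HTTPError, and urllib2.URLError):

```python
import urllib.error
import urllib.request

def response_code(url):
    """Return the HTTP status code for url, or None if no HTTP
    response could be obtained at all."""
    try:
        connection = urllib.request.urlopen(url)
        code = connection.getcode()
        connection.close()
        return code
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (404, 500, ...).
        return e.getcode()
    except urllib.error.URLError as e:
        # No HTTP response at all: unresolvable hostname,
        # refused connection, unknown URL scheme, ...
        print("{} - {}".format(url, e.reason))
        return None
```

Note the ordering: HTTPError is a subclass of URLError, so the HTTPError clause must come first or it would never be reached.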
1
The urllib2 interface is a pain to work with. Many people, myself included, strongly recommend the requests package instead. One nice thing about requests is that all request-related failures derive from a single base exception class. When you use urllib2 directly, failures surface as many different exceptions: not just from urllib2 itself, but also from the socket module and possibly others (I don't remember exactly; it's a mess). In short, just use the requests library.
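For example, every requests failure can be caught with the single base class requests.exceptions.RequestException; a minimal sketch, assuming requests is installed:

```python
import requests

def response_code(url):
    """Return the status code for url, or None on any request failure."""
    try:
        return requests.get(url, timeout=5).status_code
    except requests.exceptions.RequestException as e:
        # One base class covers DNS failures, timeouts,
        # refused connections, invalid URLs, ...
        print("{} - {}".format(url, e))
        return None
```

Server error statuses such as 404 and 500 are still returned as normal responses here; only actual request failures raise.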
3
You can use the requests library:

import requests

urls = ["http://www.google.com", "http://www.ebay.com/broken-link",
        "http://notawebsite_broken"]

for u in urls:
    try:
        r = requests.get(u)
        print "{} {}".format(u, r.status_code)
    except Exception, e:
        print "{} {}".format(u, e)
http://www.google.com 200
http://www.ebay.com/broken-link 404
http://notawebsite_broken HTTPConnectionPool(host='notawebsite_broken', port=80): Max retries exceeded with url: /