What's the best way to get the HTTP response code from a URL?

Question

I'm looking for a simple way to get the HTTP response code (such as 200, 404, and so on) from a URL. I'm not sure which library I should use.

Answer 1

You should use urllib2 (Python 2), like this:

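import urllib2

for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()  # successful responses land here
        connection.close()
    except urllib2.HTTPError as e:
        # urlopen() raises HTTPError for 4xx/5xx codes; the exception
        # object carries the status code.
        print e.getcode()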

# Prints:
# 200 [from the try block]
# 404 [from the except block]
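urllib2 only exists in Python 2. In Python 3 it was split into urllib.request (which provides urlopen) and urllib.error (which provides HTTPError and URLError). A minimal sketch of the same approach, wrapped in a helper; the name get_status_code and the timeout default are illustrative assumptions, not part of the answer above:

```python
import urllib.error
import urllib.request


def get_status_code(url, timeout=10):
    """Return the HTTP status code for url, or None if the server can't be reached.

    Python 3 sketch of the urllib2 answer: urllib2 was split into
    urllib.request (urlopen) and urllib.error (HTTPError, URLError).
    """
    try:
        # urlopen() follows redirects, so this is the final page's code.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # 4xx/5xx codes are raised as HTTPError, which carries the code.
        return e.code
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts
        # are both OSError subclasses: there is no status code to report.
        return None
```

Calling get_status_code("http://entrian.com/does-not-exist/") should return 404 as long as that page is still missing. HTTPError must be caught before OSError because it is itself a subclass of URLError, and therefore of OSError.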

Answer 2

Here is a solution that uses the httplib library (Python 2).

import httplib

def get_status_code(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None


# httplib does not follow redirects, so a 3xx code is also possible here.
print get_status_code("stackoverflow.com") # e.g. 200
print get_status_code("stackoverflow.com", "/nonexistent") # e.g. 404
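In Python 3, httplib was renamed http.client, and StandardError no longer exists. A sketch of a port that keeps the same signature, catches the Python 3 exception types instead, and always closes the connection (the timeout parameter is an added assumption):

```python
import http.client


def get_status_code(host, path="/", timeout=10):
    """Return the status code of a HEAD request to host + path, or None on failure.

    Python 3 port of the httplib answer (httplib was renamed http.client).
    Only the headers are requested, and redirects are NOT followed, so a
    page that has moved reports its own 301/302 code.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # OSError covers DNS failures, refused connections and timeouts;
        # HTTPException covers malformed or truncated responses.
        return None
    finally:
        # Release the socket whether or not the request succeeded.
        conn.close()
```

The host argument may include a port, e.g. "localhost:8000". For an https:// site, http.client.HTTPSConnection takes the same arguments and can be swapped in.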

Answer 3

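Update, using the excellent requests library. Note that we use a HEAD request, which should be faster than a full GET or POST request because the server sends back only the headers, not the body.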

import requests
try:
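    # Unlike requests.get(), requests.head() does not follow redirects
    # unless you pass allow_redirects=True.
    r = requests.head("https://stackoverflow.com")
    print(r.status_code)
    # prints the status code as an int, e.g. 200*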
except requests.ConnectionError:
    print("failed to connect")

*For more information, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
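To turn the integer into something readable, Python 3's standard library includes http.HTTPStatus, which knows the standard reason phrases. A small sketch (describe_status is a hypothetical helper name) that also labels each code by its class, i.e. its first digit:

```python
from http import HTTPStatus

# Broad meaning of each status-code class (the first digit).
_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Turn an integer status code into e.g. '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes the enum doesn't know (e.g. 599) still belong to a class.
        phrase = "Unknown"
    return "%d %s (%s)" % (code, phrase, _CLASSES.get(code // 100, "invalid"))
```

For example, describe_status(r.status_code) can be used with any of the answers above.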
