HTTP basic authentication with urllib2 in Python does not seem to work

9 votes
3 answers
15886 views
Asked 2025-04-16 12:36

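I am trying to download a web page that requires basic authentication using urllib2. I am on Python 2.7, but I hit the same problem on another computer running Python 2.5. I followed the example in this guide as closely as I could, and here is the code I wrote: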

import urllib2

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "http://authenticationsite.com/", "protected", "password")
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)

f = opener.open("http://authenticationsite.com/content.html")
print f.read()
f.close()

Unfortunately the server is not mine, so I cannot share the exact details; I have replaced them above and below. When I run the code, I get the following error:

  File "/usr/lib/python2.7/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Authorization Required

Interestingly, this is what I see when I monitor the TCP traffic on my computer with ngrep:

ngrep host 74.125.224.49
interface: wlan0 (192.168.1.0/255.255.255.0)
filter: (ip) and ( host 74.125.224.49 )
####
T 192.168.1.74:34366 -> 74.125.224.49:80 [AP]
  GET /content.html HTTP/1.1..Accept-Encoding: identity..Host: authenticationsite.com..Connection: close..User-Agent: Python-urllib/2.7....
##
T 74.125.224.49:80 -> 192.168.1.74:34366 [AP]
  HTTP/1.1 401 Authorization Required..Date: Sun, 27 Feb 2011 03:39:31 GMT..Server: Apache/2.2.3 (Red Hat)..WWW-Authenticate: Digest realm="protected", nonce="6NSgTzudBAA=ac585d1f7ae0632c4b90324aff5e39e0f1fc2505", algorithm=MD5, qop="auth"..Content-Length: 486..Connection: close..Content-Type: text/html; charset=iso-8859-1....<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">.<html><head>.<title>401 Authorization Required</title>.</head><body>.<h1>Authorization Required</h1>.<p>This server could not verify that you.are authorized to access the document.requested.  Either you supplied the wrong.credentials (e.g., bad password), or your.browser doesn't understand how to supply.the credentials required.</p>.<hr>.<address>Apache/2.2.3 (Red Hat) Server at authenticationsite.com Port 80</address>.</body></html>.
####

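It looks like urllib2 raises that exception right after the initial 401 error, without even attempting to supply the authentication information.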

For comparison, here is the ngrep output when I authenticate in a web browser:

ngrep host 74.125.224.49
interface: wlan0 (192.168.1.0/255.255.255.0)
filter: (ip) and ( host 74.125.224.49 )
####
T 192.168.1.74:36102 -> 74.125.224.49:80 [AP]
  GET /content.html HTTP/1.1..Host: authenticationsite.com..User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Firefox/3.6.12..Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..Accept-Language: en-us,en;q=0.5..Accept-Encoding: gzip,deflate..Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7..Keep-Alive: 115..Connection: keep-alive....
##
T 74.125.224.49:80 -> 192.168.1.74:36102 [AP]
  HTTP/1.1 401 Authorization Required..Date: Sun, 27 Feb 2011 03:43:42 GMT..Server: Apache/2.2.3 (Red Hat)..WWW-Authenticate: Digest realm="protected", nonce="rKCfXjudBAA=0c1111321169e30f689520321dbcce37a1876bbe", algorithm=MD5, qop="auth"..Content-Length: 486..Connection: close..Content-Type: text/html; charset=iso-8859-1....<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">.<html><head>.<title>401 Authorization Required</title>.</head><body>.<h1>Authorization Required</h1>.<p>This server could not verify that you.are authorized to access the document.requested.  Either you supplied the wrong.credentials (e.g., bad password), or your.browser doesn't understand how to supply.the credentials required.</p>.<hr>.<address>Apache/2.2.3 (Red Hat) Server at authenticationsite.com Port 80</address>.</body></html>.
#########
T 192.168.1.74:36103 -> 74.125.224.49:80 [AP]
  GET /content.html HTTP/1.1..Host: authenticationsite.com..User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Firefox/3.6.12..Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..Accept-Language: en-us,en;q=0.5..Accept-Encoding: gzip,deflate..Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7..Keep-Alive: 115..Connection: keep-alive..Authorization: Digest username="protected", realm="protected", nonce="rKCfXjudBAA=0c1111199162342689520550dbcce37a1876bbe", uri="/content.html", algorithm=MD5, response="3b65dadaa00e1d6a1892ffff49f9f325", qop=auth, nc=00000001, cnonce="7636125b7fde3d1b"....
##

After that comes the content of the site.

I have been at this for a long time and still cannot figure out what I am doing wrong. If anyone can help, I would be very grateful!

3 Answers

-1
import urllib2
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib2.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib2.install_opener(opener)
urllib2.urlopen('http://www.example.com/login.html')

-- http://docs.python.org/library/urllib2.html#examples

0

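You need to use the Python NTLM module for this:

First, import the libraries you need:

from ntlm import HTTPNtlmAuthHandler

import urllib2

Next, set your username and password:

user = "your_username"

password = "your_password"

Then create a password manager:

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()

Add your username and password to the password manager:

passman.add_password(None, "http://your_homepage_address/", user, password)

Next, create an NTLM authentication handler:

auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)

Then build an opener:

opener = urllib2.build_opener(auth_NTLM)

Install the opener:

urllib2.install_opener(opener)

Set the URL you want to access:

url = "http://your_homepage_address/sub_address"

Send the request and get the response:

response = urllib2.urlopen(url)

Get the response headers:

headers = response.info()

Print the headers:

print("headers: {}".format(headers))

Read the response body:

body = response.read()

Print the response body:

print("response: " + body)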

9

I think it is caused by this:

WWW-Authenticate: Digest

It looks like the resource is protected by digest authentication, not basic authentication. That means you should use urllib2.HTTPDigestAuthHandler to handle it.

The code might look like this:

import urllib2

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "http://authenticationsite.com/", "protected", "password")

# use HTTPDigestAuthHandler instead here
authhandler = urllib2.HTTPDigestAuthHandler(passman)
opener = urllib2.build_opener(authhandler)

res = opener.open("http://authenticationsite.com/content.html")
print res.read()
res.close()
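If it is not known in advance which scheme a server expects, one option is to read the WWW-Authenticate challenge from the 401 response and pick the handler from it. The following is a rough sketch of that idea, not something urllib2 provides out of the box: handler_for_challenge and open_with_auth are hypothetical helper names, the sketch only looks at the first scheme token in the header, and the try/except import is there only so it also runs on Python 3, where the same classes live in urllib.request. It needs Python 2.6 or later because of the "except ... as" syntax.

```python
try:
    import urllib2  # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same classes


def handler_for_challenge(challenge, passman):
    """Pick an auth handler from the first scheme in a WWW-Authenticate value."""
    parts = challenge.split(None, 1)
    if not parts:
        raise ValueError("empty WWW-Authenticate challenge")
    scheme = parts[0].lower()
    if scheme == "digest":
        return urllib2.HTTPDigestAuthHandler(passman)
    if scheme == "basic":
        return urllib2.HTTPBasicAuthHandler(passman)
    raise ValueError("unsupported authentication scheme: %r" % scheme)


def open_with_auth(url, user, password):
    """Open url; on a 401, retry with a handler matching the challenge."""
    try:
        return urllib2.urlopen(url)
    except urllib2.HTTPError as err:
        if err.code != 401:
            raise
        challenge = err.headers.get("WWW-Authenticate", "")
        if not challenge:
            raise
    # Register the credentials for this URL under any realm.
    passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, url, user, password)
    opener = urllib2.build_opener(handler_for_challenge(challenge, passman))
    return opener.open(url)
```

With the placeholder values from the question, a call would look like open_with_auth("http://authenticationsite.com/content.html", "protected", "password"). The first request costs an extra round trip, so when the scheme is already known it is simpler to build the opener with the right handler directly.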
