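Using Python NTLM to browse an NTLM-protected website
I have been asked to write a script that logs into a corporate portal, goes to a particular page, downloads it, compares it against an earlier version, and then emails a certain person depending on what changed. The later steps are easy enough, but the first step is giving me a lot of trouble.
I tried connecting with urllib2 (I'd like to do this in Python) and failed. After about 4 to 5 hours of searching online, I worked out that the reason I can't connect is that the page uses NTLM authentication. I have tried many different ways of connecting, all to no avail. Based on this NTLM example, I tried the following: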
import urllib2
from ntlm import HTTPNtlmAuthHandler
user = 'username'
password = "password"
url = "https://portal.whatever.com/"
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
# create and install the opener
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)
# create a header
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
header = { 'Connection' : 'Keep-alive', 'User-Agent' : user_agent}
response = urllib2.urlopen(urllib2.Request(url, None, header))
When I run this (with a real username, password, and URL), I get the following:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ntlm2.py", line 21, in <module>
    response = urllib2.urlopen(urllib2.Request(url, None, header))
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Unauthorized
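What I find most interesting is that the last line says a 401 error came back. From what I have read, a 401 is the first message sent back to the client when NTLM authentication starts. I was under the impression that the whole purpose of python-ntlm was to handle the NTLM process for me. Is that understanding wrong, or am I just using it incorrectly? Also, I'm not tied to Python for this; if there is an easier way in another language, let me know (from what I have found so far, there doesn't seem to be one). Thanks!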
1 Answer
If the site uses NTLM authentication, the headers attribute of the HTTP error that comes back should say so:
>>> try:
...     handle = urllib2.urlopen(req)
... except IOError, e:
...     print e.headers
...
<other headers>
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
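The same check can be wrapped in a small helper. What follows is only a rough sketch, and it assumes Python 3, where urllib2 was split into urllib.request and urllib.error. The names offered_schemes and probe are made up for illustration and are not part of python-ntlm or the standard library. The parsing is deliberately simple: it takes the first token of each WWW-Authenticate header and does not try to split several challenges packed into one header line.

```python
import urllib.error
import urllib.request


def offered_schemes(header_values):
    """Return the scheme name (first token) of each WWW-Authenticate value."""
    schemes = []
    for value in header_values or []:
        parts = value.split(None, 1)
        if parts:
            schemes.append(parts[0])
    return schemes


def probe(url):
    """Request url without credentials; return (final_url, schemes).

    Hypothetical helper: on a 401 it reports which URL actually answered
    (after any redirects) and which authentication schemes it offered.
    """
    try:
        response = urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        if e.code != 401:
            raise
        values = e.headers.get_all("WWW-Authenticate")
        return e.geturl(), offered_schemes(values)
    # No challenge came back, so the page is readable without authentication.
    return response.geturl(), []
```

Calling probe("https://portal.whatever.com/") would then show whether NTLM is among the offered schemes. Since the traceback in the question passes through two 302 redirects before the 401, the first element of the result may also be worth comparing with the URL that was given to add_password, in case the server that finally asks for credentials is not the one they were registered for.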