Browsing an NTLM-protected website with Python NTLM

4 votes
1 answer
4021 views
Asked 2025-04-16 20:00

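I have been asked to write a script that logs in to a corporate portal, goes to a specific page, downloads it, compares it with the previous version, and emails someone depending on what changed. The later steps are fairly simple, but the first one is giving me a lot of trouble.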

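I tried to connect with urllib2 (I would like to do this in Python) and failed. After about 4 to 5 hours of searching online, I worked out that I could not connect because the page uses NTLM authentication. I tried many different ways of connecting, all to no avail. Based on this NTLM example, I tried the following: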

import urllib2
from ntlm import HTTPNtlmAuthHandler

user = 'username'
password = "password"
url = "https://portal.whatever.com/"

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)

# create and install the opener
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)

# create a header
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
header = { 'Connection' : 'Keep-alive', 'User-Agent' : user_agent}

response = urllib2.urlopen(urllib2.Request(url, None, header))

When I run this (with a real username, password, and URL), I get the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ntlm2.py", line 21, in <module>
    response = urllib2.urlopen(urllib2.Request(url, None, header))
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Unauthorized

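What interests me most is that the last line says a 401 error was returned. From what I have read, a 401 is the first message sent back to the client when NTLM authentication begins. I thought the whole point of python-ntlm was to handle the NTLM handshake for me. Is that understanding wrong, or am I using it incorrectly? Also, I am not tied to Python; if there is an easier way in another language, please let me know (from what I have found, there does not seem to be one). Thanks!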
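The two http_error_302 frames in the traceback show that the request was redirected twice before the 401 came back. One possible explanation (an assumption, not something confirmed in this thread) is that the final URL differs from the one registered with add_password, in which case the handler may find no credentials for it. Below is a minimal Python 3 sketch for recording where the redirects lead. It uses urllib.request in place of urllib2, and RedirectRecorder, hops, and open_and_record are illustrative names only.

```python
import urllib.error
import urllib.request


class RedirectRecorder(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers every URL it is redirected to."""

    def __init__(self):
        # (status_code, new_url) tuples, in the order they were followed.
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Record the hop, then let the stock handler build the new request.
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def open_and_record(url):
    """Open url and return the list of redirect hops that were followed.

    An HTTPError (such as the final 401) is swallowed here because the
    hops collected before it are the interesting part.
    """
    recorder = RedirectRecorder()
    # Supplying a subclass instance replaces the default redirect handler.
    opener = urllib.request.build_opener(recorder)
    try:
        opener.open(url)
    except urllib.error.HTTPError:
        pass
    return recorder.hops
```

Calling open_and_record(url) with the portal URL would list each redirect target, which can then be compared against the URL passed to add_password.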

1 Answer

1

If the site uses NTLM authentication, the headers attribute of the HTTP error that comes back should say so:

>>> try:
...   handle = urllib2.urlopen(req)
... except IOError, e:
...   print e.headers
... 
<other headers>
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
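The same check can be sketched for Python 3, where urllib2 has been split into urllib.request and urllib.error. The names offered_auth_schemes and probe are illustrative, and the sketch assumes each WWW-Authenticate header carries a single challenge, as in the output above.

```python
import urllib.error
import urllib.request


def offered_auth_schemes(error):
    """Return the scheme name from each WWW-Authenticate header of an HTTPError."""
    values = error.headers.get_all("WWW-Authenticate") or []
    # Each value looks like "<scheme> [parameters]"; keep only the scheme token.
    # Assumption: one challenge per header line (no comma-separated lists).
    return [value.split(None, 1)[0] for value in values if value.strip()]


def probe(url):
    """Request url and return the schemes offered by a 401 response.

    Returns an empty list if the request succeeds; any other HTTP error
    is re-raised unchanged.
    """
    try:
        urllib.request.urlopen(url)
    except urllib.error.HTTPError as error:
        if error.code == 401:
            return offered_auth_schemes(error)
        raise
    return []
```

If "NTLM" appears in the list returned by probe(url), the server is offering NTLM for that URL; if the list is empty or holds only other schemes, the 401 is probably coming from something else.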
