<p>It looks like name resolution is ultimately handled by <code>socket.create_connection</code>.</p>
<pre><code>-> urllib2.urlopen
-> httplib.HTTPConnection
-> socket.create_connection
</code></pre>
<p>However, once the "Host:" header has been set, you can resolve the host yourself and pass the IP address on down to the opener.</p>
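<p>To illustrate that the "Host:" header and the address you actually connect to are independent, here is a minimal sketch. It uses the Python 3 module name <code>urllib.request</code> (the equivalent of <code>urllib2</code>, whose <code>Request</code> takes the same arguments); the IP and hostname are simply the ones from this answer.</p>

```python
import urllib.request  # Python 3 name for urllib2

# Address the server by IP, but keep the original name in the Host header.
# urllib only fills in Host itself when the request does not already have one.
req = urllib.request.Request(
    'http://66.102.9.104/',
    headers={'Host': 'news.bbc.co.uk'},
)

print(req.get_header('Host'))  # the name the server will see
print(req.get_full_url())      # the address that will be connected to
```

<p>This only builds the request; nothing is sent until it is passed to <code>urlopen</code>. The drawback of doing it by hand is that every URL has to be rewritten, which is what the subclassing approach avoids.</p>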
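<p>I'd suggest that you subclass <code>httplib.HTTPConnection</code> and override its <code>connect</code> method so that <code>self.host</code> is replaced by the address you want before it is passed to <code>socket.create_connection</code>.</p>
<p>Then subclass <code>HTTPHandler</code> (and <code>HTTPSHandler</code>) and replace the <code>http_open</code> method with one that passes your <code>HTTPConnection</code> subclass, instead of httplib's own class, to <code>do_open</code>.</p>
<p>Like this:</p>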
<pre><code>import urllib2
import httplib
import socket
import ssl  # needed by MyHTTPSConnection below

def MyResolver(host):
    # Map selected hostnames to a fixed IP; pass everything else through.
    if host == 'news.bbc.co.uk':
        return '66.102.9.104'  # Google IP
    else:
        return host

class MyHTTPConnection(httplib.HTTPConnection):
    def connect(self):
        # Connect to the resolved address; self.host itself is left unchanged.
        self.sock = socket.create_connection((MyResolver(self.host), self.port), self.timeout)

class MyHTTPSConnection(httplib.HTTPSConnection):
    def connect(self):
        sock = socket.create_connection((MyResolver(self.host), self.port), self.timeout)
        self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(MyHTTPConnection, req)

class MyHTTPSHandler(urllib2.HTTPSHandler):
    def https_open(self, req):
        return self.do_open(MyHTTPSConnection, req)

# Install an opener that uses the custom handlers for every urlopen call.
opener = urllib2.build_opener(MyHTTPHandler, MyHTTPSHandler)
urllib2.install_opener(opener)

f = urllib2.urlopen('http://news.bbc.co.uk')
data = f.read()
from lxml import etree
doc = etree.HTML(data)

>>> print doc.xpath('//title/text()')
['Google']
</code></pre>
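<p>The code above is Python 2. For Python 3, a rough port of the HTTP part might look like the sketch below, assuming the usual renames (<code>httplib</code> to <code>http.client</code>, <code>urllib2</code> to <code>urllib.request</code>). The names <code>OVERRIDES</code>, <code>resolve</code>, <code>OverrideHTTPConnection</code> and <code>OverrideHTTPHandler</code> are made up for illustration, and proxy tunnelling is not handled.</p>

```python
import http.client
import socket
import urllib.request

# Hypothetical override table: hostname -> address to connect to instead.
OVERRIDES = {'news.bbc.co.uk': '66.102.9.104'}


def resolve(host):
    """Return the override address for host, or host itself if none is set."""
    return OVERRIDES.get(host, host)


class OverrideHTTPConnection(http.client.HTTPConnection):
    def connect(self):
        # Connect to the overridden address. self.host is not modified,
        # and urllib still derives the Host header from the request URL.
        self.sock = socket.create_connection(
            (resolve(self.host), self.port), self.timeout)


class OverrideHTTPHandler(urllib.request.HTTPHandler):
    def http_open(self, req):
        # Hand our connection class to do_open instead of the default one.
        return self.do_open(OverrideHTTPConnection, req)


# build_opener replaces the default HTTPHandler with our subclass.
opener = urllib.request.build_opener(OverrideHTTPHandler)
```

<p>With this in place, <code>opener.open('http://news.bbc.co.uk/')</code> should connect to the overridden address, while hosts not listed in <code>OVERRIDES</code> are resolved normally.</p>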
<p>Obviously there are certificate issues if you use HTTPS, and you'll need to fill out <code>MyResolver</code>...</p>
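<p>One possible way to approach the certificate issue, sketched here for Python 3 and not taken from the original code: connect to the overridden address, but pass the original hostname as <code>server_hostname</code> when wrapping the socket, so that SNI and certificate checking use the name and not the IP. This only helps if the server at that address really presents a certificate for the name. <code>OVERRIDES</code>, <code>resolve</code> and <code>OverrideHTTPSConnection</code> are hypothetical names.</p>

```python
import http.client
import socket
import ssl

# Hypothetical override table: hostname -> address to connect to instead.
OVERRIDES = {'news.bbc.co.uk': '66.102.9.104'}


def resolve(host):
    """Return the override address for host, or host itself if none is set."""
    return OVERRIDES.get(host, host)


class OverrideHTTPSConnection(http.client.HTTPSConnection):
    def connect(self):
        # Open the TCP connection to the overridden address...
        sock = socket.create_connection(
            (resolve(self.host), self.port), self.timeout)
        # ...but do the TLS handshake under the original hostname, so the
        # certificate is verified against the name and not the IP.
        context = ssl.create_default_context()
        self.sock = context.wrap_socket(sock, server_hostname=self.host)
```

<p>To use it with <code>urllib.request</code>, you would pass this class to <code>do_open</code> from an <code>HTTPSHandler</code> subclass's <code>https_open</code>, in the same way as for plain HTTP.</p>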