Opening a Persian URL domain with urllib2
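I'm trying to open the URL http://الاعلي-للاتصالات.قطر/ar/news-events/event/future-internet-privacy with urllib2.urlopen, but it always fails with an error.
The same thing happens with http://الاعلي-للاتصالات.قطر/ar ... Other pages (Chinese ones, for example) open fine.
Any suggestions on the right way to open these URLs?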
urllib2.urlopen("http://الاعلي-للاتصالات.قطر/ar/news-events/event/future-internet-privacy").read()
urllib2.urlopen('http://الاعلي-للاتصالات.قطر').read()
[Edit]
The error message is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1170, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.6/urllib2.py", line 1142, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "/usr/lib/python2.6/httplib.py", line 914, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.6/httplib.py", line 951, in _send_request
    self.endheaders()
  File "/usr/lib/python2.6/httplib.py", line 908, in endheaders
    self._send_output()
  File "/usr/lib/python2.6/httplib.py", line 780, in _send_output
    self.send(msg)
  File "/usr/lib/python2.6/httplib.py", line 759, in send
    self.sock.sendall(str)
I also tried u'http://الاعلي-للاتصالات.قطر'.encode('utf-8'), but that URL would not open either.
1 Answer
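As @Donal said, the domain name in the URL needs to be encoded with Punycode. Fortunately, Python already ships with this: the built-in "idna" codec does the conversion. Here is some sample Python code: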
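import urllib2

# Assumes the byte string below is UTF-8 (e.g. a UTF-8 terminal or source file).
domain = "الاعلي-للاتصالات.قطر"
domain_unicode = unicode(domain, "utf8")
# The "idna" codec converts each label of the domain to its ASCII (Punycode) form.
domain_idna = domain_unicode.encode("idna")
urllib2.urlopen("http://" + domain_idna).read()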
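The snippet above only covers the bare domain, while the question also asks about a URL with a path. Below is a minimal sketch of how the same idea might extend to a full URL. It is written for Python 3, where urllib2 was split into urllib.request and urllib.parse. The helper name to_ascii_url is hypothetical (it is not part of any library), and the sketch assumes the URL has a hostname and no user:password part.

```python
from urllib.parse import urlsplit, urlunsplit, quote


def to_ascii_url(url):
    """Return url with an IDNA-encoded host and a percent-encoded path.

    Hypothetical helper for illustration; assumes the URL contains a
    hostname and no user:password@ component.
    """
    parts = urlsplit(url)
    # The built-in "idna" codec turns each non-ASCII label into xn--... form.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    # Percent-encode non-ASCII path characters; leave "/" and existing "%" alone.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


ascii_url = to_ascii_url(
    "http://الاعلي-للاتصالات.قطر/ar/news-events/event/future-internet-privacy"
)
print(ascii_url)
```

The resulting string is plain ASCII, so it could then be passed to urllib.request.urlopen in Python 3. Whether the server answers is a separate matter and is not something this sketch checks.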
Hope this helps.