Encoding problem when downloading HTML with mechanize and Python 2.6


Asked 2025-04-16 04:38

import mechanize

# url is the address of the page to download
browser = mechanize.Browser()
page = browser.open(url)
html = page.get_data()

print html

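It prints some strange characters. My guess is that the string is UTF-8 encoded, but Python doesn't know the encoding, so it can't display it correctly.

How can I convert this string into a Unicode string like the one below?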

u = u'test'

3 Answers

You need to declare the encoding, like this:

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-

mechanize needs this setting.

For more information, see http://www.python.org/dev/peps/pep-0263/

Answered 2025-04-16 by Python大师


u = html.decode('utf-8')
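The one-liner above assumes the page really is UTF-8. A minimal sketch of a slightly more defensive version follows, assuming the server names its charset in the Content-Type header. The helper names `charset_from_content_type` and `to_unicode` are hypothetical and not part of mechanize, and the sample bytes only stand in for the downloaded page.

```python
# -*- coding: utf-8 -*-
import re

def charset_from_content_type(content_type, default='utf-8'):
    """Return the charset named in a Content-Type header value, or `default`."""
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type or '', re.I)
    return match.group(1) if match else default

def to_unicode(raw, content_type=None):
    """Decode raw response bytes; bytes that cannot be decoded become U+FFFD."""
    return raw.decode(charset_from_content_type(content_type), 'replace')

# Sample bytes standing in for page.get_data(); with a real response the header
# value would come from something like page.info().get('Content-Type').
raw = u'caf\xe9'.encode('utf-8')
text = to_unicode(raw, 'text/html; charset=UTF-8')
```

If the header names a codec Python does not know, `decode` raises `LookupError`, so a caller may want to fall back to UTF-8 in that case.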


Answered 2025-04-16 by Python大师


The response is gzip-compressed.

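import gzip

def ungzipResponse(r, b):
    # Decompress the response in place if the server sent it gzip-compressed.
    headers = r.info()
    # .get() avoids a KeyError when there is no Content-Encoding header.
    if headers.get('Content-Encoding') == 'gzip':
        gz = gzip.GzipFile(fileobj=r, mode='rb')
        html = gz.read()
        gz.close()
        # Assumes the page is UTF-8; adjust the charset if the site uses another.
        headers["Content-type"] = "text/html; charset=utf-8"
        r.set_data(html)
        b.set_response(r)

response = browser.open(url)
ungzipResponse(response, browser)
html = response.read()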
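To see the decompress-then-decode step without a live request, here is a minimal in-memory sketch. It builds a gzip-compressed body itself, so the sample data and the helper names `gunzip_bytes` and `looks_gzipped` are assumptions made for illustration and not part of mechanize. It also assumes the payload is UTF-8.

```python
import gzip
import io

def gunzip_bytes(data):
    """Decompress a gzip-compressed byte string held in memory."""
    gz = gzip.GzipFile(fileobj=io.BytesIO(data), mode='rb')
    try:
        return gz.read()
    finally:
        gz.close()

def looks_gzipped(data):
    """True if the data starts with the two-byte gzip magic number."""
    return data[:2] == b'\x1f\x8b'

# Build a compressed body in memory to stand in for a gzip-encoded response.
buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode='wb')
gz.write(u'<p>caf\xe9</p>'.encode('utf-8'))
gz.close()
body = buf.getvalue()

# Decompress only when needed, then decode the bytes to a unicode string.
if looks_gzipped(body):
    body = gunzip_bytes(body)
html = body.decode('utf-8')
```

Checking the magic number is a fallback for responses whose Content-Encoding header is missing or wrong; when the header is reliable, testing it as in the answer above is enough.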

Answered 2025-04-16 by Python大师

