Python response decoding


Asked 2025-04-17 19:21

Given the following code that uses urllib:

import urllib.request

# some request object exists
response = urllib.request.urlopen(request)
html = response.read().decode("utf8")

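What format of string does read() return? I've been trying to find the answer in the Python documentation, but it doesn't mention this at all. Why is there a decode? Does decode convert an object to UTF-8, or from UTF-8? From what format to what format? The documentation for decode doesn't say either. Is the Python documentation just that bad, or am I missing some standard convention?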

I want to store that HTML in a UTF-8 file. Can I write it out directly, or do I need to "encode" it into some format first?

Note: I know urllib is deprecated, but I can't switch to urllib2 right now.


1 Answer

Ask Python:

>>> import urllib
>>> r = urllib.urlopen("http://google.com")
>>> a = r.read()
>>> type(a)
<type 'str'>
>>> help(a.decode)
Help on built-in function decode:

decode(...)
    S.decode([encoding[,errors]]) -> object

    Decodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
    as well as any other name registered with codecs.register_error that is
    able to handle UnicodeDecodeErrors.

>>> b = a.decode('utf8')
>>> type(b)
<type 'unicode'>
>>>

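So read() returns a str (a byte string in Python 2), and .decode() converts that UTF-8-encoded content into Python's internal Unicode type. The session above is Python 2; in Python 3, which the question's urllib.request implies, read() returns bytes and .decode("utf8") turns those bytes into a str.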
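A minimal Python 3 sketch of the same idea, also touching the question about saving to a UTF-8 file. It assumes the server really sent UTF-8. `raw` is a hypothetical, locally built stand-in for `response.read()` so no network is needed, and the variable names and temporary file paths are illustrative only. It shows decode as bytes to str, encode as the reverse, the `errors` argument described in the help text above, and two ways to write the page to disk.

```python
import os
import tempfile

# Hypothetical stand-in for response.read(): in Python 3 that call returns
# bytes. We assume the server sent UTF-8 and build such bytes locally.
raw = "<p>Café, naïve, 日本語</p>".encode("utf-8")

# decode: bytes in a known encoding (UTF-8) -> str (Unicode text)
html = raw.decode("utf-8")

# encode is the reverse direction: str -> bytes
assert html.encode("utf-8") == raw

# errors= decides what happens with bytes that are not valid UTF-8
broken = b"abc\xff"
try:
    broken.decode("utf-8")  # default errors="strict" raises
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # strict: invalid start byte
replaced = broken.decode("utf-8", errors="replace")  # bad byte -> U+FFFD
ignored = broken.decode("utf-8", errors="ignore")    # bad byte dropped

# Hypothetical output paths in a fresh temporary directory
tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, "page_text.html")
bytes_path = os.path.join(tmpdir, "page_bytes.html")

# Option 1: write the decoded str; the text file re-encodes it as UTF-8
with open(text_path, "w", encoding="utf-8") as f:
    f.write(html)

# Option 2: skip decoding and write the received bytes unchanged
with open(bytes_path, "wb") as f:
    f.write(raw)


def read_bytes(path):
    """Return the raw bytes stored in a file."""
    with open(path, "rb") as f:
        return f.read()


print(read_bytes(text_path) == raw, read_bytes(bytes_path) == raw)  # True True
```

Both files end up with the same bytes here because the sample text has no line breaks. In text mode Python may translate newlines on some platforms (e.g. Windows), so binary mode (`"wb"`) is the way to store the bytes exactly as received; decoding first is only needed if you want to work with the HTML as text.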

Answered 2025-04-17 by Python大师
