Python ISO-8859-1 encoding

Asked 2025-04-17 06:13

I'm running into a serious problem handling the ISO-8859-1 / Latin-1 character set in Python.

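When I use os.listdir to get the contents of a folder, the strings I get back are encoded in ISO-8859-1 (for example: 'Ol\xe1 Mundo'), but in the Python interpreter the same string comes out in a different encoding: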

In : 'Olá Mundo'.decode('latin-1')
Out: u'Ol\xa0 Mundo'

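How can I make Python decode this string into the same form? As far as I can tell, the strings returned by os.listdir are encoded correctly but the interpreter's are not: in ISO-8859-1, 'á' corresponds to '\xe1', not '\xa0':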

http://en.wikipedia.org/wiki/ISO/IEC_8859-1
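
A minimal sketch of the mismatch, written for Python 3 (where str is Unicode and byte strings are explicit). It assumes the interpreter's console used cp850, which is what the answer below suggests; it shows that "á" is byte 0xE1 in Latin-1 but 0xA0 in cp850, and that decoding 0xA0 as Latin-1 gives U+00A0 rather than "á".

```python
import unicodedata

# The same character "á" maps to different bytes depending on the codec.
latin1_bytes = "á".encode("latin-1")  # ISO-8859-1
cp850_bytes = "á".encode("cp850")     # assumed Windows console code page

print(latin1_bytes)  # b'\xe1'
print(cp850_bytes)   # b'\xa0'

# Decoding the cp850 byte with the wrong codec (Latin-1) yields U+00A0,
# which matches the u'Ol\xa0 Mundo' shown in the question.
wrong = cp850_bytes.decode("latin-1")
print(repr(wrong), unicodedata.name(wrong))  # '\xa0' NO-BREAK SPACE
```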

Any ideas on how to solve this?

1 Answer

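When you type a non-Unicode string into Python 2's interactive shell, its bytes come from your console's input encoding (reported by sys.stdin.encoding), not from Latin-1.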

It looks like you're on Windows, so that encoding is probably "cp850" or "cp437":

C:\>python
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin.encoding
'cp850'
>>> 'Olá Mundo'
'Ol\xa0 Mundo'
>>> u'Olá Mundo'.encode('cp850')
'Ol\xa0 Mundo'
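A minimal Python 3 sketch of the practical consequence: decode each byte string with the codec that actually produced it, then compare the resulting Unicode text instead of the raw bytes. It assumes, as the question states, that os.listdir returned Latin-1 bytes, and that the console used cp850 or cp437; the variable names are hypothetical and the b"" literals stand in for Python 2 byte strings.

```python
# Bytes as os.listdir returned them (assumed Latin-1, per the question).
from_listdir = b"Ol\xe1 Mundo"
# Bytes the console produced for the interactive literal 'Olá Mundo'.
from_console = b"Ol\xa0 Mundo"

print(from_listdir == from_console)  # False: different byte encodings

# Decode each byte string with its own codec.
name_text = from_listdir.decode("latin-1")
typed_text = from_console.decode("cp850")
print(name_text == typed_text)  # True: both are 'Olá Mundo'

# Both code pages mentioned above map byte 0xA0 to "á".
for codec in ("cp850", "cp437"):
    print(codec, from_console.decode(codec) == name_text)  # True for both
```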

If you switch the console code page to 1252 (roughly equivalent to latin1), the literal produces the same bytes that os.listdir returned:

C:\>chcp 1252
Active code page: 1252

C:\>python
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin.encoding
'cp1252'
>>> 'Olá Mundo'
'Ol\xe1 Mundo'
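A minimal Python 3 sketch of what "roughly equivalent" means here: cp1252 and Latin-1 encode "á" (and the rest of "Olá Mundo") identically, but they disagree in the 0x80 to 0x9F range, where cp1252 has printable characters and Latin-1 has C1 control codes. Byte 0x80 is used below only as one illustrative example.

```python
text = "Olá Mundo"

# For characters like "á", cp1252 and Latin-1 produce identical bytes.
print(text.encode("cp1252"))   # b'Ol\xe1 Mundo'
print(text.encode("latin-1"))  # b'Ol\xe1 Mundo'

# In 0x80-0x9F they differ: cp1252 maps 0x80 to the euro sign,
# Latin-1 maps it to the C1 control character U+0080.
print(hex(ord(b"\x80".decode("cp1252"))))   # 0x20ac
print(hex(ord(b"\x80".decode("latin-1"))))  # 0x80
```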

Answered 2025-04-17 by Python大师
