UnicodeEncodeError: 'charmap' codec can't encode character '\x80' in position 0: character maps to <undefined>

2 Answers

User

Answer 1 · edited 2024-05-23 15:30:47

'\x80'.decode('cp1252') does not give u'\u0080' (which is the same thing as u'\x80').

Byte 0x80 in Windows code page 1252 decodes to the Unicode character U+20AC, the euro sign (€).

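There is an encoding in which every byte 0x00 to 0xFF decodes to the Unicode character with the same number, U+0000 to U+00FF: it is iso-8859-1. With that encoding, your example would work.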
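To make the direct mapping concrete, here is a minimal sketch. It assumes Python 3, where a bytes literal (b'...') plays the role of the Python 2 str used in the question; the variable names are only illustrative.

```python
# Python 3 sketch: bytes objects stand in for Python 2 str objects.
all_bytes = bytes(range(256))

# iso-8859-1 maps every byte value N to the code point with the same number.
decoded = all_bytes.decode('iso-8859-1')
direct_mapping = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))
print(direct_mapping)

# Byte 0x80 therefore becomes U+0080 under iso-8859-1,
# while cp1252 turns the same byte into U+20AC.
latin1_char = b'\x80'.decode('iso-8859-1')
cp1252_char = b'\x80'.decode('cp1252')
print(hex(ord(latin1_char)), hex(ord(cp1252_char)))
```

So if the goal really is "byte N becomes code point N", iso-8859-1 is the codec to ask for, not cp1252.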

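Windows cp1252 is similar to iso-8859-1, but not the same: bytes 0xA0 to 0xFF are identical in both, so you get the direct-mapping behaviour for those characters, but bytes 0x80 to 0x9F are an assortment of extra symbols from other Unicode blocks rather than the invisible (and largely useless) control codes U+0080 to U+009F.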
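The two ranges can be checked directly. This is a rough sketch, again assuming Python 3; decode_or_none is a hypothetical helper written for this example, and it returns None for any byte the codec has no mapping for, since a codec's table may leave some bytes in the 0x80 to 0x9F range undefined.

```python
def decode_or_none(byte_value, encoding):
    """Decode a single byte; return None if the codec has no mapping for it."""
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None

# 0xA0 to 0xFF: cp1252 and iso-8859-1 agree on every byte.
upper_same = all(
    decode_or_none(b, 'cp1252') == decode_or_none(b, 'iso-8859-1')
    for b in range(0xA0, 0x100)
)

# 0x80 to 0x9F: cp1252 never produces the same-numbered control code.
low_differs = all(
    decode_or_none(b, 'cp1252') != chr(b)
    for b in range(0x80, 0xA0)
)

print(upper_same, low_differs)
```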

User

Answer 2 · edited 2024-05-23 15:30:47

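str.decode does not just put a u in front of the string literal. It converts the bytes of the input string into meaningful characters (i.e. Unicode).

You then call encode to convert those characters back into bytes, because you need to "print" them: output them to the terminal or to any other OS entity (such as a GUI window).

So, for your specific task, I believe what you want is: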

s = '\x80'
print s.decode('cp1251').encode(platform_encoding)

where 'cp1251' is the encoding of your IDE, and platform_encoding is a variable holding the encoding of the current system.

In reply to your comment:
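For reference, here is roughly how the same decode-then-encode step could look in Python 3. This is only a sketch: deriving platform_encoding from sys.stdout.encoding with a fallback to locale.getpreferredencoding() is my assumption about what "the current system's encoding" means, and errors='replace' is added so the sketch does not fail when the target encoding lacks the character.

```python
import locale
import sys

# Assumption: prefer the terminal's encoding, else fall back to the locale default.
platform_encoding = (
    getattr(sys.stdout, 'encoding', None) or locale.getpreferredencoding()
)

raw = b'\x80'                   # the bytes as they are stored
text = raw.decode('cp1251')     # bytes -> Unicode characters

# Unicode characters -> bytes suitable for the output device.
# errors='replace' substitutes '?' for characters the target cannot represent.
out = text.encode(platform_encoding, errors='replace')
print(repr(text), repr(out))
```

Which bytes end up in out depends on the platform encoding, so the result will differ from machine to machine.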

But the str.decode should have used the source code encoding (from line 2 in the file) to decode. So there should not be a difference to the u

The encoding information is then used by the Python parser to interpret the file using the given encoding.

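So set fileencoding=cp1252 only tells the interpreter how to convert the characters [typed in via the editor] into bytes when it parses the line str = '\x80'. That information is not used during the str.decode call.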

You also ask what u'\x80' is. \x80 is simply interpreted as \u0080, which is clearly not what you want. Have a look at this question: Bytes in a unicode Python string.
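A short sketch of why that matters, assuming Python 3 (where '\x80' in an ordinary string literal behaves like u'\x80' in Python 2): the character U+0080 has no slot in cp1252, so encoding it raises the error from the title, whereas the euro sign encodes to the single byte 0x80.

```python
control_char = '\x80'           # identical to '\u0080'
same = control_char == '\u0080'

# U+0080 is not in the cp1252 table, so encoding it fails.
try:
    control_char.encode('cp1252')
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# The euro sign, U+20AC, is what actually lives at byte 0x80 in cp1252.
euro_bytes = '\u20ac'.encode('cp1252')

print(same)
print(error_message)
print(euro_bytes)
```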