Reading a binary file in text ('r') mode with Python 3 passes on Windows but fails on Linux

import os
import tempfile

temp_dir = tempfile.mkdtemp()
temp_file = os.path.join(temp_dir, 'write_file')
expected_bytes = bytearray([123, 3, 255, 0, 100])
with open(temp_file, 'wb') as fh:
    fh.write(expected_bytes)

with open(temp_file, 'r', newline='') as fh:
    actual = fh.read()

Traceback (most recent call last):
  File "<input>", line 11, in <module>
  File "/home/.../lib64/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

2 answers

User

Answer 1 · edited 2024-04-20 03:14:50

If you wrote the file as bytes, you should read it back as bytes.

f = open("myfile.txt", "rb")

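If you read it as text (using "r" or "rt"), Python will try to decode the contents to Unicode. The encoding used by default is platform dependent. But you clearly do not want the data decoded at all.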
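As a minimal sketch of that advice, the round trip below reuses the bytes from the question and reads them back in binary mode. The temporary-directory handling is an illustrative choice, not part of the original answer.

```python
import os
import tempfile

# Same payload as in the question, as an immutable bytes object.
expected_bytes = bytes([123, 3, 255, 0, 100])

with tempfile.TemporaryDirectory() as temp_dir:
    temp_file = os.path.join(temp_dir, "write_file")

    with open(temp_file, "wb") as fh:
        fh.write(expected_bytes)

    # "rb" returns the raw bytes: no decoding and no newline translation,
    # so the result does not depend on the platform's default encoding.
    with open(temp_file, "rb") as fh:
        actual = fh.read()

print(actual == expected_bytes)
```

Because no decoding happens, this behaves the same on Windows and Linux.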

User

Answer 2 · edited 2024-04-20 03:14:50

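When a file is opened in text mode with 'rt' (both "r" and "t" are the defaults), everything read from the file is transparently decoded on the fly and returned as str objects, as described in Text I/O.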

You can force a specific encoding when opening the file, for example:

f = open("myfile.txt", "r", encoding="utf-8")
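To illustrate, here is a small sketch (assuming the same five bytes as in the question) that opens the file twice with an explicit encoding. With the encoding pinned, the outcome should no longer depend on the platform: cp1252 decodes these bytes, utf-8 rejects them.

```python
import os
import tempfile

data = bytes([123, 3, 255, 0, 100])  # the bytes from the question

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "write_file")
    with open(path, "wb") as fh:
        fh.write(data)

    # cp1252 maps each of these five bytes to one character.
    with open(path, "r", newline="", encoding="cp1252") as fh:
        as_cp1252 = fh.read()

    # utf-8 cannot decode the 0xff byte, whatever the platform default is.
    utf8_error = None
    try:
        with open(path, "r", newline="", encoding="utf-8") as fh:
            fh.read()
    except UnicodeDecodeError as exc:
        utf8_error = exc

print(repr(as_cp1252))
print(utf8_error)
```

Note that the cp1252 result is a str of five characters, not the original bytes, so this is still not a substitute for binary mode.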

As stated in the documentation for open:

The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.

(Note that sys.getdefaultencoding() is unrelated: it returns the name of the current default string encoding used by the Unicode implementation.)

As you stated in the comments, on your systems locale.getpreferredencoding() gives "cp1252" on Windows and "UTF-8" on Linux.
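A quick way to check both values on a given machine is sketched below. It only calls the two functions named above; the printed preferred encoding will vary by system and configuration.

```python
import locale
import sys

# The locale-derived encoding that the quoted open() documentation refers to
# when no encoding= argument is passed.
preferred = locale.getpreferredencoding(False)

# Unrelated to open(): the default for str.encode() and bytes.decode().
default = sys.getdefaultencoding()

print("locale.getpreferredencoding(False):", preferred)
print("sys.getdefaultencoding():", default)
```

In Python 3 the second value is "utf-8" everywhere, which is why it does not explain the difference between the two platforms.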

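CP-1252 is a single-byte encoding in which each byte corresponds to one character, so almost any data you read can be converted to a string. (Python's cp1252 codec leaves only five byte values undefined: 0x81, 0x8D, 0x8F, 0x90 and 0x9D. None of them occur in the file from the question.)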

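UTF-8 is a variable-width encoding, and not every byte sequence is valid and represents a character. That is why reading the file fails on the Linux system: some of its bytes cannot be decoded.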
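The difference in width can be seen with a short sketch. The three sample characters are arbitrary illustrative choices, not taken from the question.

```python
# 'A', 'ÿ' (U+00FF) and '€' (U+20AC), written as escapes for portability.
samples = ["A", "\u00ff", "\u20ac"]

# UTF-8 needs a different number of bytes depending on the character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# cp1252 always uses exactly one byte per character it supports.
cp1252_lengths = [len(ch.encode("cp1252")) for ch in samples]

# A lone 0xff byte never starts a valid UTF-8 sequence, so decoding it raises.
try:
    b"\xff".decode("utf-8")
    lone_ff_is_valid = True
except UnicodeDecodeError:
    lone_ff_is_valid = False

print(utf8_lengths)      # [1, 2, 3]
print(cp1252_lengths)    # [1, 1, 1]
print(lone_ff_is_valid)  # False
```

The byte 255 (0xff) in the question's file is exactly such a lone byte, which matches the "invalid start byte" message in the traceback.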
