Reading file bytes as UTF-8 characters in Python

Posted 2024-05-15 12:50:27

I have a .txt file generated by a built-in tool on Windows, and I need to parse it in a Python script (on a Linux machine, if that matters).

I open the file like this:

with open(path, 'r') as spec_file:

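I also tried the io library:

with io.open(detail, mode="r", encoding="utf-8") as spec_file:

The file displays correctly when opened in (for example) Sublime Text. When I iterate over the file line by line: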

for line in spec_file:

and print each line with print(line), I also get the correct representation:

**********************************************************************************
* This diagnostic information may be used by an IT administrator to troubleshoot *
* the installed Trusted Platform Module (TPM). Please zip the folder and attach  *
* it to issues filed through Feedback Hub or with an IT admin.                   *
**********************************************************************************

However, when I print with print(repr(line)), every character is followed by a \x00 byte:

'*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00\n'
'\x00\n'
'\x00*\x00 \x00T\x00h\x00i\x00s\x00 \x00d\x00i\x00a\x00g\x00n\x00o\x00s\x00t\x00i\x00c\x00 \x00i\x00n\x00f\x00o\x00r\x00m\x00a\x00t\x00i\x00o\x00n\x00 \x00m\x00a\x00y\x00 \x00b\x00e\x00 \x00u\x00s\x00e\x00d\x00 \x00b\x00y\x00 \x00a\x00n\x00 \x00I\x00T\x00 \x00a\x00d\x00m\x00i\x00n\x00i\x00s\x00t\x00r\x00a\x00t\x00o\x00r\x00 \x00t\x00o\x00 \x00t\x00r\x00o\x00u\x00b\x00l\x00e\x00s\x00h\x00o\x00o\x00t\x00 \x00*\x00\n'
'\x00\n'
'\x00*\x00 \x00t\x00h\x00e\x00 \x00i\x00n\x00s\x00t\x00a\x00l\x00l\x00e\x00d\x00 \x00T\x00r\x00u\x00s\x00t\x00e\x00d\x00 \x00P\x00l\x00a\x00t\x00f\x00o\x00r\x00m\x00 \x00M\x00o\x00d\x00u\x00l\x00e\x00 \x00(\x00T\x00P\x00M\x00)\x00.\x00 \x00P\x00l\x00e\x00a\x00s\x00e\x00 \x00z\x00i\x00p\x00 \x00t\x00h\x00e\x00 \x00f\x00o\x00l\x00d\x00e\x00r\x00 \x00a\x00n\x00d\x00 \x00a\x00t\x00t\x00a\x00c\x00h\x00 \x00 \x00*\x00\n'
'\x00\n'
'\x00*\x00 \x00i\x00t\x00 \x00t\x00o\x00 \x00i\x00s\x00s\x00u\x00e\x00s\x00 \x00f\x00i\x00l\x00e\x00d\x00 \x00t\x00h\x00r\x00o\x00u\x00g\x00h\x00 \x00F\x00e\x00e\x00d\x00b\x00a\x00c\x00k\x00 \x00H\x00u\x00b\x00 \x00o\x00r\x00 \x00w\x00i\x00t\x00h\x00 \x00a\x00n\x00 \x00I\x00T\x00 \x00a\x00d\x00m\x00i\x00n\x00.\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00*\x00\n'
'\x00\n'
'\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00*\x00\n'

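This makes it impossible to search the file or work with its contents as ordinary strings, so I need to convert it to a proper UTF-8 string somehow. Any ideas?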

1 answer

User

#1 · Posted 2024-05-15 12:50:27

Your file is encoded in UTF-16 LE (because Windows; see this question for more information), so you need to pass that as the encoding:

with open(path, 'r', encoding="utf-16le") as spec_file:
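To see where the \x00 characters come from, here is a minimal sketch that reproduces the symptom and the fix. The sample text and the temporary file are stand-ins for the real diagnostic file, and the sketch assumes the file holds UTF-16 LE bytes with no byte order mark and Windows line endings.

```python
import os
import tempfile

# Hypothetical sample text standing in for the real diagnostic file.
sample = (
    "* This diagnostic information *\r\n"
    "* the installed Trusted Platform Module (TPM). *\r\n"
)

# Write it as UTF-16 LE without a byte order mark (assumed file layout).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample.encode("utf-16-le"))
    path = tmp.name

# Wrong codec: each ASCII character is followed by a NUL character.
with open(path, "r", encoding="utf-8") as spec_file:
    wrong_lines = spec_file.readlines()

# Matching codec: ordinary strings that can be searched.
with open(path, "r", encoding="utf-16-le") as spec_file:
    right_lines = spec_file.readlines()

os.remove(path)

print(repr(wrong_lines[0]))
print(repr(right_lines[0]))
print(any("TPM" in line for line in right_lines))
```

The stray '\x00\n' lines in the question likely have the same cause: the Windows line ending is stored as the bytes \r\x00\n\x00, and text mode treats the lone \r as a line break before it reaches the \x00 that follows.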

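LE stands for little-endian. This matters because the plain "utf-16" codec relies on a byte order mark to determine the byte order, and Windows did not write one here (again, because Windows), so you need to state the endianness explicitly.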
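If you cannot be sure whether a given file starts with a byte order mark, one option is to check the first two bytes and choose the codec accordingly. The helper name detect_utf16_encoding below is hypothetical, and the sketch assumes the data is known to be UTF-16 (it does not try to distinguish other encodings).

```python
import codecs


def detect_utf16_encoding(raw: bytes, default: str = "utf-16-le") -> str:
    """Pick a codec name for UTF-16 data based on its first two bytes."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # "utf-16" reads the BOM, picks the byte order, and drops the BOM.
        return "utf-16"
    # No BOM: fall back to the byte order we assume the producer used.
    return default


with_bom = codecs.BOM_UTF16_LE + "TPM".encode("utf-16-le")
without_bom = "TPM".encode("utf-16-le")

print(with_bom.decode(detect_utf16_encoding(with_bom)))
print(without_bom.decode(detect_utf16_encoding(without_bom)))

# Decoding BOM-prefixed data as "utf-16-le" keeps the BOM as U+FEFF.
print(repr(with_bom.decode("utf-16-le")))
```

Choosing "utf-16" only when a BOM is present avoids a leading U+FEFF character in the decoded text, while the explicit "utf-16-le" default covers files like the one in the question.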
