Can Python read non-ASCII text from a file?

2 answers

Anonymous user

Answer 1 · edited 2024-05-23 18:40:43

There are two options.

The first is clearly the simpler one. You don't show how you open the file, but assuming your code looks like this:

with open(path) as file_obj:
    for line in file_obj:
        pass  # process each line here

Do this instead:

with open(path, encoding='utf-8') as file_obj:
    for line in file_obj:
        pass  # process each line here

That's it.
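A minimal round-trip sketch of the fix above, assuming Python 3: it writes a small file with non-ASCII characters and reads it back line by line, passing encoding="utf-8" on both sides. The sample text, the file name sample.txt, and the helper read_lines_utf8 are hypothetical and exist only for illustration.

```python
import os
import tempfile

# Hypothetical sample text with non-ASCII characters (accented Latin and CJK).
SAMPLE = "café\n日本語\nnaïve\n"


def read_lines_utf8(path):
    """Read a UTF-8 text file line by line, passing the encoding explicitly."""
    lines = []
    with open(path, encoding="utf-8") as file_obj:
        for line in file_obj:
            lines.append(line.rstrip("\n"))
    return lines


# Write the sample with an explicit encoding as well, so neither side of the
# round trip depends on the platform's default encoding.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as out:
        out.write(SAMPLE)
    result = read_lines_utf8(sample_path)

# result == ['café', '日本語', 'naïve']
```

Because the encoding is named explicitly, this should behave the same on Windows, macOS and Linux regardless of locale settings.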

As the docs explain, if you open a file in text mode without specifying an encoding:

The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used.

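In some setups (for example, any properly configured OS X or Linux system), locale.getpreferredencoding() will always be "UTF-8". But it will obviously never be "automatically pick whatever is right for any file I might open". So if you know a file is UTF-8, you should say so explicitly.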
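To see why relying on the default is fragile, here is a small sketch, assuming Python 3, that prints the locale's preferred encoding (roughly what open() falls back to) and then decodes the same UTF-8 bytes two ways. Latin-1 is just one example of a non-UTF-8 default a machine might have; the exact default on your system depends on locale settings and, in newer versions, on Python's UTF-8 mode.

```python
import locale

# Roughly what open() falls back to when no encoding= is given
# (Python 3.11+ also offers locale.getencoding(); UTF-8 mode changes the result).
default_encoding = locale.getpreferredencoding(False)

# The same bytes decoded two ways: "é" is two bytes (0xC3 0xA9) in UTF-8.
raw = "café".encode("utf-8")
as_utf8 = raw.decode("utf-8")      # 'café'
as_latin1 = raw.decode("latin-1")  # 'cafÃ©', a classic mojibake result

print("default encoding:", default_encoding)
print(ascii(as_utf8), ascii(as_latin1))
```

Decoding with the wrong codec often does not raise an error at all; it silently produces garbled text, which is why naming the encoding is safer than hoping the default matches.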

Anonymous user

Answer 2 · edited 2024-05-23 18:40:43

For a solution that works on both Python 2 and 3, use codecs:

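import codecs

# 'your_file.txt' is a placeholder path
with codecs.open('your_file.txt', 'r', 'utf-8') as file_obj:
    for line in file_obj:
        pass  # each line is a unicode string; process it here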

Otherwise, on Python 3 only, use abarnert's solution.
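A minimal round-trip sketch of the codecs.open approach from the second answer. The file name and sample text are hypothetical, and the non-ASCII characters are written as \u escapes so the same source also parses under Python 2 without a coding declaration.

```python
import codecs
import os
import tempfile


def read_with_codecs(path):
    """Read a UTF-8 file via codecs.open and return its lines as unicode text."""
    with codecs.open(path, "r", "utf-8") as file_obj:
        # codecs.open always uses binary mode underneath, so strip "\r\n" too
        return [line.rstrip("\r\n") for line in file_obj]


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name

# Hypothetical sample: "café" and "naïve" written with escapes.
with codecs.open(path, "w", "utf-8") as out:
    out.write(u"caf\u00e9\nna\u00efve\n")

result = read_with_codecs(path)

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```

On Python 3 the built-in open(..., encoding='utf-8') from the first answer does the same job and also handles universal newlines, so codecs.open is mainly worth it when Python 2 must be supported.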