Python: Unicode and "\xe2\x80\x99" are driving me crazy

Published 2024-05-16 09:19:44


So I have a .txt file from Google Docs containing some lines from David Foster Wallace's "Oblivion". Using:

with open("oblivion.txt", "r", 0) as bookFile:
    wordList = []
    for line in bookFile:
        wordList.append(line)

when I return and print the word list, I get:

"surgery on the crow\xe2\x80\x99s feet around her eyes."

(and it cuts off a lot of the text). But if, instead of appending to wordList, I just do

for line in bookFile:
    print line

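everything prints fine! The same goes for .read() on the file: the resulting str has no crazy byte representations, but then I can't manipulate it the way I want.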
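The difference between printing a line and printing the list can be sketched in Python 3, whose `bytes` type behaves like the undecoded Python 2 `str` the question reads from the file. The three bytes `\xe2\x80\x99` are the UTF-8 encoding of U+2019 (right single quotation mark). Printing a string writes its characters to the terminal, while printing a list shows the `repr()` of each element, and for undecoded bytes `repr()` escapes every non-ASCII byte. The sentence comes from the question; the variable names are only illustrative.

```python
# Python 3 sketch: bytes stands in for the Python 2 str read from the file.
raw = b"surgery on the crow\xe2\x80\x99s feet around her eyes."

# The three escaped bytes are the UTF-8 encoding of U+2019 (right single quote).
assert "\u2019".encode("utf-8") == b"\xe2\x80\x99"

# Decode once to get real text; the three bytes become one character.
text = raw.decode("utf-8")
assert "\u2019" in text and len(text) == len(raw) - 2

# Printing a list shows repr() of each element. For undecoded bytes,
# repr() escapes the non-ASCII bytes, which is the output in the question.
print(repr([raw]))
# prints: [b'surgery on the crow\xe2\x80\x99s feet around her eyes.']

# For decoded text, repr() keeps printable characters such as U+2019 as-is.
assert repr([text]) == "['surgery on the crow\u2019s feet around her eyes.']"
```

In other words, the bytes in the file were never wrong; the escapes only appear because the list is displayed through `repr()` before the data has been decoded.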

Where should I .encode() or .decode() or whatever? ~~Using Python 2 because 3 gave me some I/O buffer error.~~ Thanks.
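The struck-out Python 3 error most likely comes from the third positional argument in the question's `open()` call, `buffering=0`: Python 3 allows unbuffered I/O only in binary mode, so opening a file in text mode with `0` raises `ValueError`. A minimal Python 3 sketch, assuming a scratch file in a temporary directory (the path is hypothetical, standing in for oblivion.txt), shows the error and one fix: drop the `0` and pass `encoding` instead.

```python
import os
import tempfile

# Hypothetical scratch file standing in for oblivion.txt.
path = os.path.join(tempfile.mkdtemp(), "oblivion.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("surgery on the crow\u2019s feet around her eyes.\n")

# Python 3: buffering=0 is only allowed for binary files.
error_message = None
try:
    open(path, "r", 0)
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # can't have unbuffered text I/O

# Fix: keep the default buffering and state the encoding explicitly.
with open(path, "r", encoding="utf-8") as bookFile:
    wordList = bookFile.readlines()
```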


2 Answers

Anonymous user

#1 · Edited 2024-05-16 09:19:44

On Python 3, try opening the file with encoding='utf-8':

with open("oblivion.txt", "r", encoding='utf-8') as bookFile:
    wordList = bookFile.readlines()
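`readlines()` returns one decoded string per line, each still ending in `\n`, so `wordList` actually holds lines rather than words. If a list of words is what's wanted, one possible Python 3 sketch decodes with UTF-8 and splits on whitespace, so the curly apostrophe stays inside its word. The helper name `read_words` and the sample file are hypothetical.

```python
import os
import tempfile


def read_words(path):
    """Return the whitespace-separated words of a UTF-8 text file (hypothetical helper)."""
    words = []
    with open(path, "r", encoding="utf-8") as bookFile:
        for line in bookFile:            # each line is already a decoded str
            words.extend(line.split())   # split() with no args also drops the newline
    return words


# Usage with a scratch file; the path is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "oblivion.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("surgery on the crow\u2019s feet\naround her eyes.\n")

wordList = read_words(path)
print(len(wordList))  # 8
```

Because U+2019 is not a whitespace character, `"crow\u2019s"` survives as a single word.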

Anonymous user

#2 · Edited 2024-05-16 09:19:44

If you're stuck with Python 2 and want to use Rahul's answer, use io.open, which accepts the same encoding argument:

import io
with io.open("oblivion.txt", "r", encoding='utf-8') as bookFile:
    wordList = bookFile.readlines()
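The same idea answers the question's "where should I encode or decode": decode once on the way in, encode once on the way out, and let `io.open` do both. A minimal sketch, assuming Python 2.7 or Python 3 (in Python 3, `io.open` is the built-in `open`); the scratch file location is hypothetical, and the `u""` literals keep the text `unicode` on Python 2.

```python
import io
import os
import tempfile

# Hypothetical scratch location standing in for oblivion.txt.
path = os.path.join(tempfile.mkdtemp(), "oblivion.txt")

# Writing: io.open encodes the unicode text to UTF-8 bytes on disk.
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"surgery on the crow\u2019s feet around her eyes.\n")

# Reading: io.open decodes the bytes back into unicode text.
with io.open(path, "r", encoding="utf-8") as bookFile:
    wordList = [line.rstrip(u"\n") for line in bookFile]

# On disk, the apostrophe is stored as the three bytes from the question.
with io.open(path, "rb") as raw:
    assert b"\xe2\x80\x99" in raw.read()
```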