How to read a UTF-8 encoded text file with Python

Posted 2024-04-24 05:01:24


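I need to analyze Tamil text files (UTF-8 encoded). I am using Python's nltk package in IDLE. When I try to read the text file in the IDLE shell, I get the error below. How can I avoid it?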

corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()
  File "C:\Users\Customer\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 33: character maps to <undefined>


1 Answer

User

#1 · Posted 2024-04-24 05:01:24

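Since you are on Python 3, just add the encoding parameter to open(). Without it, open() uses the locale's default encoding (cp1252 here, per the traceback), which cannot decode byte 0x8d: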

corpus = open(
    r"C:\Users\Customer\Desktop\DISSERTATION\ettuthokai.txt", encoding="utf-8"
).read()
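
To see why the default fails and the explicit encoding works, here is a minimal sketch you can run anywhere. It assumes a throwaway temporary file as a stand-in for ettuthokai.txt, and the sample word is just an illustration, not taken from your corpus. The Tamil pulli sign (U+0BCD) is encoded in UTF-8 as the bytes E0 AF 8D, and cp1252 has no mapping for 0x8D, which is the kind of byte your traceback complains about. The sketch also uses a with block so the file is closed promptly.

```python
import os
import tempfile

# Hypothetical sample text (the word "Tamil" in Tamil script); it ends with
# the pulli sign U+0BCD, whose UTF-8 encoding contains the byte 0x8D.
sample = "தமிழ்"

# Write the sample to a temporary UTF-8 file (a stand-in for the real corpus).
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Reading with cp1252 (the default seen in the traceback) fails,
    # because cp1252 leaves byte 0x8D undefined.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Reading with an explicit UTF-8 encoding succeeds.
    with open(path, encoding="utf-8") as f:
        corpus = f.read()
finally:
    # Clean up the temporary file.
    os.remove(path)

print("cp1252 failed:", cp1252_failed)
print("round trip ok:", corpus == sample)
```

For your own file, the same pattern applies: open it inside a with block, pass encoding="utf-8", and call read() on the file object.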
