Python 3 UnicodeDecodeError with the readlines() method

User

Reply #1 · edited 2024-04-25 12:09:55

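Your default encoding appears to be ASCII, while the input is most likely UTF-8. As soon as a non-ASCII byte shows up in the input, an exception is raised. readlines itself is not really the culprit; it simply triggers the read plus decode, and the decode step is what fails.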

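The fix is simple, though: the built-in open in Python 3 lets you pass the known encoding of the input, replacing the default (ASCII in your case) with any other recognized encoding. That way you keep reading str objects (rather than raw binary bytes objects, which behave quite differently) while Python does the work of converting the raw bytes on disk into real text: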

# Using with statement closes the file for us without needing to remember to close
# explicitly, and closes even when exceptions occur
with open(argfile, encoding='utf-8') as inf:
    f = inf.readlines()
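
A minimal sketch of the failure and the fix described above. The file name `sample.txt` and its contents are made up for illustration, and passing `encoding="ascii"` explicitly stands in for a platform whose default encoding is ASCII:

```python
import os
import tempfile

# Hypothetical sample file: UTF-8 text containing non-ASCII characters.
tmpdir = tempfile.mkdtemp()
argfile = os.path.join(tmpdir, "sample.txt")
with open(argfile, "w", encoding="utf-8") as out:
    out.write("caf\u00e9\nna\u00efve\n")

# Decoding as ASCII fails on the first non-ASCII byte.
try:
    with open(argfile, encoding="ascii") as inf:
        inf.readlines()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the file's actual encoding succeeds and yields str lines.
with open(argfile, encoding="utf-8") as inf:
    lines = inf.readlines()

print(ascii_failed, lines)
```

This only helps when the real encoding is known; if the file is not actually UTF-8, the same exception will come back with a different codec name.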

User

Reply #2 · edited 2024-04-25 12:09:55

I think the best answer (in Python 3) is to use the errors= parameter:

with open('evil_unicode.txt', 'r', errors='replace') as f:
    lines = f.readlines()

Demonstration:

>>> s = b'\xe5abc\nline2\nline3'
>>> with open('evil_unicode.txt','wb') as f:
...     f.write(s)
...
16
>>> with open('evil_unicode.txt', 'r') as f:
...     lines = f.readlines()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: invalid continuation byte
>>> with open('evil_unicode.txt', 'r', errors='replace') as f:
...     lines = f.readlines()
...
>>> lines
['�abc\n', 'line2\n', 'line3']
>>>

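Note that errors= can be replace or ignore. With replace, each undecodable byte sequence becomes the U+FFFD replacement character; with ignore, it is dropped silently. Here is what ignore looks like: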

>>> with open('evil_unicode.txt', 'r', errors='ignore') as f:
...     lines = f.readlines()
...
>>> lines
['abc\n', 'line2\n', 'line3']
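
The same error handlers are accepted by `bytes.decode`, so the difference can be checked without touching a file. A small sketch, reusing the bytes from the session above; the variable names are only illustrative:

```python
raw = b"\xe5abc\nline2\nline3"  # same bytes as in the session above

# errors="strict" is the default and raises on the stray 0xe5 byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "replace" substitutes U+FFFD; "ignore" drops the bad byte entirely.
replaced = raw.decode("utf-8", errors="replace")
ignored = raw.decode("utf-8", errors="ignore")

print(strict_failed, replaced.splitlines(), ignored.splitlines())
```

Both handlers discard the original byte, so they fit best when the damaged characters do not matter for what you do with the text afterwards.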

User

Reply #3 · edited 2024-04-25 12:09:55

I finally found an answer that works:

filename=open(argfile, 'rb')

This post helped me a lot.
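
Opening in 'rb' mode avoids the exception because no decoding takes place at all: readlines() then returns bytes objects rather than str. A rough sketch of what that looks like, assuming a temporary file with the same bytes as in reply #2; the explicit decode step at the end is an addition for illustration, not part of the answer above:

```python
import os
import tempfile

# Hypothetical file containing one byte that is not valid UTF-8.
tmpdir = tempfile.mkdtemp()
argfile = os.path.join(tmpdir, "evil_unicode.txt")
with open(argfile, "wb") as f:
    f.write(b"\xe5abc\nline2\nline3")

# Binary mode: no decoding, so no UnicodeDecodeError, but lines are bytes.
with open(argfile, "rb") as f:
    raw_lines = f.readlines()

# If text is needed later, decode explicitly and choose an error handler.
text_lines = [line.decode("utf-8", errors="replace") for line in raw_lines]

print(raw_lines)
print(text_lines)
```

The trade-off is that every later string operation has to deal with bytes, or decode them at some point anyway.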
