Handling Python codec errors?

Asked 2025-04-16 09:09

File "/usr/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 805: invalid start byte

Hi, I'm hitting the exception above. How can I catch it and keep reading my file when it occurs?

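My program has a loop that reads a text file line by line and tries to do some processing on each line. Some files may not be text files, though, and some lines may be malformed (foreign-language text, for example). I want to ignore those lines.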

The following code did not work:

import re
import sys

for line in sys.stdin:
   if line != "":
      try:
         matched = re.match(searchstuff, line, re.IGNORECASE)
         print(matched)
      except (UnicodeDecodeError, UnicodeEncodeError):
         continue

Tags: exception-handling, text-processing, file-reading, loops, codec-errors

1 Answer

See http://docs.python.org/py3k/library/codecs.html. When you open an encoded stream, you may want to pass the extra argument errors='ignore'.
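As a minimal sketch of that argument, assuming the input comes from a file on disk rather than from stdin (the function name read_lines_lenient is made up for illustration), the built-in open() accepts the same errors= setting:

```python
def read_lines_lenient(path):
    """Yield lines from path, silently dropping bytes that are not valid UTF-8."""
    # errors="ignore" discards undecodable bytes instead of raising
    # UnicodeDecodeError; the rest of the line is kept.
    with open(path, encoding="utf-8", errors="ignore") as stream:
        for line in stream:
            yield line
```

Note that this keeps the damaged line with the bad bytes removed; it does not skip the line.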

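In Python 3, sys.stdin is opened as a text stream by default (see http://docs.python.org/py3k/library/sys.html), and its error handling is strict. The decoding therefore fails while the for loop is fetching the next line, before the try block inside the loop runs, so the except clause never gets a chance to catch it.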
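To make the difference between strict and tolerant decoding concrete, here is a small sketch that decodes the same bytes with three standard error handlers. The sample bytes and the helper name decode_outcomes are illustrative assumptions, not taken from the original program; byte 0x92 is the one reported in the traceback.

```python
raw = b"it\x92s"  # 0x92 is a curly apostrophe in Windows-1252, not valid UTF-8 here

def decode_outcomes(data):
    """Return what each error handler does with the same bytes."""
    outcomes = {}
    try:
        outcomes["strict"] = data.decode("utf-8")  # default handler, raises on bad bytes
    except UnicodeDecodeError as exc:
        outcomes["strict"] = "error: " + exc.reason
    outcomes["ignore"] = data.decode("utf-8", errors="ignore")    # drops the bad byte
    outcomes["replace"] = data.decode("utf-8", errors="replace")  # inserts U+FFFD
    return outcomes
```

Calling decode_outcomes(raw) shows that only the strict handler fails; "ignore" and "replace" both return a string.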

You need to reopen it as a UTF-8 stream that tolerates errors. Something like this should work:

sys.stdin = codecs.getreader('utf8')(sys.stdin.detach(), errors='ignore')
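The line above needs import codecs and import sys, and it drops only the undecodable bytes while keeping the rest of each line. If the goal is instead to skip any line that fails to decode, as the question describes, one possible alternative is to read raw bytes and decode each line yourself. This is a sketch, not part of the original answer, and the helper name decodable_lines is hypothetical:

```python
def decodable_lines(binary_stream, encoding="utf-8"):
    """Yield only the lines of binary_stream that decode cleanly."""
    for raw_line in binary_stream:
        try:
            yield raw_line.decode(encoding)
        except UnicodeDecodeError:
            # The decode now happens inside the try, so the error can be
            # caught here; skip this line and move on to the next one.
            continue
```

For standard input, pass the underlying binary stream: for line in decodable_lines(sys.stdin.buffer).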

Answered 2025-04-16 by Python大师
