'utf-8' codec can't decode byte 0x80

Posted 2024-04-29 19:39:35

I'm trying to download the BVLC-trained model, but I'm running into this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte

I think it's caused by the function below (complete code):

  # Closure-d function for checking SHA1.
  def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
      with open(filename, 'r') as f:
          return hashlib.sha1(f.read()).hexdigest() == sha1

Any idea how to fix this?


3 answers

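You didn't open the file in binary mode, so f.read() tries to decode it as text using the default encoding (UTF-8 here), and that fails. But since we're hashing bytes, not a string, it doesn't matter what the encoding is, or even whether the file is text at all: just open it in binary mode and read the raw bytes.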

>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
  File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
  File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte

but

>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325

The file you're opening is not UTF-8 encoded, while your system's default encoding is set to UTF-8.

Since you're computing a SHA1 hash, you should read the data as binary instead. The hashlib functions require bytes to be passed in:

with open(filename, 'rb') as f:
    return hashlib.sha1(f.read()).hexdigest() == sha1

Note the b added to the file mode.
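Applied to the function from the question, the fix might look like the minimal sketch below. It assumes the file name and expected hash are passed in as plain arguments; in the original script they come from closure defaults (`model_filename`, `frontmatter['sha1']`), which are left out here.

```python
import hashlib


def model_checks_out(filename, sha1):
    """Return True if the SHA1 of the file's raw bytes equals `sha1`."""
    # 'rb' makes read() return bytes, so no text decoding is attempted.
    with open(filename, 'rb') as f:
        return hashlib.sha1(f.read()).hexdigest() == sha1
```

The only functional change from the question's code is the mode string; the comparison against the expected hex digest stays the same.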

See the open() documentation:

mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. [...] In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)
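To see the text-mode behavior described in the quote above, here is a small self-contained sketch. The temporary file and its contents are made up for illustration, and the encoding is passed explicitly as 'utf-8' so the result does not depend on the platform's locale.

```python
import locale
import os
import tempfile

# Write a few bytes that are not valid UTF-8 (0x80 cannot start a sequence).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x80abc')
    path = tmp.name

# This is the encoding text mode falls back to when none is given.
print(locale.getpreferredencoding(False))

# Binary mode returns the raw bytes untouched.
with open(path, 'rb') as f:
    raw = f.read()

# Text mode has to decode the bytes, which fails for this file.
try:
    with open(path, 'r', encoding='utf-8') as f:
        f.read()
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

os.remove(path)
```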

and the hashlib module documentation:

You can now feed this object with bytes-like objects (normally bytes) using the update() method.
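For large model files, the update() method mentioned in the quote above also allows hashing in chunks rather than reading the whole file into memory. The following is only a sketch; the helper name `sha1_of_file` and the 1 MiB default chunk size are arbitrary choices for illustration, not something from the original script.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks instead of reading it all at once."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        # iter() with a sentinel stops once read() returns b'' at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Feeding the data in pieces gives the same digest as hashing it in one call, so the result can be compared against the expected SHA1 exactly as before.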

I don't know why, since there's no hint in the documentation or the source code, but using the b character (binary, I guess) works fine (tf version: 1.1.0):

image_data = tf.gfile.FastGFile(filename, 'rb').read()

For more information, check out: gfile
