Decompressing a downloaded .gz file fails with UnicodeDecodeError

Posted 2024-03-29 11:01:52


I'm trying to download a .gz file, decompress it in memory, and then read the decompressed content line by line.

import gzip
import requests

r = requests.get(url)
print(r.headers)

with gzip.open(r.content, 'rb') as f:
    '''Reading line by line'''

The response headers look like this:

{'Date': 'Fri, 23 Aug 2019 07:19:28 GMT', 'Server': 'Apache', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'sameorigin', 'Referrer-Policy': 'no-referrer', 'X-Xss-Protection': '1', 'Last-Modified': 'Sat, 23 Jun 2018 09:21:46 GMT', 'ETag': '"8be6ca-56f4bad760d07"', 'Accept-Ranges': 'bytes', 'Content-Length': '9168586', 'X-Clacks-Overhead': 'GNU Terry Pratchett', 'Cache-Control': 'public, max-age=120', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'application/x-gzip'}

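The error I get looks like an encoding error, but I thought requests already holds r.content as UTF-8 and that gzip.open() expects UTF-8, so I don't understand why it is raised: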

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

There are similar questions on SO, but they didn't help me.


1 answer

Answer 1 · posted 2024-03-29 11:01:52

According to [Python-Requests.2]: Developer Interface - class requests.Response.content (emphasis is mine):

Content of the response, in bytes.

On the other hand, [Python 3.Docs]: gzip.open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None) says:

The filename argument can be an actual filename (a str or bytes object), or an existing file object to read from or write to.
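
The byte named in the traceback fits this reading: every gzip stream begins with the two magic bytes 0x1f 0x8b, so a bytes object holding gzip data has 0x8b at position 1, and that byte is not a valid UTF-8 start byte. Below is a minimal sketch of this, assuming gzip.compress output as a stand-in for the downloaded r.content (the sample payload is made up for illustration, and no network request is made):

```python
import gzip

# Stand-in for r.content: gzip-compressed bytes built locally (hypothetical payload).
payload = gzip.compress(b"first line\nsecond line\n")

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
magic = payload[:2]
print(magic)

# Treating the compressed bytes as UTF-8 text fails on the second magic byte.
try:
    payload.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)
```

The compressed bytes are data, not text and not a path, which is why they have to reach gzip either as a file object or through gzip.decompress.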

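Because r.content is a bytes object, gzip.open treats it as a file name rather than as compressed data. To fix the problem, do not pass r.content to gzip.open directly; instead, do one of the following: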

  • Wrap it in a file-like object such as io.BytesIO and pass that to gzip.open

    with gzip.open(io.BytesIO(r.content)) as f:
        # Your original code (that reads line by line)
    
  • Pass it to gzip.decompress

    extracted = gzip.decompress(r.content)
    for line in extracted.split(b"\n"):
        # Process each line
        print(line.decode())
    

    or (combined with the previous bullet)

    with io.BytesIO(gzip.decompress(r.content)) as f:
        # Your original code (that reads line by line)
    
  • Save it to a file and pass the file's name to gzip.open (note that this is much slower and may introduce other potential problems, as pointed out by @Aran Fey):

    file_name = "content.gzip"
    with open(file_name, "wb") as f:
        f.write(r.content)
    with gzip.open(file_name, 'rb') as f:
        # Your original code (that reads line by line)
    os.unlink(file_name)
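
For reference, the first option can be sketched end to end as follows. This is only one way to arrange it: iter_gzip_lines is a hypothetical helper name, the content is assumed to be UTF-8 text, and a locally compressed sample stands in for r.content so the snippet runs without a network request.

```python
import gzip
import io


def iter_gzip_lines(data, encoding="utf-8"):
    """Yield decoded lines from gzip-compressed bytes, entirely in memory."""
    # io.BytesIO hands gzip.open a file object instead of a file name;
    # mode "rt" makes gzip decode the decompressed bytes to str.
    with gzip.open(io.BytesIO(data), mode="rt", encoding=encoding) as f:
        for line in f:
            yield line.rstrip("\n")


# Stand-in for r.content (hypothetical sample data).
sample = gzip.compress("alpha\nbeta\ngamma\n".encode("utf-8"))
lines = list(iter_gzip_lines(sample))
print(lines)
```

With a real response, the call would be iter_gzip_lines(r.content). Opening in text mode avoids a separate line.decode() call per line; if the file is not UTF-8, a different encoding would have to be passed.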
    
