如何在内存中下载并解压gzipped文件?

41 投票
4 回答
55740 浏览
提问于 2025-04-17 18:46

我想用urllib下载一个文件,并在保存之前先在内存中解压这个文件。

这是我现在的代码:

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
outfile = open(outFilePath, 'w')
outfile.write(decompressedFile.read())

这样做最后写出来的文件都是空的。我该怎么才能实现我想要的效果呢?

更新的回答:

#! /usr/bin/env python2
import urllib2
import StringIO
import gzip

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"        
# check filename: it may change over time, due to new updates
filename = "man-pages-5.00.tar.gz" 
outFilePath = filename[:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile)

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

4 个回答

20

如果你使用的是Python 3.2或更高版本,生活会简单很多:

#!/usr/bin/env python3
import gzip
import urllib.request

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-4.03.tar.gz"
outFilePath = filename[:-3]

response = urllib.request.urlopen(baseURL + filename)
with open(outFilePath, 'wb') as outfile:
    outfile.write(gzip.decompress(response.read()))

对于那些对历史感兴趣的人,可以查看 这个链接这个链接

25

对于使用Python 3的人来说,下面是相应的答案:

import urllib.request
import io
import gzip

response = urllib.request.urlopen(FILE_URL)
compressed_file = io.BytesIO(response.read())
decompressed_file = gzip.GzipFile(fileobj=compressed_file)

with open(OUTFILE_PATH, 'wb') as outfile:
    outfile.write(decompressed_file.read())
52

在你写完数据到compressedFile之后,但在把它传给gzip.GzipFile()之前,你需要把文件指针移到文件的开头。否则,gzip模块会从文件的末尾开始读取,这样它就会把文件当成空文件来处理。下面是示例代码:

#! /usr/bin/env python
import urllib2
import StringIO
import gzip

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-3.34.tar.gz"
outFilePath = "man-pages-3.34.tar"

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
#
# Set the file's current position to the beginning
# of the file so that gzip.GzipFile can read
# its contents from the top.
#
compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

with open(outFilePath, 'w') as outfile:
    outfile.write(decompressedFile.read())

撰写回答