How do I decompress gzipped data in a byte array?

Question

I have a byte array containing data that was compressed with gzip. Now I need to decompress it. How can I do that?

Answer 1

It looks like you can do it like this:

import zlib
# ...
# 16 + zlib.MAX_WBITS: expect a gzip header and trailer (not a zlib one)
ungzipped = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(gzipped_bytes)

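Or, in one call:

# with the default wbits, zlib.decompress(data) only accepts zlib-wrapped data
zlib.decompress(data, 16 + zlib.MAX_WBITS)  # one-shot decompression of gzip data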

For more information, see the Python documentation.
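Since decompressobj() works incrementally, the same approach also handles data that arrives in pieces (for example from a socket). The following is a minimal Python 3 sketch; the helper name gunzip_chunks and the 4-byte chunking are illustrative assumptions, not part of the original answer:

```python
import gzip
import zlib


def gunzip_chunks(chunks):
    """Decompress gzip data delivered as an iterable of byte chunks."""
    # 16 + MAX_WBITS: expect a gzip header and trailer, 32 KiB window
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    parts = [d.decompress(chunk) for chunk in chunks]
    parts.append(d.flush())  # emit anything still buffered
    return b"".join(parts)


data = gzip.compress(b"hello world")
# Split the compressed stream into 4-byte pieces to simulate streaming input
pieces = [data[i:i + 4] for i in range(0, len(data), 4)]
print(gunzip_chunks(pieces))  # b'hello world'
```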

Answer 2

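zlib.decompress(data, 15 + 32) automatically detects whether the data you pass in is in gzip or zlib format (15 is the window size, i.e. zlib.MAX_WBITS; adding 32 enables automatic header detection).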

zlib.decompress(data, 15 + 16), on the other hand, works on gzip data but raises an error on zlib data.
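To see the difference between the two wbits values, here is a small Python 3 sketch that builds one gzip payload and one zlib payload from the same bytes (the variable names are illustrative):

```python
import gzip
import zlib

payload = b"hello world"
gz_data = gzip.compress(payload)  # gzip container, starts with b"\x1f\x8b"
zl_data = zlib.compress(payload)  # zlib container, starts with b"x"

# 15 + 32: accept either a gzip or a zlib header (auto-detect)
print(zlib.decompress(gz_data, 15 + 32))  # b'hello world'
print(zlib.decompress(zl_data, 15 + 32))  # b'hello world'

# 15 + 16: accept a gzip header only
print(zlib.decompress(gz_data, 15 + 16))  # b'hello world'
rejected = False
try:
    zlib.decompress(zl_data, 15 + 16)
except zlib.error as exc:
    rejected = True
    print("zlib data rejected:", exc)
```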

Here is an example with Python 2.7.1 that creates a small .gz file, then reads it back and decompresses it:

>>> import gzip, zlib
>>> f = gzip.open('foo.gz', 'wb')
>>> f.write(b"hello world")
11
>>> f.close()
>>> c = open('foo.gz', 'rb').read()
>>> c
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> ba = bytearray(c)
>>> ba
bytearray(b'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00')
>>> zlib.decompress(ba, 15+32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be string or read-only buffer, not bytearray
>>> zlib.decompress(bytes(ba), 15+32)
'hello world'
>>>

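Usage in Python 3.x is very similar, except that the results are bytes objects and zlib.decompress() accepts a bytearray directly.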
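For comparison, a rough Python 3 version of the 2.7.1 session might look like this; it writes foo.gz into a temporary directory instead of the current directory, which is an assumption made to keep the sketch self-contained:

```python
import gzip
import os
import tempfile
import zlib

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.gz")
    # Create a small .gz file, as in the Python 2.7 session above
    with gzip.open(path, "wb") as f:
        f.write(b"hello world")
    with open(path, "rb") as f:
        ba = bytearray(f.read())

# Python 3's zlib accepts any bytes-like object, so no bytes() copy is needed
text = zlib.decompress(ba, 15 + 32)
print(text)  # b'hello world'

# The gzip module also offers a one-shot helper
print(gzip.decompress(bytes(ba)))  # b'hello world'
```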

Update: according to the comments, you are using Python 2.2.1.

Sigh, that isn't even the last release of Python 2.2. Anyway, continuing with the foo.gz file created above:

Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> strobj = open('foo.gz', 'rb').read()
>>> strobj
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> import zlib
>>> zlib.decompress(strobj, 15+32)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data
>>> zlib.decompress(strobj, 15+16)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data

# OK, we can't use the back door method. Plan B: use the 
# documented approach i.e. gzip.GzipFile with a file-like object.

>>> import gzip, cStringIO
>>> fileobj = cStringIO.StringIO(strobj)
>>> gzf = gzip.GzipFile('dummy-name', 'rb', 9, fileobj)
>>> gzf.read()
'hello world'

# Success. Now let's assume you have an array.array object-- which requires
# premeditation; they aren't created accidentally!
# The following code assumes subtype 'B' but should work for any subtype.

>>> import array, sys
>>> aaB = array.array('B')
>>> aaB.fromfile(open('foo.gz', 'rb'), sys.maxint)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file
#### Don't panic, just read the fine manual
>>> aaB
array('B', [31, 139, 8, 8, 20, 244, 220, 77, 2, 255, 102, 111, 111, 0, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0, 133, 17, 74, 13, 11, 0, 0, 0])
>>> strobj2 = aaB.tostring()
>>> strobj2 == strobj
1 #### means True 
# You can make a str object and use that as above.

# ... or you can plug it directly into StringIO:
>>> gzip.GzipFile('dummy-name', 'rb', 9, cStringIO.StringIO(aaB)).read()
'hello world'
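On Python 3, cStringIO no longer exists; io.BytesIO plays the same role. A minimal sketch of the GzipFile approach under that assumption, using gzip.compress() only to produce sample input:

```python
import array
import gzip
import io

compressed = gzip.compress(b"hello world")

# Wrap the bytes in an in-memory file object and let GzipFile parse it
with gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb") as gzf:
    first = gzf.read()
print(first)  # b'hello world'

# An array.array('B') holding the same bytes (built here for illustration)
aaB = array.array("B", compressed)
# array.tostring() was removed in Python 3.9; tobytes() is the replacement
with gzip.GzipFile(fileobj=io.BytesIO(aaB.tobytes()), mode="rb") as gzf:
    result = gzf.read()
print(result)  # b'hello world'
```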
