How to simulate ZipFile.open in Python 2.5?

Question

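I want to extract a single file from a zip archive to a specific path, ignoring the path the file has inside the archive. In Python 2.6 this is very easy (my docstring is longer than the code):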

import shutil
import zipfile

def extract_from_zip(name, dest_path, zip_file):
    """Similar to zipfile.ZipFile.extract but extracts the file given by name
    from the zip_file (instance of zipfile.ZipFile) to the given dest_path
    *ignoring* the filename path given in the archive completely
    instead of preserving it as extract does.
    """
    dest_file = open(dest_path, 'wb')
    try:
        archived_file = zip_file.open(name)
        shutil.copyfileobj(archived_file, dest_file)
    finally:
        dest_file.close()  # flush the output even if the copy fails


extract_from_zip('path/to/file.dat', 'output.txt', zipfile.ZipFile('test.zip', 'r'))

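But Python 2.5 has no ZipFile.open method. I couldn't find a solution on StackOverflow, but this forum post has a nice approach that uses ZipInfo.file_offset to seek to the right position in the archive and zlib.decompressobj to decompress the bytes from there. Unfortunately, ZipInfo.file_offset was removed in Python 2.5!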
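
For context, members compressed with the deflate method are stored in a zip archive as a raw deflate stream, without the zlib header and checksum, which is why that approach needs zlib.decompressobj with a negative window size. A minimal sketch of the idea follows, using an in-memory stream in place of a real archive. The helper name inflate_in_chunks and the chunk size are illustrative assumptions, and the sketch is written for a current Python rather than 2.5 (it uses bytes literals and the io module):

```python
import io
import zlib

def inflate_in_chunks(src, dst, compressed_size, chunk_size=64 * 1024):
    """Copy a raw deflate stream from src to dst without loading it all at once.

    Hypothetical helper for illustration; not part of zipfile or zlib.
    """
    # Negative wbits tells zlib to expect raw deflate data (no zlib header),
    # which is how ZIP_DEFLATED members are stored.
    decoder = zlib.decompressobj(-zlib.MAX_WBITS)
    remaining = compressed_size
    while remaining > 0:
        chunk = src.read(min(remaining, chunk_size))
        if not chunk:
            break  # truncated input; stop rather than loop forever
        remaining -= len(chunk)
        dst.write(decoder.decompress(chunk))
    # Emit whatever the decoder is still holding back.
    dst.write(decoder.flush())

# Build a raw deflate stream to stand in for the bytes inside an archive.
payload = b"streamed data " * 10000
encoder = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = encoder.compress(payload) + encoder.flush()

out = io.BytesIO()
inflate_in_chunks(io.BytesIO(raw), out, len(raw), chunk_size=1024)
```

Only one compressed chunk is held in memory at a time, although a single chunk of highly compressible data can still expand to a much larger block of output.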

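So, given that all we have in Python 2.5 is ZipInfo.header_offset, I figured I would have to parse and skip over the header structure myself to get to the file offset. Using Wikipedia as a reference (I know), I came up with this much longer and not very elegant solution: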

import struct
import zipfile
import zlib

def extract_from_zip(name, dest_path, zip_file):
    """Python 2.5 version :("""
    dest_file = open(dest_path, 'wb')
    info = zip_file.getinfo(name)
    if info.compress_type == zipfile.ZIP_STORED:
        decoder = None
    elif info.compress_type == zipfile.ZIP_DEFLATED:
        decoder = zlib.decompressobj(-zlib.MAX_WBITS)
    else:
        # Python 2.5 spells this exception BadZipfile (lowercase "f").
        raise zipfile.BadZipfile("Unrecognized compression method")

    # Seek over the fixed size fields to the "file name length" field in
    # the file header (26 bytes). Unpack this and the "extra field length"
    # field ourselves as info.extra doesn't seem to be the correct length.
    zip_file.fp.seek(info.header_offset + 26)
    file_name_len, extra_len = struct.unpack("<HH", zip_file.fp.read(4))
    zip_file.fp.seek(info.header_offset + 30 + file_name_len + extra_len)

    bytes_to_read = info.compress_size

    while True:
        buff = zip_file.fp.read(min(bytes_to_read, 102400))
        if not buff:
            break
        bytes_to_read -= len(buff)
        if decoder:
            buff = decoder.decompress(buff)
        dest_file.write(buff)

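    if decoder:
        # Feed a dummy pad byte so zlib does not choke on the end of the
        # raw stream, then flush the remaining output.
        dest_file.write(decoder.decompress('Z'))
        dest_file.write(decoder.flush())
    dest_file.close()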

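Note how I unpack and read the field that gives the length of the extra field myself, because calling len on the ZipInfo.extra attribute gives 4 bytes less, which makes the offset calculation come out wrong. Perhaps I'm missing something here?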
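
One possible explanation, offered as a hedge rather than a diagnosis: zipfile fills in ZipInfo.extra from the archive's central directory, while the data offset depends on the extra field in the local file header, and the two records may carry extra fields of different lengths. Reading both lengths from the local header, as the code above does, sidesteps the question. A small sketch of that calculation follows, checked against an archive built in memory. The helper name local_data_offset is hypothetical, and the sketch targets a current Python rather than 2.5:

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"
LOCAL_HEADER_SIZE = 30  # fixed-size part of a local file header

def local_data_offset(fp, header_offset):
    """Return the offset of a member's data, read from its local file header.

    Hypothetical helper for illustration; not part of the zipfile module.
    """
    fp.seek(header_offset)
    header = fp.read(LOCAL_HEADER_SIZE)
    if header[:4] != LOCAL_HEADER_SIGNATURE:
        raise ValueError("no local file header at offset %d" % header_offset)
    # Bytes 26-29 hold the file name length and the extra field length.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    return header_offset + LOCAL_HEADER_SIZE + name_len + extra_len

# Build a small deflated archive in memory to check the calculation.
payload = b"hello zip " * 1000
buf = io.BytesIO()
archive = zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED)
archive.writestr("path/to/file.dat", payload)
archive.close()

# Reopen it, locate the member's data, and inflate it by hand.
archive = zipfile.ZipFile(buf, "r")
info = archive.getinfo("path/to/file.dat")
offset = local_data_offset(buf, info.header_offset)
buf.seek(offset)
raw = buf.read(info.compress_size)
restored = zlib.decompressobj(-zlib.MAX_WBITS).decompress(raw)
```

If the bytes at the computed offset inflate back to the original payload, the offset was right regardless of what len(info.extra) reports.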

Can anyone improve on this solution for Python 2.5?

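Edit: I should have said that the obvious solution, as suggested by ChrisAdams,

dest_file.write(zip_file.read(name))

will fail with a MemoryError for any reasonably sized file in the archive, because it tries to load the whole file into memory in one go. I have large files, so I need to stream the contents out to disk.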

Upgrading Python is of course the obvious solution, but that is entirely out of my hands and essentially impossible.

version compatibility, zip files, zlib, zipfile, file extraction, streaming writes, header structure, offset calculation
