Streaming and 7-Zip compatibility with pylzma

Asked 2025-04-18 06:43

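I've recently been using pylzma, but I need to create files that are compatible with the 7-Zip Windows application. The catch is that some of my files are very large (roughly 3 to 4 GB, in a proprietary binary format produced by third-party software).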

I've gone through the posts here and the usage documentation several times: https://github.com/fancycode/pylzma/blob/master/doc/USAGE.md

With the following code I can create compatible files:

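import os
import struct

import pylzma

def Compacts(folder, f):
  os.chdir(folder)
  fsize = os.stat(f).st_size
  i = open(f, 'rb')
  o = open(f + '.7z', 'wb')
  i.seek(0)

  s = pylzma.compressfile(i)
  # First 5 bytes of the stream are the header; the uncompressed size
  # follows as a 64-bit little-endian integer.
  result = s.read(5)
  result += struct.pack('<Q', fsize)
  # s.read() pulls the whole compressed stream into memory at once.
  s = result + s.read()
  o.write(s)
  o.flush()
  o.close()
  i.close()
  os.remove(f)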

For smaller files (up to 2 GB) this compresses fine and the result is compatible with 7-Zip, but for larger files Python crashes after a while.

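According to the user guide, large files should be compressed with streaming, but then the resulting file is not compatible with 7-Zip, as in the snippet below.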

import os

import pylzma

def Compacts(folder, f):
  os.chdir(folder)
  i = open(f, 'rb')
  o = open(f + '.7z', 'wb')
  i.seek(0)

  s = pylzma.compressfile(i)
  # Copies the compressed stream one byte at a time; note that no
  # 8-byte size field is inserted after the first 5 bytes here.
  while True:
    tmp = s.read(1)
    if not tmp: break
    o.write(tmp)
  o.flush()
  o.close()
  i.close()
  os.remove(f)

Is there any way to use pylzma's streaming while still keeping the output compatible with 7-Zip?

1 Answer

You still need to write the header (.read(5)) and the size correctly, for example:

import os
import struct

import pylzma

def sevenzip(infile, outfile):
    size = os.stat(infile).st_size
    with open(infile, "rb") as ip, open(outfile, "wb") as op:
        s = pylzma.compressfile(ip)
        op.write(s.read(5))
        op.write(struct.pack('<Q', size))
        while True:
            # Read 128K chunks.
            # Not sure if this has to be 1 instead to trigger streaming in pylzma...
            tmp = s.read(1<<17)
            if not tmp:
                break
            op.write(tmp)

if __name__ == "__main__":
    import sys
    try:
        _, infile, outfile = sys.argv
    except ValueError:
        # Wrong number of arguments: compress this script itself as a demo.
        infile, outfile = __file__, __file__ + ".7z"

    sevenzip(infile, outfile)
    print("compressed {} to {}".format(infile, outfile))
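
To sanity-check the result before opening it in 7-Zip, the header can be read back and compared with the original file size. The following is only a minimal sketch: `read_lzma_header` is a made-up helper name, not part of pylzma, and it assumes the output uses the 13-byte layout written above (1 properties byte, a 4-byte dictionary size, then the 8-byte uncompressed size, all little-endian).

```python
import struct

def read_lzma_header(path):
    """Return (lc, lp, pb, dict_size, uncompressed_size) from a 13-byte LZMA header.

    Hypothetical helper for illustration; assumes the layout described above.
    """
    with open(path, "rb") as fh:
        header = fh.read(13)
    if len(header) < 13:
        raise ValueError("file too short to hold a 13-byte LZMA header")
    # 1 byte of properties, 4 bytes of dictionary size, 8 bytes of
    # uncompressed size; '<' means little-endian with no padding.
    props, dict_size, size = struct.unpack("<BIQ", header)
    # The properties byte encodes (pb * 5 + lp) * 9 + lc.
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return lc, lp, pb, dict_size, size
```

If the last value returned for the output file differs from `os.stat(infile).st_size`, the size field was probably not written right after the first 5 bytes.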

Answered 2025-04-18 by Python大师
