How to read the file names inside a gz file

Asked 2025-04-17 20:12

I am trying to read a gz file:

import gzip
import os

with open(os.path.join(storage_path, file), "rb") as gzipfile:
    with gzip.GzipFile(fileobj=gzipfile) as datafile:
        data = datafile.read()

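This works, but I need the name and size of each file inside the gz file. Right now the code only reads out the contents of the file contained in the archive.

How can I read the file names contained in the gz file?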


5 Answers

New code:

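import os
import struct

fl = search_files(storage_path)  # search_files is not shown in the post
for f in fl:
    with open(os.path.join(storage_path, f), "rb") as gzipfile:
        # The last 4 bytes hold the uncompressed size modulo 2^32
        gzipfile.seek(-4, 2)
        r = gzipfile.read()
        size = struct.unpack('<I', r)[0]  # size of the pcap file
        # f[:-3] strips the ".gz" suffix from the archive name
        print("{}/{} : {} bytes".format(storage_path, f[:-3], size))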

Answered 2025-04-17 by Python大师


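GzipFile itself does not expose this information, but:

The file name is usually the name of the archive with the .gz suffix removed.
If the uncompressed file is smaller than 4 GiB, the last four bytes of the archive contain the uncompressed size: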

In [14]: f = open('fuse-ext2-0.0.7.tar.gz', 'rb')

In [15]: f.seek(-4, 2)

In [16]: import struct

In [17]: r = f.read()

In [18]: struct.unpack('<I', r)[0]
Out[18]: 7106560

In [19]: len(gzip.open('fuse-ext2-0.0.7.tar.gz').read())
Out[19]: 7106560

(Technically, the last four bytes are the size of the original (uncompressed) input data modulo 2³², stored in the ISIZE field of the member trailer; see http://www.gzip.org/zlib/rfc-gzip.html for details.)
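
To make the modulo caveat concrete, here is a small sketch. It assumes Python 3 and builds a tiny archive in memory with gzip.compress; the helper name stored_isize is made up for this illustration.

```python
import gzip
import struct


def stored_isize(gz_bytes):
    """Return the ISIZE trailer field: uncompressed size modulo 2**32."""
    return struct.unpack('<I', gz_bytes[-4:])[0]


# Small payload: the trailer matches the real length.
payload = b'x' * 1000
archive = gzip.compress(payload)
small_size = stored_isize(archive)

# For a hypothetical 5 GiB input the field would wrap around,
# because only the low 32 bits of the size are stored.
five_gib = 5 * 2**30
wrapped_size = five_gib % 2**32

print(small_size)    # 1000
print(wrapped_size)  # 1073741824, i.e. 1 GiB
```

So for inputs of 4 GiB or more the trailer alone cannot tell you the true size; it only gives the remainder.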

Answered 2025-04-17 by Python大师


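Python's gzip module does not give you direct access to this information.

Its source code skips over the stored file name while parsing the header and never keeps it: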

if flag & FNAME:
    # Read and discard a null-terminated string containing the filename
    while True:
        s = self.fileobj.read(1)
        if not s or s=='\000':
            break

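The file name field is optional and is not guaranteed to be present (in that case command-line gzip decompression names the output after the archive, minus the .gz suffix, I think). The uncompressed file size is not stored in the header; you can find it in the last four bytes instead.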
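
As a quick way to see that the name really is optional, the sketch below checks the FNAME bit in the header's flag byte. It assumes Python 3; has_stored_name and make_archive are hypothetical helpers written for this example, and the archives are built in memory.

```python
import gzip
import io

FNAME = 8  # bit 3 of the FLG byte, per RFC 1952


def has_stored_name(gz_bytes):
    """True if the gzip header's FLG byte has the FNAME bit set."""
    # Header layout: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
    return bool(gz_bytes[3] & FNAME)


def make_archive(filename):
    """Compress a few bytes in memory, passing `filename` to GzipFile."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode='wb', fileobj=buf) as gz:
        gz.write(b'hello')
    return buf.getvalue()


with_name = make_archive('hello.txt')
without_name = make_archive('')

print(has_stored_name(with_name))     # True
print(has_stored_name(without_name))  # False
```

When the bit is set and no extra field is present, the null-terminated name starts right after the 10-byte fixed header.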

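If you want to read the file name from the header yourself, you have to re-implement the header-reading code and keep the file name bytes. The following function returns that name, plus the decompressed size: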

import struct
from gzip import FEXTRA, FNAME

def read_gzip_info(gzipfile):
    gf = gzipfile.fileobj
    pos = gf.tell()

    # Read the uncompressed size (ISIZE, modulo 2**32) from the last 4 bytes
    gf.seek(-4, 2)
    size = struct.unpack('<I', gf.read())[0]

    gf.seek(0)
    magic = gf.read(2)
    if magic != b'\037\213':
        raise IOError('Not a gzipped file')

    method, flag, mtime = struct.unpack("<BBIxx", gf.read(8))

    if not flag & FNAME:
        # Not stored in the header, use the filename sans .gz
        gf.seek(pos)
        fname = gzipfile.name
        if fname.endswith('.gz'):
            fname = fname[:-3]
        return fname, size

    if flag & FEXTRA:
        # Read & discard the extra field, if present
        gf.read(struct.unpack("<H", gf.read(2))[0])

    # Read a null-terminated string containing the filename
    fname = []
    while True:
        s = gf.read(1)
        if not s or s == b'\000':
            break
        fname.append(s)

    gf.seek(pos)
    # RFC 1952 stores the name as ISO 8859-1 (Latin-1)
    return b''.join(fname).decode('latin-1'), size

Use the function above with an already-created gzip.GzipFile object:

filename, size = read_gzip_info(gzipfileobj)
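
The same idea can also be applied directly to a path, without a GzipFile object. The following is only a sketch under a few assumptions: Python 3, a single-member archive, and an uncompressed size below 4 GiB. The function name gzip_name_and_size and the demo file names are invented for this example.

```python
import gzip
import os
import struct
import tempfile

FEXTRA, FNAME = 4, 8  # FLG bits from RFC 1952


def gzip_name_and_size(path):
    """Return (stored or derived name, ISIZE) for a single-member .gz file."""
    with open(path, 'rb') as f:
        if f.read(2) != b'\x1f\x8b':
            raise OSError('Not a gzipped file')
        method, flag, mtime = struct.unpack('<BBIxx', f.read(8))
        if flag & FEXTRA:
            # Skip the optional extra field (2-byte length, then data)
            f.seek(struct.unpack('<H', f.read(2))[0], 1)
        name = b''
        if flag & FNAME:
            # Null-terminated Latin-1 file name
            while True:
                c = f.read(1)
                if not c or c == b'\x00':
                    break
                name += c
        # ISIZE: uncompressed size modulo 2**32, in the last 4 bytes
        f.seek(-4, 2)
        size = struct.unpack('<I', f.read(4))[0]
    if name:
        return name.decode('latin-1'), size
    # No stored name: fall back to the archive name without ".gz"
    base = os.path.basename(path)
    return (base[:-3] if base.endswith('.gz') else base), size


# Demo: the archive name and the stored name differ on purpose.
tmpdir = tempfile.mkdtemp()
archive_path = os.path.join(tmpdir, 'renamed.gz')
with open(archive_path, 'wb') as raw:
    with gzip.GzipFile(filename='capture.pcap', mode='wb', fileobj=raw) as gz:
        gz.write(b'\x00' * 2048)

info = gzip_name_and_size(archive_path)
print(info)  # ('capture.pcap', 2048)
```

Calling it in a loop over the .gz files in a directory would give the name and size pairs the question asks for.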

Answered 2025-04-17 by Python大师

