How to monkey-patch np.savez_compressed to add a compression level, without editing the numpy source files?

Asked 2025-04-13 18:29

I need to change the ZIP compression level (compresslevel) that np.savez_compressed uses internally. There is a feature proposal for this on NumPy's GitHub, but it has not been implemented yet.

I see two options:

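  • Edit the source file /numpy/lib/npyio.py and replace zipf = zipfile_factory(file, mode="w", compression=compression) with <idem>..., compresslevel=compresslevel). The problem is that I would have to redo this edit after every reinstall or upgrade (e.g. pip install numpy), so this is not a good solution.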

  • Monkey-patch the _savez function.

How can this be done?
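The second option relies on how Python resolves names. A public function looks up a private helper in its own module's globals at call time. Replacing the attribute on that module therefore changes what gets called, and nothing has to be edited on disk. Here is a minimal sketch with a hypothetical toy module. The names toy_npyio, _save and save_compressed are made up and only stand in for numpy's internals. It uses unittest.mock.patch.object, which also restores the original afterwards:

```python
import types
from unittest import mock

# Hypothetical stand-in for numpy's npyio module: a public function that
# delegates to a private helper found through the module's globals.
toy_npyio = types.ModuleType("toy_npyio")
exec(
    "def _save(data, level=None):\n"
    "    return f'saved {data} at level {level}'\n"
    "\n"
    "def save_compressed(data):\n"
    "    return _save(data)\n",
    toy_npyio.__dict__,
)

original_save = toy_npyio._save


def _save_with_level(data, level=None):
    # Replacement helper that always injects a compression level
    return original_save(data, level=2)


# Patch the attribute on the module that looks the name up at call time.
with mock.patch.object(toy_npyio, "_save", _save_with_level):
    patched = toy_npyio.save_compressed("x")

unpatched = toy_npyio.save_compressed("x")  # original restored on exit
print(patched)    # saved x at level 2
print(unpatched)  # saved x at level None
```

The same rule means the patch has to target the module where numpy's implementation actually looks the name up. Assigning to a re-export in some other module has no effect.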

I tried the second option, but I get ValueError: seek of closed file, and I don't understand why:

import numpy as np

def _savez(file, args, kwds, compress, allow_pickle=True, pickle_kwargs=None):
    import zipfile
    if not hasattr(file, 'write'):
        file = os_fspath(file)
        if not file.endswith('.npz'):
            file = file + '.npz'
    namedict = kwds
    for i, val in enumerate(args):
        key = 'arr_%d' % i
        if key in namedict.keys():
            raise ValueError("Cannot use un-named variables and keyword %s" % key)
        namedict[key] = val
    if compress:
        compression = zipfile.ZIP_DEFLATED
    else:
        compression = zipfile.ZIP_STORED
    zipf = np.lib.npyio.zipfile_factory(file, mode="w", compression=compression, compresslevel=2)  # !! the only modified line !!
    for key, val in namedict.items():
        fname = key + '.npy'
        val = np.asanyarray(val)
        # always force zip64, gh-10776
        with zipf.open(fname, 'w', force_zip64=True) as fid:
            format.write_array(fid, val, allow_pickle=allow_pickle, pickle_kwargs=pickle_kwargs)
    zipf.close()

np.lib.npyio._savez = _savez    

x = np.array([1, 2, 3, 4])
with open("test.npz", "wb") as f:
    np.savez_compressed(f, x=x)
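What follows is a likely explanation, reconstructed with the standard library only rather than taken from an actual traceback in the post. In this script the bare name format refers to Python's built-in format() function, so format.write_array raises AttributeError. That exception skips zipf.close(), and the with open(...) block then closes the file. When the abandoned ZipFile is finalized later, its close() tries to seek back and write the ZIP directory, which fails on the closed file. The sketch reproduces that chain with a temporary file (demo.zip is an arbitrary name):

```python
import os
import tempfile
import zipfile

# 1) In a user script, `format` is the built-in function, which has no
#    write_array attribute (numpy's module is numpy.lib.format).
has_write_array = hasattr(format, "write_array")
print(has_write_array)  # False

# 2) A "w"-mode ZipFile writes its central directory in close().
#    If the underlying file was closed first, that final seek fails.
path = os.path.join(tempfile.mkdtemp(), "demo.zip")
f = open(path, "wb")
zipf = zipfile.ZipFile(f, mode="w", compression=zipfile.ZIP_DEFLATED)
zipf.writestr("x.npy", b"payload")  # marks the archive as modified
f.close()  # what the `with open(...)` block does on the way out

error_message = None
try:
    # The patched _savez never reached its own zipf.close();
    # finalization calls close() later, after the file is gone.
    zipf.close()
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # seek of closed file
```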

1 Answer

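I found a simpler solution: monkey-patch np.lib.npyio.zipfile_factory, the helper that _savez calls to create the ZipFile, so that it always passes a compresslevel: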

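import os
import zipfile

import numpy as np

def zipfile_factory(file, *args, **kwargs):
    # Mirrors numpy's own helper, plus a fixed deflate level
    if not hasattr(file, 'read'):
        file = os.fspath(file)  # accept str or pathlib.Path
    kwargs['allowZip64'] = True
    kwargs['compresslevel'] = 4  # 0 (no compression) to 9 (smallest, slowest)
    return zipfile.ZipFile(file, *args, **kwargs)

np.lib.npyio.zipfile_factory = zipfile_factory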
with open("test.npz", "wb") as f:
    np.savez_compressed(f, x=np.ones(10_000_000))

Addendum: my previous solution:

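In the meantime I found the answer: format has to be replaced by np.lib.npyio.format. In my own script, the bare name format was Python's built-in format() function, not numpy's format module, so format.write_array could not work. This version works: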

import os
import numpy as np

def _savez(file, args, kwds, compress, allow_pickle=True, pickle_kwargs=None):
    import zipfile
    if not hasattr(file, 'write'):
        file = os.fspath(file)  # numpy's internal os_fspath is not defined in this script
        if not file.endswith('.npz'):
            file = file + '.npz'
    namedict = kwds
    for i, val in enumerate(args):
        key = 'arr_%d' % i
        if key in namedict.keys():
            raise ValueError("Cannot use un-named variables and keyword %s" % key)
        namedict[key] = val
    if compress:
        compression = zipfile.ZIP_DEFLATED
    else:
        compression = zipfile.ZIP_STORED
    zipf = np.lib.npyio.zipfile_factory(file, mode="w", compression=compression, compresslevel=1)
    for key, val in namedict.items():
        fname = key + '.npy'
        val = np.asanyarray(val)
        # always force zip64, gh-10776
        with zipf.open(fname, 'w', force_zip64=True) as fid:
            np.lib.npyio.format.write_array(fid, val, allow_pickle=allow_pickle, pickle_kwargs=pickle_kwargs)
    zipf.close()

np.lib.npyio._savez = _savez    

with open("test.npz", "wb") as f:
    np.savez_compressed(f, x=np.array([1, 2, 3]))
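If patching numpy internals feels too fragile across versions, another route, not taken in the answer above, is to build the .npz archive yourself. An .npz file is a ZIP archive of .npy members, and numpy.lib.format.write_array is a public function. Here is a minimal sketch. savez_compressed_level is a hypothetical helper that only handles keyword arrays and, unlike np.savez_compressed, does not append .npz to string paths:

```python
import io
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def savez_compressed_level(file, compresslevel=6, **arrays):
    """Write keyword arrays to an .npz archive with a chosen deflate level."""
    with zipfile.ZipFile(file, mode="w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=compresslevel, allowZip64=True) as zipf:
        for name, value in arrays.items():
            # One <name>.npy member per array; force_zip64 as numpy does (gh-10776)
            with zipf.open(name + ".npy", "w", force_zip64=True) as fid:
                npy_format.write_array(fid, np.asanyarray(value),
                                       allow_pickle=False)


buf = io.BytesIO()
savez_compressed_level(buf, compresslevel=9, x=np.arange(10), y=np.eye(3))
buf.seek(0)
with np.load(buf) as data:  # np.load reads it like any other .npz
    print(data["x"], data["y"].shape)
```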
