In Python, is it possible to overload NumPy's memmap so that it deletes itself when it is no longer referenced?

4 votes
2 answers
1566 views
Asked 2025-04-18 09:31

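I'm trying to use memmap for data that is too large to fit in memory; memmap lets the rest of the code treat the data as if it were an ordinary ndarray. To make better use of memmap, I'd like to know whether it is possible to overload memmap's dereference operator so that it deletes the memmap file.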

For example:

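from tempfile import mkdtemp
import os.path as path
import numpy as np

a = np.zeros((3, 4))  # placeholder for the large source array
filename = path.join(mkdtemp(), 'tmpfile.dat')

def work():
    # out only lives inside this function's scope
    out = np.memmap(filename, dtype=a.dtype, mode='w+', shape=a.shape)

work()
# At this point out is out of scope, so the overloaded
# dereference function would delete tmpfile.dat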

Does this sound feasible? Is there anything I haven't considered?

Thanks!

2 Answers

2

If we don't want to use the "with" keyword and would rather have a class that handles this for us, we can do the following:

import os
import tempfile

import numpy as np


class tempmap(np.memmap):
    """
    Extension of numpy memmap to automatically map to a file stored in temporary directory.
    Useful as a fast storage option when numpy arrays become large and we just want to do some quick experimental stuff.
    """
    def __new__(subtype, dtype=np.uint8, mode='w+', offset=0,
                shape=None, order='C'):
        # NamedTemporaryFile removes the file from disk when it is closed
        ntf = tempfile.NamedTemporaryFile()
        self = np.memmap.__new__(subtype, ntf, dtype, mode, offset, shape, order)
        self.temp_file_obj = ntf
        return self

    def __del__(self):
        # instances not created through __new__ (e.g. views) have no temp_file_obj
        if hasattr(self,'temp_file_obj') and self.temp_file_obj is not None:
            self.temp_file_obj.close()
            del self.temp_file_obj

def np_as_tmp_map(nparray):
    # copy an in-memory array into a new temp-file-backed memmap
    tmpmap = tempmap(dtype=nparray.dtype, mode='w+', shape=nparray.shape)
    tmpmap[...] = nparray
    return tmpmap


def test_memmap():
    """Test that deleting a temp memmap also deletes the file."""
    x = np_as_tmp_map(np.zeros((10, 10), dtype=np.float64))
    name = x.temp_file_obj.name
    del x
    x = None
    assert not os.path.isfile(name)
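
As a possible alternative to defining __del__, the same cleanup could be attached with weakref.finalize from the standard library. This is only a sketch under assumptions: the names temp_memmap and _remove_quietly are made up for illustration, and it assumes that a file may be removed while it is still mapped, which holds on POSIX systems but not on Windows.

```python
import os
import tempfile
import weakref

import numpy as np


def _remove_quietly(path):
    """Delete path, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def temp_memmap(shape, dtype=np.float64):
    """Hypothetical helper: a memmap whose backing file is removed
    when the returned array is garbage collected."""
    fd, path = tempfile.mkstemp(suffix=".dat")
    os.close(fd)  # np.memmap reopens the file by name
    arr = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    # The callback must not reference arr itself, only the path.
    weakref.finalize(arr, _remove_quietly, path)
    return arr
```

Usage would look like x = temp_memmap((10, 10)); the backing path is available as x.filename. Finalizers registered this way also run at interpreter exit by default, so the file should not be left behind if the array is still alive when the program ends.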
4

If you simply delete the file after opening it with np.memmap, the system removes it automatically once the last reference to the file is closed.

Python's temporary files work this way, and they are convenient to use with the with context manager:

import tempfile
import numpy as np

with tempfile.NamedTemporaryFile() as f:
    # the file keeps its name on the filesystem for as long as f is open
    # open it again by name (will not work on windows)
    x = np.memmap(f.name, dtype=np.float64, mode='w+', shape=(3,4))
# f is closed, so the file path is unlinked, but the data still exists on disk
# and uses space because of the open file reference in x (see /proc/<pid>/fd)
del x
# now all references are gone and the file is properly deleted
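
The delete-after-open behaviour described in this answer can also be tried without NamedTemporaryFile by removing the path by hand. A minimal sketch, assuming a POSIX system (Windows refuses to delete a file that is still mapped); the directory and the file name scratch.dat are made up for illustration:

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch location, used only for this illustration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "scratch.dat")

x = np.memmap(path, dtype=np.float64, mode="w+", shape=(3, 4))

# Remove the directory entry while the mapping is still open (POSIX only).
os.remove(path)
os.rmdir(tmpdir)  # the directory is empty now, so it can go too

# The mapping stays fully usable even though the path no longer exists.
x[:] = 1.5
total = float(x.sum())
path_exists = os.path.exists(path)

# Dropping the last reference closes the mapping and frees the disk space.
del x
```

The trade-off is that nothing else can open the data by name once the path is gone, so this only suits scratch data used within a single process.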
