Why does setting flags on an ndarray view cause allocations? Are they guaranteed to be bounded?

0 votes
1 answer
32 views
Asked 2025-04-12 17:31

Consider this code:

import numpy as np
import itertools


def get_view(arr):
    view = arr.view()
    view.flags.writeable = False  # this line causes memory to leak?
    return view


def main():
    for _ in itertools.count():
        get_view(np.zeros(1000))


if __name__ == "__main__":
    main()

The line that marks the view as non-writeable appears to leak memory, though I'm not sure whether the growth is bounded.

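  1. Why does this happen?
  2. Is the growth guaranteed to be bounded, or is this a bug in numpy? Or are these objects reference-counted but, for some reason, not reclaimed even when the garbage collector is invoked manually?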

Here is the same program with tracemalloc logic added. It prints the change in memory allocations once every 100,000 calls to get_view.

import numpy as np
import tracemalloc
import itertools
import gc


def log_diff(snapshot, prev_snapshot):
    diff = snapshot.compare_to(prev_snapshot, "lineno")
    reported = 0
    for stat in diff:
        if "tracemalloc.py" in stat.traceback[0].filename:
            continue
        if stat.size_diff <= 0:
            continue
        print(f"#{reported}: {stat}")
        reported += 1
    print("---")


def get_view(arr):
    view = arr.view()
    view.flags.writeable = False  # this line causes memory to leak?
    return view


def main():
    tracemalloc.start()
    prev_snapshot = None
    for i in itertools.count():
        get_view(np.zeros(1000))
        if i % 100000 == 0:
            gc.collect(generation=2)
            snapshot = tracemalloc.take_snapshot()
            if prev_snapshot is not None:
                log_diff(snapshot, prev_snapshot)
            prev_snapshot = snapshot


if __name__ == "__main__":
    main()

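On Linux with Python 3.11.6 and numpy 1.26.4, the number of live allocations seems nondeterministic, but the largest count I have seen is around 250. It grows quickly at first and then much more slowly.

If I comment out the line that sets view.flags.writeable, memory usage does not grow.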

#0: /home/sami/bug.py:22: size=3534 B (+3477 B), count=62 (+61), average=57 B
#1: /home/sami/bug.py:29: size=84 B (+28 B), count=2 (+1), average=42 B
---
#0: /home/sami/bug.py:22: size=5871 B (+2337 B), count=103 (+41), average=57 B
#1: /home/sami/bug.py:15: size=72 B (+72 B), count=1 (+1), average=72 B
---
---
#0: /home/sami/bug.py:22: size=6270 B (+399 B), count=110 (+7), average=57 B
---
#0: /home/sami/bug.py:22: size=6327 B (+57 B), count=111 (+1), average=57 B
---
#0: /home/sami/bug.py:22: size=7638 B (+1311 B), count=134 (+23), average=57 B
---
#0: /home/sami/bug.py:22: size=7809 B (+171 B), count=137 (+3), average=57 B
---
---
#0: /home/sami/bug.py:22: size=8436 B (+627 B), count=148 (+11), average=57 B
---
#0: /home/sami/bug.py:22: size=8664 B (+228 B), count=152 (+4), average=57 B
---
#0: /home/sami/bug.py:22: size=8892 B (+228 B), count=156 (+4), average=57 B
---
---
#0: /home/sami/bug.py:22: size=9120 B (+228 B), count=160 (+4), average=57 B
---
---
#0: /home/sami/bug.py:22: size=9177 B (+114 B), count=161 (+2), average=57 B
---
...
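
The reference-counting hypothesis in question 2 can be probed directly. The following is a minimal sketch, assuming CPython and that the base array owns its data; `base_is_released` is a hypothetical helper name, not part of numpy. It only checks whether the array objects themselves are reclaimed, and says nothing about the small allocations that tracemalloc reports above.

```python
import gc
import weakref

import numpy as np


def get_view(arr):
    view = arr.view()
    view.flags.writeable = False
    return view


def base_is_released():
    """Return True if the base array is freed once its read-only view is dropped."""
    arr = np.zeros(1000)
    ref = weakref.ref(arr)  # a weak reference does not keep arr alive
    view = get_view(arr)
    del arr
    # The view still references the array through view.base.
    alive_while_viewed = ref() is not None
    del view
    gc.collect()  # also clear anything that might sit in a reference cycle
    return alive_while_viewed and ref() is None


if __name__ == "__main__":
    print("base array released:", base_is_released())
```

If this reports that the base array is released, the arrays are not what accumulates, and the growth must come from something smaller that is allocated on the flagged line.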

1 Answer

1

I'm not sure whether this is a memory leak, but I can offer an alternative that does not accumulate memory:

view.setflags(write=False)

When run under tracemalloc, this line shows no memory growth.
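
To compare the two spellings side by side, something like the sketch below could be used. The helper names (`view_via_flags`, `view_via_setflags`, `traced_growth`) are made up for illustration, and the iteration count is arbitrary. The numbers will vary by Python and numpy version, so treat the output as a measurement rather than a guarantee.

```python
import tracemalloc

import numpy as np


def view_via_flags(arr):
    """Read-only view using the attribute form from the question."""
    view = arr.view()
    view.flags.writeable = False
    return view


def view_via_setflags(arr):
    """Read-only view using ndarray.setflags, as suggested in this answer."""
    view = arr.view()
    view.setflags(write=False)
    return view


def traced_growth(make_view, iterations=10_000):
    """Return the net change in traced bytes across `iterations` calls."""
    tracemalloc.start()
    try:
        make_view(np.zeros(1000))  # warm-up call so one-time setup is not counted
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            make_view(np.zeros(1000))
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    for fn in (view_via_flags, view_via_setflags):
        print(fn.__name__, traced_growth(fn), "bytes")
```

Both functions return a view whose `flags.writeable` is False, so they are interchangeable for callers. Only the way the flag gets set differs.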
