<p>I would look into garbage collection. Larger files may trigger garbage collection more often, while small files get freed but overall memory stays above some threshold. Specifically, call <code>gc.collect()</code> and then <code>gc.get_referrers()</code> to show what is keeping an instance alive. See the Python docs here:</p>
<p><a href="http://docs.python.org/2/library/gc.html?highlight=gc#gc.get_referrers" rel="nofollow">http://docs.python.org/2/library/gc.html?highlight=gc#gc.get_referrers</a></p>
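<p>As a quick sketch of that diagnostic (the <code>Holder</code> class and variable names below are only illustrative), <code>gc.get_referrers()</code> returns the objects that still hold a reference to whatever you pass in, which is handy for finding out what keeps an instance alive:</p>
<pre><code>import gc

class Holder(object):
    pass

h = Holder()
data = [h]                    # 'data' keeps a reference to h
gc.collect()                  # clear out anything already collectable
refs = gc.get_referrers(h)    # every tracked object that refers to h
assert any(r is data for r in refs)
</code></pre>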
<h2>Update:</h2>
<p>The problem is related to garbage collection, namespaces, and reference counting. The bash script you posted gives a fairly narrow view of the garbage collector's behavior. Try a wider range and you will see patterns in how much memory particular ranges take. For example, change the bash for loop to a larger range, e.g. <code>seq 0 16 2056</code>.</p>
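<p>The "threshold" behavior comes from CPython's generational collector: a collection run is triggered when the net number of allocations exceeds configurable per-generation thresholds, which you can inspect (a small sketch, not part of the original script):</p>
<pre><code>import gc

# CPython schedules a collection when allocations minus deallocations
# exceed the generation-0 threshold (700 by default); the other two
# numbers control how often the older generations are examined.
thresholds = gc.get_threshold()
print(thresholds)
</code></pre>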
<p>You noticed that memory usage drops if you <code>del mystr</code>, because you are removing the only reference to it. You can get a similar result by confining the <code>mystr</code> variable to its own function:</p>
<pre><code>def loopy():
    mylist = []
    for x in xrange(100):
        mystr = ' ' * int(size_kb) * 1024
        mydict = {x: mystr}
        mylist.append(mydict)
    return mylist
</code></pre>
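<p>To see that confinement at work, here is a hedged sketch (the <code>Blob</code> wrapper is mine, only because plain strings cannot carry a weak reference): once the function returns, the only surviving references live inside the returned list, so deleting the list reclaims everything:</p>
<pre><code>import gc
import weakref

class Blob(object):
    """Stand-in for a big string; plain str cannot be weak-referenced."""
    def __init__(self, size_kb):
        self.data = ' ' * size_kb * 1024

def loopy(size_kb=1, n=10):
    mylist = []
    for x in range(n):
        blob = Blob(size_kb)        # local name, like mystr above
        mylist.append({x: blob})
    return mylist                   # locals go out of scope here

result = loopy()
probe = weakref.ref(result[0][0])   # watch one of the blobs
assert probe() is not None          # alive while the list holds it
del result                          # drop the last references
gc.collect()
assert probe() is None              # everything was reclaimed
</code></pre>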
<p>I think you will get more useful information from a memory profiler than from the bash script. Here are a couple of examples using <a href="http://pythonhosted.org/Pympler/index.html" rel="nofollow">Pympler</a>. The first version is similar to the code in your update 3:</p>
<pre><code>import gc
import sys
import time

from pympler import tracker

tr = tracker.SummaryTracker()
print 'begin:'
tr.print_diff()

size_kb = sys.argv[1]
mylist = []
mydict = {}
print 'empty list &amp; dict:'
tr.print_diff()

for x in xrange(100):
    mystr = ' ' * int(size_kb) * 1024
    mydict = {x: mystr}
    mylist.append(mydict)
print 'after for loop:'
tr.print_diff()

del mylist
del mydict
del mystr
print 'after deleting stuff:'
tr.print_diff()

collected = gc.collect()
print 'after garbage collection (collected: %d):' % collected
tr.print_diff()

time.sleep(2)
print 'took a short nap after all that work:'
tr.print_diff()

mylist = []
print 'create an empty list for some reason:'
tr.print_diff()
</code></pre>
<p>And the output:</p>
<pre><code>$ python mem_test.py 256
begin:
types | # objects | total size
======================= | =========== | =============
list | 957 | 97.44 KB
str | 951 | 53.65 KB
int | 118 | 2.77 KB
wrapper_descriptor | 8 | 640 B
weakref | 3 | 264 B
member_descriptor | 2 | 144 B
getset_descriptor | 2 | 144 B
function (store_info) | 1 | 120 B
cell | 2 | 112 B
instancemethod | -1 | -80 B
_sre.SRE_Pattern | -2 | -176 B
tuple | -1 | -216 B
dict | 2 | -1744 B
empty list & dict:
types | # objects | total size
======= | =========== | ============
list | 2 | 168 B
str | 2 | 97 B
int | 1 | 24 B
after for loop:
types | # objects | total size
======= | =========== | ============
str | 1 | 256.04 KB
list | 0 | 848 B
after deleting stuff:
types | # objects | total size
======= | =========== | ===============
list | -1 | -920 B
str | -1 | -262181 B
after garbage collection (collected: 0):
types | # objects | total size
======= | =========== | ============
took a short nap after all that work:
types | # objects | total size
======= | =========== | ============
create an empty list for some reason:
types | # objects | total size
======= | =========== | ============
list | 1 | 72 B
</code></pre>
<p>Note that after the for loop, the total size of the <code>str</code> class is 256 KB, essentially the same as the argument I passed in. The memory is released once the reference to <code>mystr</code> is explicitly removed with <code>del mystr</code>. At that point the garbage has already been taken out, so there is no further reduction after <code>gc.collect()</code>.</p>
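<p>The reason <code>gc.collect()</code> reports 0 is that CPython frees objects by reference counting the moment the last reference disappears; the collector only has work left when there are reference cycles. A minimal sketch of the difference:</p>
<pre><code>import gc

gc.collect()                 # start from a clean slate

big = ' ' * 256 * 1024
del big                      # last reference gone: freed immediately
print(gc.collect())          # nothing cyclic left, so this reports 0

cycle = []
cycle.append(cycle)          # a list that refers to itself
del cycle                    # refcount never reaches zero on its own
print(gc.collect())          # now the collector has something to break
</code></pre>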
<p>The next version uses a function to create a separate namespace for the string:</p>
<pre><code>import gc
import sys
import time

from pympler import tracker

def loopy():
    mylist = []
    for x in xrange(100):
        mystr = ' ' * int(size_kb) * 1024
        mydict = {x: mystr}
        mylist.append(mydict)
    return mylist

tr = tracker.SummaryTracker()
print 'begin:'
tr.print_diff()

size_kb = sys.argv[1]

mylist = loopy()
print 'after for loop:'
tr.print_diff()

del mylist
print 'after deleting stuff:'
tr.print_diff()

collected = gc.collect()
print 'after garbage collection (collected: %d):' % collected
tr.print_diff()

time.sleep(2)
print 'took a short nap after all that work:'
tr.print_diff()

mylist = []
print 'create an empty list for some reason:'
tr.print_diff()
</code></pre>
<p>And finally, the output from this version:</p>
<pre><code>$ python mem_test_2.py 256
begin:
types | # objects | total size
======================= | =========== | =============
list | 958 | 97.53 KB
str | 952 | 53.70 KB
int | 118 | 2.77 KB
wrapper_descriptor | 8 | 640 B
weakref | 3 | 264 B
member_descriptor | 2 | 144 B
getset_descriptor | 2 | 144 B
function (store_info) | 1 | 120 B
cell | 2 | 112 B
instancemethod | -1 | -80 B
_sre.SRE_Pattern | -2 | -176 B
tuple | -1 | -216 B
dict | 2 | -1744 B
after for loop:
types | # objects | total size
======= | =========== | ============
list | 2 | 1016 B
str | 2 | 97 B
int | 1 | 24 B
after deleting stuff:
types | # objects | total size
======= | =========== | ============
list | -1 | -920 B
after garbage collection (collected: 0):
types | # objects | total size
======= | =========== | ============
took a short nap after all that work:
types | # objects | total size
======= | =========== | ============
create an empty list for some reason:
types | # objects | total size
======= | =========== | ============
list | 1 | 72 B
</code></pre>
<p>Now we don't have to clean up the <code>str</code> ourselves, and I think this example shows why using functions is a good idea. Generating your code in one big chunk in a single namespace really prevents the garbage collector from doing its job. It's not going to walk into your house and start assuming things are trash :) It has to know the stuff is safe to collect.</p>
<p>As an aside, Evan Jones makes for some interesting reading.</p>