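Memory leak in Python when using strings smaller than 128 KB?
Original title: Memory leak when opening files smaller than 128 KB in Python?
Original question
I ran into what I believe is a memory leak while running my Python script. Here is the script: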
import sys
import time


class MyObj(object):
    def __init__(self, filename):
        with open(filename) as f:
            self.att = f.read()


def myfunc(filename):
    mylist = [MyObj(filename) for x in xrange(100)]
    len(mylist)
    return []


def main():
    filename = sys.argv[1]
    myfunc(filename)
    time.sleep(3600)


if __name__ == '__main__':
    main()
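The main function calls myfunc(), which creates a list of 100 objects, each of which opens and reads a file. After myfunc() returns, I expected the memory for the 100 objects and the file contents to be freed, since nothing references them any more. However, when I check memory usage with the ps command, the Python process uses about 10,000 KB more than it does when I run the script with lines 12 and 13 (the list comprehension and the len(mylist) call) commented out.
Strangely, this memory leak (if it really is one) seems to occur only with files smaller than 128 KB. I wrote a bash script that runs the script against files ranging from 1 KB to 200 KB and found that the memory increase stops once the file size reaches 128 KB. Here is the bash script: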
#!/bin/bash
echo "PID RSS S TTY TIME COMMAND" > output.txt
for i in `seq 1 200`;
do
    python debug_memory.py "data/stuff_${i}K.txt" &
    pid=$!
    sleep 0.1
    ps -e -O rss | grep $pid | grep -v grep >> output.txt
    kill $pid
done
Here is the output of the bash script:
PID RSS S TTY TIME COMMAND
28471 5552 S pts/16 00:00:00 python debug_memory.py data/stuff_1K.txt
28477 5656 S pts/16 00:00:00 python debug_memory.py data/stuff_2K.txt
28483 5756 S pts/16 00:00:00 python debug_memory.py data/stuff_3K.txt
28488 5852 S pts/16 00:00:00 python debug_memory.py data/stuff_4K.txt
28494 5952 S pts/16 00:00:00 python debug_memory.py data/stuff_5K.txt
28499 6052 S pts/16 00:00:00 python debug_memory.py data/stuff_6K.txt
28505 6156 S pts/16 00:00:00 python debug_memory.py data/stuff_7K.txt
28511 6256 S pts/16 00:00:00 python debug_memory.py data/stuff_8K.txt
28516 6356 S pts/16 00:00:00 python debug_memory.py data/stuff_9K.txt
28522 6452 S pts/16 00:00:00 python debug_memory.py data/stuff_10K.txt
28527 6552 S pts/16 00:00:00 python debug_memory.py data/stuff_11K.txt
28533 6656 S pts/16 00:00:00 python debug_memory.py data/stuff_12K.txt
28539 6756 S pts/16 00:00:00 python debug_memory.py data/stuff_13K.txt
28544 6852 S pts/16 00:00:00 python debug_memory.py data/stuff_14K.txt
28550 6952 S pts/16 00:00:00 python debug_memory.py data/stuff_15K.txt
28555 7056 S pts/16 00:00:00 python debug_memory.py data/stuff_16K.txt
28561 7156 S pts/16 00:00:00 python debug_memory.py data/stuff_17K.txt
28567 7252 S pts/16 00:00:00 python debug_memory.py data/stuff_18K.txt
28572 7356 S pts/16 00:00:00 python debug_memory.py data/stuff_19K.txt
28578 7452 S pts/16 00:00:00 python debug_memory.py data/stuff_20K.txt
28584 7556 S pts/16 00:00:00 python debug_memory.py data/stuff_21K.txt
28589 7652 S pts/16 00:00:00 python debug_memory.py data/stuff_22K.txt
28595 7756 S pts/16 00:00:00 python debug_memory.py data/stuff_23K.txt
28600 7852 S pts/16 00:00:00 python debug_memory.py data/stuff_24K.txt
28606 7952 S pts/16 00:00:00 python debug_memory.py data/stuff_25K.txt
28612 8052 S pts/16 00:00:00 python debug_memory.py data/stuff_26K.txt
28617 8152 S pts/16 00:00:00 python debug_memory.py data/stuff_27K.txt
28623 8252 S pts/16 00:00:00 python debug_memory.py data/stuff_28K.txt
28629 8356 S pts/16 00:00:00 python debug_memory.py data/stuff_29K.txt
28634 8452 S pts/16 00:00:00 python debug_memory.py data/stuff_30K.txt
28640 8556 S pts/16 00:00:00 python debug_memory.py data/stuff_31K.txt
28645 8656 S pts/16 00:00:00 python debug_memory.py data/stuff_32K.txt
28651 8756 S pts/16 00:00:00 python debug_memory.py data/stuff_33K.txt
28657 8856 S pts/16 00:00:00 python debug_memory.py data/stuff_34K.txt
28662 8956 S pts/16 00:00:00 python debug_memory.py data/stuff_35K.txt
28668 9056 S pts/16 00:00:00 python debug_memory.py data/stuff_36K.txt
28674 9156 S pts/16 00:00:00 python debug_memory.py data/stuff_37K.txt
28679 9256 S pts/16 00:00:00 python debug_memory.py data/stuff_38K.txt
28685 9352 S pts/16 00:00:00 python debug_memory.py data/stuff_39K.txt
28691 9452 S pts/16 00:00:00 python debug_memory.py data/stuff_40K.txt
28696 9552 S pts/16 00:00:00 python debug_memory.py data/stuff_41K.txt
28702 9656 S pts/16 00:00:00 python debug_memory.py data/stuff_42K.txt
28707 9756 S pts/16 00:00:00 python debug_memory.py data/stuff_43K.txt
28713 9852 S pts/16 00:00:00 python debug_memory.py data/stuff_44K.txt
28719 9952 S pts/16 00:00:00 python debug_memory.py data/stuff_45K.txt
28724 10052 S pts/16 00:00:00 python debug_memory.py data/stuff_46K.txt
28730 10156 S pts/16 00:00:00 python debug_memory.py data/stuff_47K.txt
28739 10256 S pts/16 00:00:00 python debug_memory.py data/stuff_48K.txt
28746 10352 S pts/16 00:00:00 python debug_memory.py data/stuff_49K.txt
28752 10452 S pts/16 00:00:00 python debug_memory.py data/stuff_50K.txt
28757 10556 S pts/16 00:00:00 python debug_memory.py data/stuff_51K.txt
28763 10656 S pts/16 00:00:00 python debug_memory.py data/stuff_52K.txt
28769 10752 S pts/16 00:00:00 python debug_memory.py data/stuff_53K.txt
28774 10852 S pts/16 00:00:00 python debug_memory.py data/stuff_54K.txt
28780 10952 S pts/16 00:00:00 python debug_memory.py data/stuff_55K.txt
28786 11052 S pts/16 00:00:00 python debug_memory.py data/stuff_56K.txt
28791 11152 S pts/16 00:00:00 python debug_memory.py data/stuff_57K.txt
28797 11256 S pts/16 00:00:00 python debug_memory.py data/stuff_58K.txt
28802 11356 S pts/16 00:00:00 python debug_memory.py data/stuff_59K.txt
28808 11452 S pts/16 00:00:00 python debug_memory.py data/stuff_60K.txt
28814 11556 S pts/16 00:00:00 python debug_memory.py data/stuff_61K.txt
28819 11656 S pts/16 00:00:00 python debug_memory.py data/stuff_62K.txt
28825 11752 S pts/16 00:00:00 python debug_memory.py data/stuff_63K.txt
28831 11852 S pts/16 00:00:00 python debug_memory.py data/stuff_64K.txt
28836 11956 S pts/16 00:00:00 python debug_memory.py data/stuff_65K.txt
28842 12052 S pts/16 00:00:00 python debug_memory.py data/stuff_66K.txt
28847 12152 S pts/16 00:00:00 python debug_memory.py data/stuff_67K.txt
28853 12256 S pts/16 00:00:00 python debug_memory.py data/stuff_68K.txt
28859 12356 S pts/16 00:00:00 python debug_memory.py data/stuff_69K.txt
28864 12452 S pts/16 00:00:00 python debug_memory.py data/stuff_70K.txt
28871 12556 S pts/16 00:00:00 python debug_memory.py data/stuff_71K.txt
28877 12652 S pts/16 00:00:00 python debug_memory.py data/stuff_72K.txt
28883 12756 S pts/16 00:00:00 python debug_memory.py data/stuff_73K.txt
28889 12856 S pts/16 00:00:00 python debug_memory.py data/stuff_74K.txt
28894 12952 S pts/16 00:00:00 python debug_memory.py data/stuff_75K.txt
28900 13056 S pts/16 00:00:00 python debug_memory.py data/stuff_76K.txt
28906 13156 S pts/16 00:00:00 python debug_memory.py data/stuff_77K.txt
28911 13256 S pts/16 00:00:00 python debug_memory.py data/stuff_78K.txt
28917 13352 S pts/16 00:00:00 python debug_memory.py data/stuff_79K.txt
28922 13452 S pts/16 00:00:00 python debug_memory.py data/stuff_80K.txt
28928 13556 S pts/16 00:00:00 python debug_memory.py data/stuff_81K.txt
28934 13652 S pts/16 00:00:00 python debug_memory.py data/stuff_82K.txt
28939 13752 S pts/16 00:00:00 python debug_memory.py data/stuff_83K.txt
28945 13852 S pts/16 00:00:00 python debug_memory.py data/stuff_84K.txt
28951 13952 S pts/16 00:00:00 python debug_memory.py data/stuff_85K.txt
28956 14052 S pts/16 00:00:00 python debug_memory.py data/stuff_86K.txt
28962 14152 S pts/16 00:00:00 python debug_memory.py data/stuff_87K.txt
28967 14256 S pts/16 00:00:00 python debug_memory.py data/stuff_88K.txt
28973 14352 S pts/16 00:00:00 python debug_memory.py data/stuff_89K.txt
28979 14456 S pts/16 00:00:00 python debug_memory.py data/stuff_90K.txt
28984 14552 S pts/16 00:00:00 python debug_memory.py data/stuff_91K.txt
28990 14652 S pts/16 00:00:00 python debug_memory.py data/stuff_92K.txt
28996 14756 S pts/16 00:00:00 python debug_memory.py data/stuff_93K.txt
29001 14852 S pts/16 00:00:00 python debug_memory.py data/stuff_94K.txt
29007 14956 S pts/16 00:00:00 python debug_memory.py data/stuff_95K.txt
29012 15052 S pts/16 00:00:00 python debug_memory.py data/stuff_96K.txt
29018 15156 S pts/16 00:00:00 python debug_memory.py data/stuff_97K.txt
29024 15252 S pts/16 00:00:00 python debug_memory.py data/stuff_98K.txt
29029 15360 S pts/16 00:00:00 python debug_memory.py data/stuff_99K.txt
29035 15456 S pts/16 00:00:00 python debug_memory.py data/stuff_100K.txt
29040 15556 S pts/16 00:00:00 python debug_memory.py data/stuff_101K.txt
29046 15652 S pts/16 00:00:00 python debug_memory.py data/stuff_102K.txt
29052 15756 S pts/16 00:00:00 python debug_memory.py data/stuff_103K.txt
29057 15852 S pts/16 00:00:00 python debug_memory.py data/stuff_104K.txt
29063 15952 S pts/16 00:00:00 python debug_memory.py data/stuff_105K.txt
29069 16056 S pts/16 00:00:00 python debug_memory.py data/stuff_106K.txt
29074 16152 S pts/16 00:00:00 python debug_memory.py data/stuff_107K.txt
29080 16256 S pts/16 00:00:00 python debug_memory.py data/stuff_108K.txt
29085 16356 S pts/16 00:00:00 python debug_memory.py data/stuff_109K.txt
29091 16452 S pts/16 00:00:00 python debug_memory.py data/stuff_110K.txt
29097 16552 S pts/16 00:00:00 python debug_memory.py data/stuff_111K.txt
29102 16652 S pts/16 00:00:00 python debug_memory.py data/stuff_112K.txt
29108 16756 S pts/16 00:00:00 python debug_memory.py data/stuff_113K.txt
29113 16852 S pts/16 00:00:00 python debug_memory.py data/stuff_114K.txt
29119 16952 S pts/16 00:00:00 python debug_memory.py data/stuff_115K.txt
29125 17056 S pts/16 00:00:00 python debug_memory.py data/stuff_116K.txt
29130 17156 S pts/16 00:00:00 python debug_memory.py data/stuff_117K.txt
29136 17256 S pts/16 00:00:00 python debug_memory.py data/stuff_118K.txt
29141 17356 S pts/16 00:00:00 python debug_memory.py data/stuff_119K.txt
29147 17452 S pts/16 00:00:00 python debug_memory.py data/stuff_120K.txt
29153 17556 S pts/16 00:00:00 python debug_memory.py data/stuff_121K.txt
29158 17656 S pts/16 00:00:00 python debug_memory.py data/stuff_122K.txt
29164 17756 S pts/16 00:00:00 python debug_memory.py data/stuff_123K.txt
29170 17856 S pts/16 00:00:00 python debug_memory.py data/stuff_124K.txt
29175 17952 S pts/16 00:00:00 python debug_memory.py data/stuff_125K.txt
29181 18056 S pts/16 00:00:00 python debug_memory.py data/stuff_126K.txt
29186 18152 S pts/16 00:00:00 python debug_memory.py data/stuff_127K.txt
29192 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_128K.txt
29198 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_129K.txt
29203 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_130K.txt
29209 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_131K.txt
29215 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_132K.txt
29220 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_133K.txt
29226 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_134K.txt
29231 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_135K.txt
29237 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_136K.txt
29243 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_137K.txt
29248 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_138K.txt
29254 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_139K.txt
29260 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_140K.txt
29265 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_141K.txt
29271 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_142K.txt
29276 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_143K.txt
29282 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_144K.txt
29288 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_145K.txt
29293 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_146K.txt
29299 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_147K.txt
29305 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_148K.txt
29310 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_149K.txt
29316 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_150K.txt
29321 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_151K.txt
29327 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_152K.txt
29333 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_153K.txt
29338 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_154K.txt
29344 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_155K.txt
29349 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_156K.txt
29355 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_157K.txt
29361 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_158K.txt
29366 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_159K.txt
29372 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_160K.txt
29378 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_161K.txt
29383 5460 S pts/16 00:00:00 python debug_memory.py data/stuff_162K.txt
29389 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_163K.txt
29394 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_164K.txt
29400 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_165K.txt
29406 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_166K.txt
29411 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_167K.txt
29417 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_168K.txt
29423 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_169K.txt
29428 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_170K.txt
29434 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_171K.txt
29439 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_172K.txt
29445 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_173K.txt
29451 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_174K.txt
29456 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_175K.txt
29463 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_176K.txt
29483 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_177K.txt
29489 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_178K.txt
29496 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_179K.txt
29501 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_180K.txt
29507 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_181K.txt
29512 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_182K.txt
29518 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_183K.txt
29524 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_184K.txt
29529 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_185K.txt
29535 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_186K.txt
29541 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_187K.txt
29546 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_188K.txt
29552 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_189K.txt
29557 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_190K.txt
29563 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_191K.txt
29569 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_192K.txt
29574 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_193K.txt
29580 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_194K.txt
29586 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_195K.txt
29591 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_196K.txt
29597 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_197K.txt
29602 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_198K.txt
29608 5456 S pts/16 00:00:00 python debug_memory.py data/stuff_199K.txt
29614 5452 S pts/16 00:00:00 python debug_memory.py data/stuff_200K.txt
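As an alternative to sampling with ps from a wrapper script, the process can report its own resident set size. The following is only a sketch and not part of the original test setup: it assumes Linux (it reads /proc/self/statm) and Python 3, and the helper name rss_kb is made up for illustration.

```python
import os


def rss_kb():
    """Return this process's resident set size in KB, or None if unavailable.

    Assumes a Linux-style /proc filesystem: the second field of
    /proc/self/statm is the number of resident pages.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (OSError, IndexError, ValueError, AttributeError):
        return None
    return resident_pages * page_size // 1024


before = rss_kb()
data = [" " * (64 * 1024) for _ in range(100)]  # 100 strings of 64 KB each
during = rss_kb()
del data
after = rss_kb()
print("RSS in KB before/during/after:", before, during, after)
```

Printing the value before allocating, while holding the list, and after deleting it gives the same kind of numbers as the RSS column above without needing grep or a sleep.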
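Can anyone explain what is going on? Why do I see increased memory usage when using files smaller than 128 KB?
My full test environment is here: https://github.com/saltycrane/debugging-python-memory-usage/tree/50f73358c7a84a504333ce9c4071b0f3537bbc0f
I am running Python 2.7.3 on Ubuntu 12.04.
Update 1
The problem is not limited to reading files smaller than 128 KB. I get the same result when I set the object attribute to a string of the same size as the file contents. Here is the updated code: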
import sys
import time


class MyObj(object):
    def __init__(self, size_kb):
        self.att = ' ' * int(size_kb) * 1024


def myfunc(size_kb):
    mylist = [MyObj(size_kb) for x in xrange(100)]
    len(mylist)
    return []


def main():
    size_kb = sys.argv[1]
    myfunc(size_kb)
    time.sleep(3600)


if __name__ == '__main__':
    main()
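Running this script gives similar results. The updated test environment is here: https://github.com/saltycrane/debugging-python-memory-usage/tree/59b7ff61134dfc11c4195e9201b2c1728ed4fcce
Update 2
I simplified my test script further: 1. removed the class and simply created a list of strings; 2. removed myfunc() and used del to delete the mylist object directly.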
import sys
import time


def main():
    size_kb = sys.argv[1]
    mylist = []
    for x in xrange(100):
        mystr = ' ' * int(size_kb) * 1024
        mylist.append(mystr)
    del mylist
    time.sleep(3600)


if __name__ == '__main__':
    main()
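The simplified script gives results similar to the original. However, if I do not create a separate string variable, I do not see the memory increase. Here is the script that does not cause the increase: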
import sys
import time


def main():
    size_kb = sys.argv[1]
    mylist = []
    for x in xrange(100):
        mylist.append(' ' * int(size_kb) * 1024)
    del mylist
    time.sleep(3600)


if __name__ == '__main__':
    main()
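The updated test environment is here: https://github.com/saltycrane/debugging-python-memory-usage/tree/423ca6a50dccbe32572a9d0dea1068ddcb06663b
More questions:
- Can anyone else reproduce my results?
- Is the memory increase seen through ps expected?
Hints about what is going on
I found some interesting information about "free lists" that seems related to this problem:
From the last link:
To speed up memory allocation (and reuse), Python uses a number of lists for small objects. Each list contains objects of similar size.
Indeed: if an item (of size x) is deallocated (freed for lack of references), its location is not returned to Python's global memory pool (and even less to the system), but merely marked as free and added to the free list for items of size x.
If the memory of small objects is never freed, then the inescapable conclusion is that, like goldfish, these small-object lists only keep growing and never shrink, and the memory footprint of your application is dominated by the largest number of small objects allocated at any given moment.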
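The behaviour described in the quote can be illustrated with a toy model. This is a sketch only and not CPython's actual allocator (see Objects/obmalloc.c for that); the class SizeClassPool and its methods are hypothetical names, and the "system" is just a byte counter.

```python
from collections import defaultdict


class SizeClassPool:
    """Toy allocator: freed blocks go onto a per-size free list, never back to the OS."""

    def __init__(self):
        self.free_lists = defaultdict(list)  # size -> list of reusable blocks
        self.next_id = 0                     # id for the next block taken from the "system"
        self.reserved = 0                    # total bytes ever obtained from the "system"

    def alloc(self, size):
        if self.free_lists[size]:
            return self.free_lists[size].pop()   # reuse a previously freed block
        self.reserved += size                    # otherwise the footprint grows
        block = (self.next_id, size)
        self.next_id += 1
        return block

    def free(self, block):
        _, size = block
        self.free_lists[size].append(block)      # marked reusable; footprint unchanged


pool = SizeClassPool()
blocks = [pool.alloc(32) for _ in range(1000)]
for b in blocks:
    pool.free(b)
peak = pool.reserved                  # 32000 bytes
more = [pool.alloc(32) for _ in range(500)]
print(peak, pool.reserved)            # the footprint stays at the high-water mark
```

In this model the footprint is set by the largest number of blocks alive at once, which is the point the quoted passage makes.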
Update 3
I oversimplified the code in Update 2. Adding a del mystr line at the end of the script frees the memory.
(See: https://github.com/saltycrane/debugging-python-memory-usage/blob/dd058e4774802cae7cbfca520fb835ea46b645e8/debug_memory_leaks.py)
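One reason del mystr can matter: after a for loop finishes, the loop variable is still bound to the last object it referenced, so that object survives del mylist. The sketch below assumes CPython's reference counting and Python 3. The Payload class is a hypothetical stand-in for the string, because plain str objects cannot be weakly referenced.

```python
import weakref


class Payload:
    """Hypothetical stand-in for a large string (str does not support weak references)."""

    def __init__(self, size):
        self.data = " " * size


def run():
    mylist = []
    for x in range(3):
        item = Payload(1024)
        mylist.append(item)
    probe = weakref.ref(mylist[-1])   # watch the last object without keeping it alive

    del mylist
    alive_after_del_list = probe() is not None   # True: `item` still refers to it

    del item
    alive_after_del_item = probe() is not None   # False on CPython: last reference gone
    return alive_after_del_list, alive_after_del_item


result = run()
print(result)
```

On CPython this prints (True, False): deleting the list alone leaves one object alive through the loop variable.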
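I updated the script so that it is complex enough to show the problem. The problem still exists in the following code. The latest code/environment is here: https://github.com/saltycrane/debugging-python-memory-usage/tree/fc0c8ce9ba621cb86b6abb93adf1b297a7c0230b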
import gc
import sys
import time


def main():
    size_kb = sys.argv[1]
    mylist = []
    for x in xrange(100):
        mystr = ' ' * int(size_kb) * 1024
        mydict = {'mykey': mystr}
        mylist.append(mydict)
    del mystr
    del mydict
    del mylist
    gc.collect()
    time.sleep(3600)


if __name__ == '__main__':
    main()
I also ran the script in other environments. Strangely, when run in a clean virtualenv, the memory drop happens at 260 KB instead of 128 KB. See https://github.com/saltycrane/debugging-python-memory-usage/tree/52fbd5d57ff45affdcd70623ddb74fa1f1ffbbc2
Environments:
- Ubuntu 12.04 64-bit, system Python 2.7.3: original run
- Ubuntu 12.04 64-bit, Python 3.3.0 compiled from source: similar results
- Scientific Linux 6 64-bit, Python 2.6.6: similar results
- Ubuntu 12.04 64-bit, Python 2.7.3 from a virtualenv: memory drop happens at 260 KB instead of 128 KB
More references:
- http://revista.python.org.ar/2/en/html/memory-fragmentation.html
- http://www.evanjones.ca/python-memory.html
- http://mail.python.org/pipermail/python-dev/2004-October/049480.html (note: this is from 2004)
- http://mail.python.org/pipermail/python-dev/2006-March/061991.html
- http://www.evanjones.ca/memoryallocator/
- http://www.evanjones.ca/memory-allocator.pdf
- http://hg.python.org/releasing/2.7.3/file/7bb96963d067/Objects/obmalloc.c
After reading some of these, I saw mentions of a 256 KB "arena size". Could that be related?
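Update 4 (mostly solved)
schlenk revealed why memory usage drops at 128 KB.
128 KB is the point at which the "memory allocation functions" (malloc?) use mmap instead of increasing the program break with sbrk.
Interestingly, this threshold can be changed through an environment variable.
I ran tests with the MALLOC_MMAP_THRESHOLD_ environment variable set to different values, and each time the drop in memory usage matched the value.
The results are here:
https://github.com/saltycrane/debugging-python-memory-usage/blob/97d93cd165a139a6b6f96720de63a92561dd2f05/output_debug_memory_leaks.py.txt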
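One way to set up such a run from Python is to put the variable into the child process's environment. This is a sketch assuming Python 3 and glibc (mallopt(3) documents MALLOC_MMAP_THRESHOLD_; other C libraries simply ignore it). The helper name run_with_mmap_threshold is hypothetical, and the child command is a trivial placeholder rather than the test script above.

```python
import os
import subprocess
import sys


def run_with_mmap_threshold(threshold_bytes, child_args):
    """Run a Python child process with MALLOC_MMAP_THRESHOLD_ set in its environment."""
    env = dict(os.environ)
    env["MALLOC_MMAP_THRESHOLD_"] = str(threshold_bytes)
    completed = subprocess.run(
        [sys.executable] + list(child_args),
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout


# Placeholder child: echo the variable back to show that it was passed through.
child = ["-c", "import os; print(os.environ['MALLOC_MMAP_THRESHOLD_'])"]
output = run_with_mmap_threshold(64 * 1024, child)
print(output.strip())
```

Replacing the placeholder arguments with the path of a test script and a size would run that script under the chosen threshold.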
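I would still like to know whether it is expected behavior for my script to leak memory when handling string values smaller than 128 KB.
Some more links:
- mallopt(3) - Linux manual page (from schlenk)
- Python memory management and TCMalloc | Pushing the Web
- Re: Setting x to None and del x does not release memory in python 2.7.1 (HPUX 11.23, ia64) « python-list « ActiveState List Archives
- Issue 3526: custom malloc implementation on SunOS and AIX - Python tracker
- Make the mmap/brk threshold in malloc dynamic to improve performance
Note: according to the last two links, using mmap instead of sbrk affects performance (speed).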
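2 Answers
I suggest looking into garbage collection. It may be that the large files trigger garbage collection more often, while the small files are being freed but collectively stay at some threshold. Specifically, try calling gc.collect() and then gc.get_referrers() on the object to find out what is keeping the instance alive. See the Python documentation here:
http://docs.python.org/2/library/gc.html?highlight=gc#gc.get_referrers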
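A minimal sketch of that suggestion, assuming Python 3. The Blob class and the holder list are hypothetical stand-ins for the instance under investigation and for whatever container happens to keep it alive.

```python
import gc


class Blob:
    """Hypothetical object standing in for the instance being investigated."""


blob = Blob()
holder = [blob]            # a container that keeps `blob` alive

gc.collect()               # clear any collectable cycles first
referrers = gc.get_referrers(blob)

# Report which kinds of objects still refer to `blob`.
for ref in referrers:
    print(type(ref).__name__)

held_by_list = any(ref is holder for ref in referrers)
```

The referrers typically include the namespace dict that binds the name blob as well as the list, so the output needs some filtering in real use.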
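Update:
This issue involves garbage collection, namespaces, and reference counting. The bash script you posted gives a rather narrow view of the garbage collector's behavior. Try widening the range and you will see patterns in how much memory certain ranges use. For example, change the bash for loop to cover a larger range, such as: seq 0 16 2056.
You noticed that memory usage goes down if you use del mystr, because that removes the remaining reference to the string. You get a similar result if you confine the mystr variable to its own function, something like this: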
def loopy():
    mylist = []
    for x in xrange(100):
        mystr = ' ' * int(size_kb) * 1024
        mydict = {x: mystr}
        mylist.append(mydict)
    return mylist
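Rather than using a bash script, I think a memory profiler gives more useful information. Here are a couple of examples using Pympler. The first version is similar to your code from Update 3: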
import gc
import sys
import time
from pympler import tracker

tr = tracker.SummaryTracker()
print 'begin:'
tr.print_diff()

size_kb = sys.argv[1]
mylist = []
mydict = {}
print 'empty list & dict:'
tr.print_diff()

for x in xrange(100):
    mystr = ' ' * int(size_kb) * 1024
    mydict = {x: mystr}
    mylist.append(mydict)
print 'after for loop:'
tr.print_diff()

del mystr
del mydict
del mylist
print 'after deleting stuff:'
tr.print_diff()

collected = gc.collect()
print 'after garbage collection (collected: %d):' % collected
tr.print_diff()

time.sleep(2)
print 'took a short nap after all that work:'
tr.print_diff()

mylist = []
print 'create an empty list for some reason:'
tr.print_diff()
The output is:
$ python mem_test.py 256
begin:
types | # objects | total size
======================= | =========== | =============
list | 957 | 97.44 KB
str | 951 | 53.65 KB
int | 118 | 2.77 KB
wrapper_descriptor | 8 | 640 B
weakref | 3 | 264 B
member_descriptor | 2 | 144 B
getset_descriptor | 2 | 144 B
function (store_info) | 1 | 120 B
cell | 2 | 112 B
instancemethod | -1 | -80 B
_sre.SRE_Pattern | -2 | -176 B
tuple | -1 | -216 B
dict | 2 | -1744 B
empty list & dict:
types | # objects | total size
======= | =========== | ============
list | 2 | 168 B
str | 2 | 97 B
int | 1 | 24 B
after for loop:
types | # objects | total size
======= | =========== | ============
str | 1 | 256.04 KB
list | 0 | 848 B
after deleting stuff:
types | # objects | total size
======= | =========== | ===============
list | -1 | -920 B
str | -1 | -262181 B
after garbage collection (collected: 0):
types | # objects | total size
======= | =========== | ============
took a short nap after all that work:
types | # objects | total size
======= | =========== | ============
create an empty list for some reason:
types | # objects | total size
======= | =========== | ============
list | 1 | 72 B
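Note that after the for loop, the total size for the str class is 256 KB, essentially the same as the argument I passed in. After explicitly removing the reference to mystr (with del mystr), the memory is freed. By that point the garbage has already been reclaimed, so there is no further reduction after calling gc.collect().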
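Pympler is a third-party package. On Python 3.4 and later, the standard-library tracemalloc module gives a comparable view of what the interpreter itself has allocated. The sketch below is an assumption-laden variant of the script above and not the answerer's code; SIZE_KB is an illustrative value. It reports Python-level allocations only, which can drop to near zero even when the RSS shown by ps does not.

```python
import tracemalloc

SIZE_KB = 64  # illustrative size, below the 128 KB threshold from the question

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

mylist = []
for x in range(100):
    mystr = " " * (SIZE_KB * 1024)
    mylist.append({x: mystr})
held, _ = tracemalloc.get_traced_memory()

del mystr, mylist
released, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("held: %d KB" % ((held - baseline) // 1024))
print("after del: %d KB" % ((released - baseline) // 1024))
```

Comparing these numbers with the RSS of the same process helps separate objects that are still referenced from memory the C allocator has simply not handed back to the operating system.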
The next version uses a function to create a separate namespace for the strings.
import gc
import sys
import time
from pympler import tracker


def loopy():
    mylist = []
    for x in xrange(100):
        mystr = ' ' * int(size_kb) * 1024
        mydict = {x: mystr}
        mylist.append(mydict)
    return mylist


tr = tracker.SummaryTracker()
print 'begin:'
tr.print_diff()

size_kb = sys.argv[1]
mylist = loopy()
print 'after for loop:'
tr.print_diff()

del mylist
print 'after deleting stuff:'
tr.print_diff()

collected = gc.collect()
print 'after garbage collection (collected: %d):' % collected
tr.print_diff()

time.sleep(2)
print 'took a short nap after all that work:'
tr.print_diff()

mylist = []
print 'create an empty list for some reason:'
tr.print_diff()
The output of this last version is:
$ python mem_test_2.py 256
begin:
types | # objects | total size
======================= | =========== | =============
list | 958 | 97.53 KB
str | 952 | 53.70 KB
int | 118 | 2.77 KB
wrapper_descriptor | 8 | 640 B
weakref | 3 | 264 B
member_descriptor | 2 | 144 B
getset_descriptor | 2 | 144 B
function (store_info) | 1 | 120 B
cell | 2 | 112 B
instancemethod | -1 | -80 B
_sre.SRE_Pattern | -2 | -176 B
tuple | -1 | -216 B
dict | 2 | -1744 B
after for loop:
types | # objects | total size
======= | =========== | ============
list | 2 | 1016 B
str | 2 | 97 B
int | 1 | 24 B
after deleting stuff:
types | # objects | total size
======= | =========== | ============
list | -1 | -920 B
after garbage collection (collected: 0):
types | # objects | total size
======= | =========== | ============
took a short nap after all that work:
types | # objects | total size
======= | =========== | ============
create an empty list for some reason:
types | # objects | total size
======= | =========== | ============
list | 1 | 72 B
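Now we do not need to clean up the str, and I think this example shows why using functions is a good idea. Writing code where everything lives in one big namespace hinders the garbage collector. It will not walk into your house and assume that things are garbage :) It has to know which things are safe to collect.
By the way, the Evan Jones link is very interesting.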
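You are probably running into the default behavior of the Linux memory allocator.
Simply put, Linux has two ways to allocate memory: sbrk() for small blocks and mmap() for large blocks. Blocks allocated with sbrk() cannot easily be returned to the system, while blocks allocated with mmap() can be returned easily (they only need to be unmapped).
So if you allocate blocks larger than the value at which malloc() in libc decides to switch to mmap(), you will see this effect. Have a look at the mallopt() call, in particular the M_MMAP_THRESHOLD parameter (http://man7.org/linux/man-pages/man3/mallopt.3.html).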
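The difference between the two paths can be sketched with a toy model. This is a deliberate simplification and not glibc's implementation: the class ToyMalloc is hypothetical, the heap is modelled as a stack that can only shrink from the top, and chunk headers, free-chunk reuse, the trim threshold and the dynamic mmap threshold are all ignored.

```python
MMAP_THRESHOLD = 128 * 1024  # glibc's default M_MMAP_THRESHOLD


class ToyMalloc:
    """Toy model: small blocks live on a brk-style heap, large ones get their own mapping."""

    def __init__(self):
        self.heap = []        # [size, in_use] entries, ordered bottom to top
        self.mapped = {}      # handle -> size for mmap-style blocks
        self.next_handle = 0

    def malloc(self, size):
        handle = self.next_handle
        self.next_handle += 1
        if size >= MMAP_THRESHOLD:
            self.mapped[handle] = size                 # separate mapping
            return ("mmap", handle)
        self.heap.append([size, True])                 # grow the heap upward
        return ("heap", len(self.heap) - 1)

    def free(self, block):
        kind, key = block
        if kind == "mmap":
            del self.mapped[key]                       # unmapped: returned immediately
            return
        self.heap[key][1] = False
        while self.heap and not self.heap[-1][1]:      # trim only free blocks at the top
            self.heap.pop()

    def footprint(self):
        return sum(size for size, _ in self.heap) + sum(self.mapped.values())


def leftover_after_free(size):
    m = ToyMalloc()
    blocks = [m.malloc(size) for _ in range(100)]
    survivor = m.malloc(64)        # a small long-lived object allocated last
    for b in blocks:
        m.free(b)
    return m.footprint()


print(leftover_after_free(127 * 1024))   # 13004864: the heap is pinned by the survivor
print(leftover_after_free(128 * 1024))   # 64: the large blocks were unmapped
```

In the model, a single small live block sitting above the freed ones is enough to keep the whole heap resident, while blocks at or above the threshold disappear from the footprint as soon as they are freed.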
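Update
Regarding your additional question: yes, memory can appear to leak this way if the memory allocator works like libc's on Linux. On Windows with the Low Fragmentation Heap it probably would not, and likewise on AIX, depending on which malloc is configured. Perhaps some other allocators (tcmalloc, etc.) also avoid such problems. sbrk() is very fast but has problems with memory fragmentation. CPython cannot do much about this, because it has no compacting garbage collector, only simple reference counting.
Python offers some ways to reduce buffer allocations; see, for example, this blog post: http://eli.thegreenplace.net/2011/11/28/less-copies-in-python-with-the-buffer-protocol-and-memoryviews/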
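As a rough illustration of the idea in that post, the sketch below reads data into one preallocated buffer instead of creating a new string for every read. It assumes Python 3; io.BytesIO stands in for a file opened in binary mode, and the buffer size is arbitrary.

```python
import io

source = io.BytesIO(b"x" * 1024)      # stand-in for a file opened in binary mode

buf = bytearray(256)                  # one buffer allocated up front and reused
view = memoryview(buf)

total = 0
while True:
    n = source.readinto(buf)          # fills the existing buffer, no new bytes object
    if n == 0:
        break
    chunk = view[:n]                  # slicing a memoryview does not copy the data
    total += len(chunk)

print(total)
```

Because the same buffer is reused for every read, the number of large allocations no longer depends on how many chunks are processed.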