Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"
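Note: This question was originally asked here, but the bounty expired without a suitable answer being found. I am re-asking it and including all the details from the original question.
A Python script runs a set of class functions every 60 seconds using the sched module: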
# sc is a sched.scheduler instance
sc.enter(60, 1, self.doChecks, (sc, False))
The script runs as a daemon, using the code found here.
Several class methods called from doChecks use the subprocess module to run system commands and collect system statistics:
ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
The script runs for a while and then crashes with the following error:
File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
OSError: [Errno 12] Cannot allocate memory
After the script has crashed, free -m on the server gives:
$ free -m
             total       used       free     shared    buffers     cached
Mem:           894        345        549          0          0          0
-/+ buffers/cache:        345        549
Swap:            0          0          0
The server runs CentOS 5.3. I cannot reproduce the problem on my own CentOS systems, and no other user has reported the same problem.
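I have tried a number of things to debug this, as suggested in the original question:
- Logging the output of free -m before and after the Popen call. Memory usage does not change significantly, i.e. memory is not gradually being used up as the script runs.
- Adding close_fds=True to the Popen call. This made no difference; the script still crashed with the same error. Suggested here and here.
- Checking the rlimits. Both RLIMIT_DATA and RLIMIT_AS show (-1, -1), as suggested here.
- An article suggested that having no swap space might be the cause, but according to the web host swap is actually available on demand, and this was also described as a bogus cause here.
- The child processes are being closed, because that is the behaviour of .communicate(), as backed up by the Python source code and the comments here.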
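For reference, the rlimit check in the list above can be reproduced with a sketch along these lines. It is written for Python 3 rather than the Python 2.4 in the traceback, and the helper name `current_limits` is only illustrative. A value of -1 corresponds to `resource.RLIM_INFINITY` on Linux, i.e. no limit.

```python
import resource


def current_limits():
    """Return the (soft, hard) limits that bound this process's memory."""
    names = ("RLIMIT_DATA", "RLIMIT_AS")
    return {name: resource.getrlimit(getattr(resource, name)) for name in names}


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(name, soft, hard)
```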
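The full set of checks can be found on GitHub here; the getProcesses function is defined from line 442 and is called by doChecks() starting at line 520.
The script was run under strace, which produced the following output just before the crash: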
recv(4, "Total Accesses: 516662\nTotal kBy"..., 234, 0) = 234
gettimeofday({1250893252, 887805}, NULL) = 0
write(3, "2009-08-21 17:20:52,887 - checks"..., 91) = 91
gettimeofday({1250893252, 888362}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 74) = 74
gettimeofday({1250893252, 888897}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 67) = 67
gettimeofday({1250893252, 889184}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 81) = 81
close(4) = 0
gettimeofday({1250893252, 889591}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 63) = 63
pipe([4, 5]) = 0
pipe([6, 7]) = 0
fcntl64(7, F_GETFD) = 0
fcntl64(7, F_SETFD, FD_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
write(2, "Traceback (most recent call last"..., 35) = 35
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/agent."..., 52) = 52
open("/home/admin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/home/admin/sd-agent/dae"..., 60) = 60
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/agent."..., 54) = 54
open("/usr/lib/python2.4/sched.py", O_RDONLY|O_LARGEFILE) = 8
write(2, " File \"/usr/lib/python2.4/sched"..., 55) = 55
fstat64(8, {st_mode=S_IFREG|0644, st_size=4054, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "\"\"\"A generally useful event sche"..., 4096) = 4054
write(2, " ", 4) = 4
write(2, "void = action(*argument)\n", 25) = 25
close(8) = 0
munmap(0xb7d28000, 4096) = 0
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/checks"..., 60) = 60
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/checks"..., 64) = 64
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, " File \"/usr/lib/python2.4/subpr"..., 65) = 65
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n # c2pread <-"..., 4096) = 4096
write(2, " ", 4) = 4
write(2, "errread, errwrite)\n", 19) = 19
close(8) = 0
munmap(0xb7d28000, 4096) = 0
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, " File \"/usr/lib/python2.4/subpr"..., 71) = 71
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n # c2pread <-"..., 4096) = 4096
read(8, "table(self, handle):\n "..., 4096) = 4096
read(8, "rrno using _sys_errlist (or siml"..., 4096) = 4096
read(8, " p2cwrite = None, None\n "..., 4096) = 4096
write(2, " ", 4) = 4
write(2, "self.pid = os.fork()\n", 21) = 21
close(8) = 0
munmap(0xb7d28000, 4096) = 0
write(2, "OSError", 7) = 7
write(2, ": ", 2) = 2
write(2, "[Errno 12] Cannot allocate memor"..., 33) = 33
write(2, "\n", 1) = 1
unlink("/var/run/sd-agent.pid") = 0
close(3) = 0
munmap(0xb7e0d000, 4096) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x589978}, {0xb89a60, [], SA_RESTORER, 0x589978}, 8) = 0
brk(0xa022000) = 0xa022000
exit_group(1) = ?
8 Answers
As a quick fix, you can try the following:
echo 1 > /proc/sys/vm/overcommit_memory
This assumes you are sure your system has enough memory. See the Linux overcommit policy for details.
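Looking at the output of free -m, it seems that your system has no swap memory available. I am not sure whether swap is always provided automatically on demand in Linux, but I ran into the same problem and some of the answers here did not help me. Adding some swap memory, however, fixed the problem in my case, so I am posting how I added 1 GB of swap on Ubuntu 12.04 in the hope that it helps others facing the same problem.
You can first check whether any swap memory is enabled.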
$ sudo swapon -s
If the output is empty, no swap is enabled. To add 1 GB of swap:
$ sudo dd if=/dev/zero of=/swapfile bs=1024 count=1024k
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
To keep the swap active across reboots, add the following line to the fstab file.
$ sudo vim /etc/fstab
/swapfile none swap sw 0 0
More information and the detailed source can be found here.
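If the monitoring script itself should notice a missing swap device, the same information that swapon -s prints is exposed in /proc/swaps. The following is a minimal sketch, assuming the usual five-column layout of that file (Filename, Type, Size, Used, Priority); the function name `parse_swaps` is made up for this example.

```python
def parse_swaps(text):
    """Parse the contents of /proc/swaps into (filename, size_kb) pairs."""
    entries = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) >= 5:
            entries.append((fields[0], int(fields[2])))
    return entries


def read_swaps(path="/proc/swaps"):
    """Read and parse the live swap table (Linux only)."""
    with open(path) as handle:
        return parse_swaps(handle.read())
```

An empty list from `read_swaps()` corresponds to the empty swapon -s output described above.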
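As a general rule, on a vanilla kernel, fork/clone fails with ENOMEM for one of two reasons: either a genuine out-of-memory condition (dup_mm, dup_task_struct, alloc_pid, mpol_dup, mm_init and so on fail), or security_vm_enough_memory_mm rejecting the request while enforcing the overcommit policy.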
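Start by checking the virtual memory size of the process that failed to fork, at the time of the fork attempt, and compare it with the amount of available memory (physical and swap) to see how it fares under the overcommit policy (plug the numbers in).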
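A rough way to plug the numbers in is to compare the process's VmSize from /proc/<pid>/status with the CommitLimit and Committed_AS fields of /proc/meminfo. The sketch below assumes those fields are present and is only a direct test under strict overcommit (vm.overcommit_memory = 2); in the default heuristic mode the kernel applies a different, looser check, and inside a container these figures may not reflect the limits actually enforced. The function names are illustrative.

```python
import re


def parse_kb_fields(text):
    """Parse 'Name:   123 kB' lines into a {name: kilobytes} dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def fork_headroom_kb(status_text, meminfo_text):
    """Return (vm_size_kb, headroom_kb).

    headroom_kb is CommitLimit - Committed_AS, i.e. how much more address
    space could be committed under strict overcommit accounting.
    """
    status = parse_kb_fields(status_text)
    meminfo = parse_kb_fields(meminfo_text)
    headroom = meminfo["CommitLimit"] - meminfo["Committed_AS"]
    return status["VmSize"], headroom


def read_text(path):
    """Small helper to read a procfs file as text."""
    with open(path) as handle:
        return handle.read()
```

Calling `fork_headroom_kb(read_text("/proc/self/status"), read_text("/proc/meminfo"))` just before the Popen call, and logging both numbers, shows whether the parent's size is anywhere near the remaining headroom.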
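In your particular case, note that Virtuozzo performs additional checks in its overcommit enforcement. Moreover, I am not sure how much control you really have, from inside your container, over the swap and overcommit configuration (which is what determines the outcome of the enforcement).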
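Now, in order to actually move forward, I would say you are left with two options:
- switch to a larger instance, or
- put some coding effort into controlling your script's memory footprint more effectively.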
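Note that the coding effort may be all for nothing if it turns out that the problem is not you, but somebody else in a different instance on the same server running amok.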
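Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, which means that every time you call it you are requesting once more as much memory as Python is already using, i.e. hundreds of additional MB, all in order to exec a tiny 10 kB executable such as free or ps. Under an unfavourable overcommit policy, you will soon see ENOMEM.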
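Alternatives to fork that do not have this problem of copying the parent's page tables and so on are vfork and posix_spawn. But if you do not feel like rewriting parts of subprocess.Popen in terms of vfork/posix_spawn, consider calling subprocess.Popen only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free/ps/sleep and whatever else in a loop, in parallel with your script. You can poll that script's output or read it synchronously, possibly from a separate thread if you have other work to handle asynchronously; do the data crunching in Python but leave the forking to the subordinate process.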
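The single-helper idea can be sketched roughly as follows. This is an illustration for Python 3, not the agent's actual code: the shell loop, the `__END__` marker line, and the function names are all assumptions made for the example.

```python
import subprocess

# Hypothetical helper loop: print a `ps aux` snapshot, then a marker line,
# then sleep for the interval passed as the first positional argument.
HELPER_SCRIPT = 'while true; do ps aux; echo "__END__"; sleep "$1"; done'


def start_helper(interval_seconds=60):
    """Fork once, early, while the Python process is still small."""
    return subprocess.Popen(
        ["/bin/sh", "-c", HELPER_SCRIPT, "sh", str(interval_seconds)],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )


def read_snapshot(stream, marker="__END__"):
    """Read lines from the helper until the marker (or end of stream)."""
    lines = []
    for line in iter(stream.readline, ""):
        line = line.rstrip("\n")
        if line == marker:
            break
        lines.append(line)
    return lines
```

The daemon would call `start_helper()` once at startup and then `read_snapshot(helper.stdout)` on each scheduled run. Note that `read_snapshot` blocks until the helper has written a complete snapshot, which is why a separate reader thread may be preferable.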
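However, in your particular case you can skip calling ps and free altogether; that information is available to you in Python directly from procfs, whether you read it yourself or go through existing libraries or packages. If ps and free are the only utilities you run, you can do away with subprocess.Popen completely.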
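Reading procfs directly might look something like the sketch below. It assumes a Linux /proc layout (kB-valued lines in /proc/meminfo and NUL-separated arguments in /proc/<pid>/cmdline); the function names are illustrative, and the output is not a drop-in replacement for the exact columns of free or ps.

```python
import os


def meminfo_mb(text):
    """Convert the kB-valued lines of /proc/meminfo into {name: MiB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            values[name] = int(parts[0]) // 1024
    return values


def list_processes(proc_root="/proc"):
    """Return (pid, argv) pairs by walking the numeric entries of procfs."""
    processes = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as handle:
                raw = handle.read()
        except (IOError, OSError):
            continue  # the process exited between listdir() and open()
        argv = [arg.decode("utf-8", "replace") for arg in raw.split(b"\0") if arg]
        processes.append((int(entry), argv))
    return processes
```

Neither function forks, so the ENOMEM path in clone is never reached. Kernel threads show up with an empty argv because their cmdline file is empty.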
Finally, whatever you do about subprocess.Popen, if your script leaks memory you will still hit the wall eventually. Keep an eye on it, and check for memory leaks.
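One low-tech way to keep an eye on it is to log the process's resident set size on every scheduled run and look for steady growth. The sketch below reads VmRSS from the text of /proc/self/status; the growth heuristic and its 1 MB tolerance are arbitrary assumptions for illustration, not a real leak detector.

```python
def current_rss_kb(status_text):
    """Extract VmRSS (in kB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def looks_like_leak(samples_kb, tolerance_kb=1024):
    """Heuristic: RSS never dropped between samples and grew past the tolerance."""
    if len(samples_kb) < 3:
        return False
    never_dropped = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_dropped and samples_kb[-1] - samples_kb[0] > tolerance_kb
```

Appending `current_rss_kb(open("/proc/self/status").read())` to a list once per doChecks run and passing that list to `looks_like_leak` gives a crude early warning.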