Strange behavior: multiple instances of a Python script still active after interrupt

Posted 2024-04-23 16:19:51


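I have a complex Python script. In a loop, I use multiprocessing to call a function, and inside that function I use subprocess.Popen to call an external program (pdfinfo).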

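The program runs for a while, and I can see VIRT memory increasing steadily (using top) until the system runs out of memory and the following error appears: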

Traceback (most recent call last):
  File "classify_pdf.py", line 603, in <module>
    preprocessing_list[loop] = da.get_preprocessing_data(batch_files, metadata, cores)
  File "/home/student/.../src/data.py", line 87, in get_preprocessing_data
    properties = fp.pre_extract_pdf_properties(batch_files, cores)
  File "/home/student/.../src/features/pdf_properties.py", line 73, in pre_extract_pdf_properties
    pool = Pool(num_cores)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

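After interrupting the process with Ctrl-C, many python processes are still running, like the ones below (listed with ps aux | grep python). There are thousands of them, and they are still there even after I close my session on the server and log back in.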

user1+  53872  0.0  0.0 5444552    0 ?        S    Aug29   0:00 python classify_pdf.py -fp /data/allfiles/ -repo
user1+  53873  0.0  0.0 5444552    0 ?        S    Aug29   0:00 python classify_pdf.py -fp /data/allfiles/ -repo
user1+  53876  0.0  0.0 5444552    0 ?        S    Aug29   0:00 python classify_pdf.py -fp /data/allfiles/ -repo

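But why do so many processes remain after I interrupt the script? Is it related to using multiprocessing and subprocess inside a loop? Is the fork in Popen creating additional processes? And if so, why don't they terminate?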
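A minimal sketch for checking where extra processes come from, assuming Linux and the fork start method that appears in the traceback (popen_fork.py). Pool(n) itself forks n worker processes as soon as it is created, and because they are forked copies of the parent rather than new programs, ps lists them with the parent's command line, which is consistent with the python classify_pdf.py entries above. The helper names _whoami and show_pool_children are hypothetical, chosen only for this illustration.

```python
import multiprocessing
import os


def _whoami(_):
    # Runs inside a pool worker: report this worker's pid and its parent's pid.
    return os.getpid(), os.getppid()


def show_pool_children(num_workers=3):
    # Assumption: Linux with the "fork" start method, as in the traceback.
    ctx = multiprocessing.get_context("fork")
    pool = ctx.Pool(num_workers)
    # Pool() has already forked num_workers child processes at this point.
    during = len(multiprocessing.active_children())
    pairs = pool.map(_whoami, range(10))
    pool.close()
    pool.join()  # waits until every worker process has exited
    after = len(multiprocessing.active_children())
    worker_pids = {pid for pid, _ in pairs}
    parent_pids = {ppid for _, ppid in pairs}
    return during, after, worker_pids, parent_pids


if __name__ == "__main__":
    during, after, workers, parents = show_pool_children(3)
    print("children while pool open:", during)   # expected: 3
    print("children after close/join:", after)   # expected: 0
    print("parent pid matches:", parents == {os.getpid()})
```

In a normal run, close() plus join() leaves no worker behind; this sketch does not test what happens to the workers when the run is interrupted with Ctrl-C.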

By the way, the part of the code where this happens is:

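from multiprocessing import Pool
from os.path import basename, splitext

def pre_extract_pdf_properties(files, num_cores):
    pool = Pool(num_cores)
    res = pool.map(pdfinfo_get_pdf_properties, files)
    pool.close()
    pool.join()
    # each result x is a pair: x[1] is the file path, x[0] the value extracted for it
    res_fix = {}
    for x in res:
        res_fix[splitext(basename(x[1]))[0]] = x[0]
    return res_fix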
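One possible restructuring, sketched under assumptions. The traceback shows os.fork() failing inside Pool(num_cores), that is, while a new pool is forking its workers on a later iteration of the outer loop. An alternative is to create a single pool for the whole loop and use it as a context manager: Pool objects support the with statement since Python 3.3, and leaving the block calls terminate() on the workers, including when an exception propagates. extract_all and fake_properties are hypothetical names; fake_properties only stands in for pdfinfo_get_pdf_properties so the sketch runs without pdfinfo.

```python
import multiprocessing
from os.path import basename, splitext


def fake_properties(file_path):
    # Hypothetical stand-in for pdfinfo_get_pdf_properties; returns
    # (properties, file_path) so the mapping code below can stay unchanged.
    return {"Pages": str(len(file_path))}, file_path


def extract_all(batches, num_cores, worker=fake_properties):
    # Assumption: Linux "fork" start method, matching popen_fork in the traceback.
    ctx = multiprocessing.get_context("fork")
    results = []
    # One pool for every batch: workers are forked once, not once per iteration.
    # Leaving the with-block, normally or through an exception such as
    # KeyboardInterrupt, calls pool.terminate() on the workers.
    with ctx.Pool(num_cores) as pool:
        for batch in batches:
            res = pool.map(worker, batch)
            results.append(
                {splitext(basename(path))[0]: props for props, path in res}
            )
        pool.close()
        pool.join()
    return results


if __name__ == "__main__":
    batches = [["a/x.pdf", "b/yy.pdf"], ["c.pdf"]]
    print(extract_all(batches, 2))
    # [{'x': {'Pages': '7'}, 'yy': {'Pages': '8'}}, {'c': {'Pages': '5'}}]
```

Whether this alone removes the leftover processes described in the question is not verified here; it mainly avoids forking a fresh set of workers from an ever larger parent on every iteration.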

Inside pdfinfo_get_pdf_properties, pdfinfo is called like this:

import subprocess

output = subprocess.Popen(["pdfinfo", file_path],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE).communicate()[0].decode(errors='ignore')
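The call above can also be written with subprocess.run(), which was added in Python 3.5 (the version in the traceback). Like communicate(), it waits for pdfinfo to exit, so the child is reaped; in addition it accepts a timeout, after which the child is killed and waited for instead of blocking a pool worker indefinitely. This is a sketch under assumptions: parse_pdfinfo, run_command and the 60-second default are hypothetical, and so is the (properties, file_path) return shape, which is only inferred from how the Pool code indexes x[0] and x[1].

```python
import subprocess
import sys


def parse_pdfinfo(text):
    # pdfinfo prints one "Key:   value" pair per line; split on the first colon.
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            props[key.strip()] = value.strip()
    return props


def run_command(args, timeout=60):
    # subprocess.run() waits for the child to exit, so it is reaped.
    # If the timeout expires, the child is killed and waited for, and
    # subprocess.TimeoutExpired is raised to the caller.
    completed = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=timeout
    )
    return completed.stdout.decode(errors="ignore")


def pdfinfo_get_pdf_properties(file_path):
    # Hypothetical return shape: (properties, file_path), matching x[0]/x[1]
    # in the Pool code of the question.
    return parse_pdfinfo(run_command(["pdfinfo", file_path])), file_path


if __name__ == "__main__":
    # Stand-in for pdfinfo so the sketch runs without pdfinfo installed.
    out = run_command([sys.executable, "-c", "print('Pages:          3')"])
    print(parse_pdfinfo(out))  # {'Pages': '3'}
```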
