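Python multiprocessing does not process all items in a list
I have a program that processes files (about 3400; the exact number varies over time). However, it always seems to skip some of them: although I give it about 3400 files, it only processes about 3100 on each run. Here is the code: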
import multiprocessing
from multiprocessing import Pool

def split_list(L, n):
    return [L[i::n] for i in xrange(n)]

def coreFunc(myarg):
    listlen = len(myarg)
    print "listlen = ", listlen
    for listiter in range(listlen):
        input1 = (myarg[listiter]).rstrip('\n')
        print "input1 = ", input1
    return 1

if __name__ == "__main__":
    fptr = open("myfilelist")
    array = fptr.readlines()
    numC = multiprocessing.cpu_count()
    lists = split_list(array, numC)
    p = Pool(numC)
    p.map(coreFunc, lists)
    p.close()
    p.join()
"myfilelist" is a text file that lists the names of about 3400 files, in this format:
/home/user/file1
/home/user/file2
/home/user/file3
….
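Every run skips about 300 files, and the skipped files differ from one run to the next.
Any idea why these files are being skipped? I checked, and it has nothing to do with the files themselves: I tried different sets of files, reordering the names in the file list, and so on, but nothing helped. There are no error messages either.
Thanks.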
1 Answer
I made a version of the code that runs as-is. The modified code also adds per-process logging, which helps show what is going on.
Hope this helps!
Source code
import logging, multiprocessing
from multiprocessing import Pool

def split_list(L, n):
    # Deal the items of L into n interleaved sublists.
    return [L[i::n] for i in xrange(n)]

def coreFunc(mylist):
    # Log through the multiprocessing logger so each line carries the worker's name.
    proclog = multiprocessing.get_logger()
    proclog.info("listlen = %d", len(mylist))
    for path in mylist:
        proclog.info("input1 = %s", path)
    return 1

if __name__ == "__main__":
    if 0:
        # Change to "if 1" to read the real file list.
        array = [line.rstrip() for line in open("myfilelist")]
    else:
        # Stand-in data: the 26 uppercase letters.
        import string
        array = string.uppercase
    mylog = multiprocessing.log_to_stderr()
    mylog.setLevel(logging.INFO)
    numC = multiprocessing.cpu_count()
    lists = split_list(array, numC)
    p = Pool(numC)
    print p.map(coreFunc, lists)
    p.close()
    p.join()
Output
[INFO/PoolWorker-1] child process calling self.run()
[INFO/PoolWorker-2] child process calling self.run()
[INFO/PoolWorker-4] child process calling self.run()
[INFO/PoolWorker-1] listlen = 7
[INFO/PoolWorker-1] input1 = A
[INFO/PoolWorker-1] input1 = E
[INFO/PoolWorker-1] input1 = I
[INFO/PoolWorker-1] input1 = M
[INFO/PoolWorker-1] input1 = Q
[INFO/PoolWorker-3] child process calling self.run()
[INFO/PoolWorker-1] input1 = U
[INFO/PoolWorker-1] input1 = Y
[INFO/PoolWorker-1] listlen = 6
[INFO/PoolWorker-1] input1 = D
[INFO/PoolWorker-4] listlen = 6
[INFO/PoolWorker-1] input1 = H
[INFO/PoolWorker-4] input1 = C
[INFO/PoolWorker-4] input1 = G
[INFO/PoolWorker-1] input1 = L
[INFO/PoolWorker-3] listlen = 7
[INFO/PoolWorker-1] input1 = P
[INFO/PoolWorker-4] input1 = K
[INFO/PoolWorker-3] input1 = B
[INFO/PoolWorker-4] input1 = O
[INFO/PoolWorker-1] input1 = T
[INFO/PoolWorker-1] input1 = X
[INFO/PoolWorker-4] input1 = S
[INFO/PoolWorker-3] input1 = F
[INFO/PoolWorker-4] input1 = W
[INFO/PoolWorker-3] input1 = J
[INFO/PoolWorker-3] input1 = N
[INFO/PoolWorker-3] input1 = R
[INFO/PoolWorker-3] input1 = V
[INFO/PoolWorker-3] input1 = Z
[INFO/PoolWorker-1] process shutting down
[INFO/PoolWorker-2] process shutting down
[INFO/PoolWorker-2] process exiting with exitcode 0
[INFO/PoolWorker-1] process exiting with exitcode 0
[INFO/PoolWorker-3] process shutting down
[INFO/PoolWorker-4] process shutting down
[INFO/PoolWorker-3] process exiting with exitcode 0
[INFO/PoolWorker-4] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[1, 1, 1, 1]
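In the log above, the four workers together report all 26 letters, so the splitting and the pool do not drop items in this test. One way to run the same check on the real file list, without counting printed lines, might be to have each worker return the paths it handled and compare them with the input in the parent. The following is only a minimal sketch under some assumptions: it is written for Python 3 (unlike the listings above), the names `core_func` and `process_all` are hypothetical, and the generated paths are placeholders standing in for the contents of "myfilelist".

```python
import multiprocessing


def split_list(items, n):
    """Deal items into n interleaved sublists (same striding as the code above)."""
    return [items[i::n] for i in range(n)]


def core_func(paths):
    """Hypothetical worker: return the paths it handled instead of printing them."""
    return [p.rstrip("\n") for p in paths]


def process_all(items, workers):
    """Map core_func over the sublists and flatten what the workers report back."""
    with multiprocessing.Pool(workers) as pool:
        results = pool.map(core_func, split_list(items, workers))
    return [path for chunk in results for path in chunk]


if __name__ == "__main__":
    # Placeholder input; in practice this would be the lines read from "myfilelist".
    array = ["/home/user/file%d\n" % i for i in range(1, 3401)]
    done = process_all(array, multiprocessing.cpu_count())
    expected = {line.rstrip("\n") for line in array}
    missing = expected - set(done)
    print("expected %d, processed %d, missing %d"
          % (len(expected), len(done), len(missing)))
```

If the set of missing paths comes back empty here while the printed output still looks short, that would suggest looking at how the output of the separate processes is being collected rather than at the pool itself.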