Python multiprocessing does not process all items in a list

Asked 2025-04-18 14:22

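I have a program that processes files (about 3400 of them; the exact number varies over time). However, it always seems to miss some: although I give it about 3400 files, it only processes about 3100 on each run. Here is the code: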

import multiprocessing
from multiprocessing import Pool

def split_list(L, n):
    return [L[i::n] for i in xrange(n)]

def coreFunc(myarg):

    listlen = len(myarg)
    print "listlen = ", listlen

    for listiter in range(listlen):
        input1 = (myarg[listiter]).rstrip('\n')
        print "input1 = ", input1

    return 1

if __name__=="__main__":

    fptr = open("myfilelist")
    array = fptr.readlines()

    numC = multiprocessing.cpu_count()
    lists = split_list(array, numC)
    p = Pool(numC)
    p.map(coreFunc, lists)

    p.close()
    p.join()

"myfilelist" is a text file listing the names of about 3400 files, in this format:

    /home/user/file1
    /home/user/file2
    /home/user/file3
    ….

Each run misses about 300 files, and the set of missed files differs from one run to the next.

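Do you know why these files are being missed? I checked, and it has nothing to do with the files themselves: I tried different sets of files, reordered the names in "myfilelist", and so on, with no effect. There are no error messages either.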

Thanks.

1 Answer

I made a version of the code that runs as-is. The modified code also adds per-process logging, which helps show what is actually happening.

Hope this helps!
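
Before looking at the multiprocessing part, it may help to rule out the splitting helper. The following is a minimal Python 3 sketch (the code in this thread is Python 2, so `xrange` becomes `range`) that checks `split_list` hands out every item exactly once. The sample paths and the chunk count of 4 are made up for illustration; they stand in for the contents of "myfilelist" and the CPU count.

```python
def split_list(items, n):
    # Deal items round-robin into n sublists, as in the question.
    return [items[i::n] for i in range(n)]


# Hypothetical stand-in for the ~3400 lines of "myfilelist".
paths = ["/home/user/file%d" % i for i in range(1, 3401)]
chunks = split_list(paths, 4)

# Flatten the chunks again and compare with the input.
flattened = [p for chunk in chunks for p in chunk]
assert sorted(flattened) == sorted(paths)
print(len(flattened), [len(c) for c in chunks])
```

If the assertion holds for the real list too, the splitting step is not what drops the files.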

Source code

import logging, multiprocessing
from multiprocessing import Pool

def split_list(L, n):
    return [L[i::n] for i in xrange(n)]

def coreFunc(mylist):
    proclog = multiprocessing.get_logger()

    proclog.info("listlen = %d", len(mylist))
    for path in mylist:
        proclog.info("input1 = %s", path)

    return 1


if __name__=="__main__":

    # Flip to 1 to read the real paths from "myfilelist";
    # 0 uses the letters A-Z as stand-in data.
    if 0:
        array = [line.rstrip() for line in open("myfilelist")]
    else:
        import string
        array = string.uppercase

    mylog = multiprocessing.log_to_stderr()
    mylog.setLevel(logging.INFO)

    numC = multiprocessing.cpu_count()
    lists = split_list(array, numC)

    p = Pool(numC)
    print p.map(coreFunc, lists)
    p.close()
    p.join()

Output

[INFO/PoolWorker-1] child process calling self.run()
[INFO/PoolWorker-2] child process calling self.run()
[INFO/PoolWorker-4] child process calling self.run()
[INFO/PoolWorker-1] listlen = 7
[INFO/PoolWorker-1] input1 = A
[INFO/PoolWorker-1] input1 = E
[INFO/PoolWorker-1] input1 = I
[INFO/PoolWorker-1] input1 = M
[INFO/PoolWorker-1] input1 = Q
[INFO/PoolWorker-3] child process calling self.run()
[INFO/PoolWorker-1] input1 = U
[INFO/PoolWorker-1] input1 = Y
[INFO/PoolWorker-1] listlen = 6
[INFO/PoolWorker-1] input1 = D
[INFO/PoolWorker-4] listlen = 6
[INFO/PoolWorker-1] input1 = H
[INFO/PoolWorker-4] input1 = C
[INFO/PoolWorker-4] input1 = G
[INFO/PoolWorker-1] input1 = L
[INFO/PoolWorker-3] listlen = 7
[INFO/PoolWorker-1] input1 = P
[INFO/PoolWorker-4] input1 = K
[INFO/PoolWorker-3] input1 = B
[INFO/PoolWorker-4] input1 = O
[INFO/PoolWorker-1] input1 = T
[INFO/PoolWorker-1] input1 = X
[INFO/PoolWorker-4] input1 = S
[INFO/PoolWorker-3] input1 = F
[INFO/PoolWorker-4] input1 = W
[INFO/PoolWorker-3] input1 = J
[INFO/PoolWorker-3] input1 = N
[INFO/PoolWorker-3] input1 = R
[INFO/PoolWorker-3] input1 = V
[INFO/PoolWorker-3] input1 = Z
[INFO/PoolWorker-1] process shutting down
[INFO/PoolWorker-2] process shutting down
[INFO/PoolWorker-2] process exiting with exitcode 0
[INFO/PoolWorker-1] process exiting with exitcode 0
[INFO/PoolWorker-3] process shutting down
[INFO/PoolWorker-4] process shutting down
[INFO/PoolWorker-3] process exiting with exitcode 0
[INFO/PoolWorker-4] process exiting with exitcode 0
[INFO/MainProcess] process shutting down
[1, 1, 1, 1]
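
In the log above all 26 letters show up, spread across the workers. If the original count of about 3100 came from counting printed lines, one possible explanation (not confirmed anywhere in this thread) is that output from concurrent processes interleaves, so counting lines undercounts the work done. A more direct check is to have each worker return what it handled and compare that with the input in the parent. Here is a rough Python 3 sketch under those assumptions; `core_func`, `find_missing`, and the generated paths are hypothetical names and data, not part of the original code.

```python
import multiprocessing


def split_list(items, n):
    """Deal items round-robin into n sublists, as in the original code."""
    return [items[i::n] for i in range(n)]


def core_func(chunk):
    """Handle one chunk and return the items that were actually handled."""
    handled = []
    for path in chunk:
        # Real per-file work would go here.
        handled.append(path.rstrip("\n"))
    return handled


def find_missing(expected, results):
    """Return the items of expected that appear in none of the worker results."""
    seen = set()
    for handled in results:
        seen.update(handled)
    return [item for item in expected if item not in seen]


def main():
    # Placeholder input; the original reads these lines from "myfilelist".
    paths = ["/home/user/file%d\n" % i for i in range(1, 3401)]
    expected = [p.rstrip("\n") for p in paths]

    num_procs = multiprocessing.cpu_count()
    chunks = split_list(paths, num_procs)

    with multiprocessing.Pool(num_procs) as pool:
        results = pool.map(core_func, chunks)

    missing = find_missing(expected, results)
    print("handled:", sum(len(r) for r in results), "missing:", len(missing))


if __name__ == "__main__":
    main()
```

Because `map` collects the return values in the parent process, the comparison does not depend on how the workers' output reaches the terminal.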
