Multiprocessing in Python: is there a way to use pool.imap without accumulating memory?

def train_network(network):
    (...)
    return score

pool = Pool(processes = 4)
scores = pool.imap(train_network, networks)
scores = tqdm(scores, total = networks.size)
for (network, score) in zip(networks, scores):
    network.score = score
pool.close()
pool.join()

2 answers

User

Answer 1 · edited 2024-05-14 18:13:13

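I came up with a solution that seems to work. I ditched the pool and built my own simple queueing system. Besides not growing over time (it does grow slightly, but I think that is because I store some dictionaries as a log), it even uses less memory than the chunks solution above: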

[Image: imapqueue]

I have no idea why this is the case. Maybe the Pool object simply takes up a lot of memory? Either way, here is my code:

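import multiprocessing as mp
from tqdm import tqdm

def train_network(network):
    (...)
    return score

# Define queues to organise the parallelising
# (the keyword argument of mp.Queue is maxsize, not size)
todo = mp.Queue(maxsize = networks.size + 4)
done = mp.Queue(maxsize = networks.size)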

# Populate the todo queue
for idx in range(networks.size):
    todo.put(idx)

# Add -1's which will be an effective way of checking
# if all todo's are finished
for _ in range(4):
    todo.put(-1)

def worker(todo, done):
    ''' Network scoring worker. '''
    from queue import Empty
    while True:
        try:
            # Fetch the next todo
            idx = todo.get(timeout = 1)
        except Empty:
            # The queue is never empty, so the silly worker has to go
            # back and try again
            continue

        # If we have reached a -1 then stop
        if idx == -1:
            break
        else:
            # Score the network and store it in the done queue
            score = train_network(networks[idx])
            done.put((idx, score))

# Construct our four processes
processes = [mp.Process(target = worker,
    args = (todo, done)) for _ in range(4)]

# Daemonise the processes, so that they are terminated if
# the main process exits, and start them
for p in processes:
    p.daemon = True
    p.start()

# Set up the iterable with all the scores, and set
# up a progress bar
idx_scores = (done.get() for _ in networks)
pbar = tqdm(idx_scores, total = networks.size)

# Compute all the scores in parallel
for (idx, score) in pbar:
    networks[idx].score = score

# Join up the processes and close the progress bar
for p in processes:
    p.join()
pbar.close()
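
The same queue-and-sentinel idea can be written without the timeout loop. The following is only a minimal sketch, not the answer's actual code: `square` is a hypothetical stand-in for `train_network`, `score_all` is an illustrative helper name, the items themselves (rather than indices) are sent through the queue, and the "fork" start method is assumed to be available (POSIX systems). The progress bar is left out to keep the sketch free of third-party imports.

```python
import multiprocessing as mp

# Assumption: a POSIX system where the "fork" start method is available.
ctx = mp.get_context("fork")

def square(x):
    """Hypothetical stand-in for train_network."""
    return x * x

def worker(todo, done):
    # iter(callable, sentinel) calls todo.get() until it returns None,
    # so no timeout or Empty handling is needed.
    for idx, item in iter(todo.get, None):
        done.put((idx, square(item)))

def score_all(items, n_workers=4):
    """Score every item in parallel and return the scores in input order."""
    todo, done = ctx.Queue(), ctx.Queue()
    for job in enumerate(items):
        todo.put(job)
    for _ in range(n_workers):
        todo.put(None)  # one stop sentinel per worker
    procs = [ctx.Process(target=worker, args=(todo, done))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    scores = [None] * len(items)
    # Drain all results before joining, so no worker blocks on a full pipe
    for _ in items:
        idx, score = done.get()
        scores[idx] = score
    for p in procs:
        p.join()
    return scores
```

Calling `score_all(list_of_items, n_workers=4)` returns one score per item. Results arrive in completion order, which is why each one carries its index.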

User

Answer 2 · edited 2024-05-14 18:13:13

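Unfortunately, the multiprocessing module in Python comes with a large overhead. Data is generally not shared between processes and has to be copied. This will change starting with Python 3.8, which adds a shared-memory module: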

https://docs.python.org/3.8/library/multiprocessing.shared_memory.html

Although the official release of Python 3.8 is scheduled for October 21, 2019, you can already download it from GitHub.
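
To illustrate what the linked module offers, here is a minimal sketch assuming Python 3.8 or later. It creates a named shared block, writes bytes into it, and reads them back through a second handle attached by name, which is how another process would reach the same memory without a copy being sent to it. The helper name `write_and_read` is made up for this example, and both handles live in one process only to keep the sketch short.

```python
from multiprocessing import shared_memory

def write_and_read(payload: bytes) -> bytes:
    """Round-trip a non-empty payload through a named shared memory block."""
    # Create a shared block large enough for the payload
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        shm.buf[:len(payload)] = payload
        # Attach a second handle by name, as a worker process would do
        other = shared_memory.SharedMemory(name=shm.name)
        try:
            return bytes(other.buf[:len(payload)])
        finally:
            other.close()
    finally:
        shm.close()
        shm.unlink()  # release the block once nobody needs it anymore
```

In a real program the block's name would be passed to the worker processes, each of which attaches to it, and only the creating process would call `unlink()`.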
