Python 2.6: Process-local storage when using multiprocessing

Posted 2024-04-29 12:06:44


I'm trying to build a Python script that uses a pool of worker processes (via multiprocessing.Pool) to work through a large set of data.

I'd like each process to have a unique object that it can reuse across the many tasks that process executes.

Pseudocode:

def work(data):
    # connection should be unique per process
    connection.put(data)
    print 'work done with connection:', connection

if __name__ == '__main__':
    pPool = Pool()  # pool of 4 processes
    datas = [1..1000]
    for process in pPool:
        # this is the part I'm asking about -- how do I actually do this?
        process.connection = Connection(conargs)
    for data in datas:
        pPool.apply_async(work, (data,))

3 Answers
User · Answer 1

Process-local storage is easy to implement as a mapping container. For anyone else who gets here from Google looking for something similar: note this is Py3, but it is easy to convert to Python 2 syntax (just inherit from object):

import os

class ProcessLocal:
    """
    Provides a basic per-process mapping container that wipes itself if the current PID changed since the last get/set.
    Aka `threading.local()`, but for processes instead of threads.
    """

    __pid__ = -1

    def __init__(self, mapping_factory=dict):
        self.__mapping_factory = mapping_factory

    def __handle_pid(self):
        new_pid = os.getpid()
        if self.__pid__ != new_pid:
            self.__pid__, self.__store = new_pid, self.__mapping_factory()

    def __delitem__(self, key):
        self.__handle_pid()
        return self.__store.__delitem__(key)

    def __getitem__(self, key):
        self.__handle_pid()
        return self.__store.__getitem__(key)

    def __setitem__(self, key, val):
        self.__handle_pid()
        return self.__store.__setitem__(key, val)

See more at https://github.com/akatrevorjay/pytutils/blob/develop/pytutils/mappings.py

User · Answer 2

Creating mp.Process instances directly (without mp.Pool) is probably the simplest way:

import multiprocessing as mp
import time

class Connection(object):
    def __init__(self,name):
        self.name=name
    def __str__(self):
        return self.name

def work(inqueue,conn):
    name=mp.current_process().name
    while 1:
        data=inqueue.get()
        time.sleep(.5)
        print('{n}: work done with connection {c} on data {d}'.format(
            n=name,c=conn,d=data))
        inqueue.task_done()

if __name__ == '__main__':
    N=4
    procs=[]
    inqueue=mp.JoinableQueue()
    for i in range(N):
        conn=Connection(name='Conn-'+str(i))
        proc=mp.Process(target=work,name='Proc-'+str(i),args=(inqueue,conn))
        proc.daemon=True
        proc.start()

    datas = range(1,11)
    for data in datas:
        inqueue.put(data)
    inqueue.join()

This yields output (the listing itself was lost from the original post) in which each Proc number always appears with the same Conn number, i.e. every process reuses its own Connection across tasks.

User · Answer 3

I think something like this should work (not tested):

def init(*args):
    global connection
    connection = Connection(*args)

pPool = Pool(initializer=init, initargs=conargs)  # conargs must be a tuple
