Rewriting a Python generator as asynchronous

1 answer

User

#1 · Posted 2024-05-23 16:48:39

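So you want to pass a "synchronous-looking" generator to a call that expects an ordinary lazy generator (as islice does), and still have the results fetched in parallel.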

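This sounds like a job for asyncio.as_completed: you use an ordinary generator to create the tasks, the asyncio machinery runs them in parallel, and each result becomes available as its task completes (oh!).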
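To make the "results arrive as tasks complete" behaviour concrete, here is a minimal sketch that is not part of the original answer. `fetch` is a hypothetical stand-in coroutine that only sleeps, and the coroutines are collected in a list because some Python versions reject a bare generator object passed to as_completed.

```python
import asyncio


async def fetch(name, delay):
    # Hypothetical stand-in for real I/O: sleeps, then returns its name.
    await asyncio.sleep(delay)
    return name


async def collect_in_completion_order():
    # The coroutines are created eagerly; as_completed schedules them all
    # and hands back awaitables in the order the results become ready.
    coros = [fetch(name, delay) for name, delay in [("slow", 0.2), ("fast", 0.05)]]
    return [await finished for finished in asyncio.as_completed(coros)]


results = asyncio.run(collect_in_completion_order())
print(results)  # ['fast', 'slow']
```

The "slow" coroutine is created first but finishes last, so it comes out second.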

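However, since update_page_quality_bulk is not async-aware, it never yields control back to the asyncio loop, so the loop cannot finish the tasks that produce the results. Calling it directly will most likely block.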
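The blocking problem can be seen in isolation with a small sketch (an illustration only; `ticker` and `demo` are hypothetical names, and time.sleep stands in for any synchronous call such as update_page_quality_bulk). While the synchronous call runs, the background task makes no progress at all.

```python
import asyncio
import time


async def ticker(ticks):
    # Records five ticks, 10 ms apart, but only while the loop gets to run.
    for _ in range(5):
        await asyncio.sleep(0.01)
        ticks.append(time.monotonic())


async def demo():
    ticks = []
    task = asyncio.create_task(ticker(ticks))
    await asyncio.sleep(0)   # give the ticker a chance to start
    time.sleep(0.2)          # synchronous call: the event loop is frozen here
    ticks_during_block = len(ticks)
    await task               # control returns to the loop, the ticker finishes
    return ticks_during_block, len(ticks)


blocked_ticks, total_ticks = asyncio.run(demo())
print(blocked_ticks, total_ticks)  # 0 5
```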

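Simply calling update_page_quality_bulk in another thread might not work either. I have not tried it here, but I would say you cannot just iterate over doc in a thread other than the one in which it (and its tasks) was created.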

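So, first things first: the "generator expression" syntax does not work when you want some terms of the generator to be computed asynchronously, as you found out. We refactor it so that each tuple is built inside a coroutine function, and we wrap every call that produces these terms in a task (some asyncio functions do that wrapping automatically).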
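A short sketch of that refactoring, under stated assumptions: `fetch_value` is a hypothetical async stand-in for the real lookup, and asyncio.gather is used here only as one example of a function that wraps coroutines in tasks for you. It also shows why the original expression fails: with an await inside, the generator expression becomes an asynchronous generator, which ordinary iteration cannot consume.

```python
import asyncio
import inspect


async def fetch_value(doc_id):
    # Hypothetical async stand-in for the real lookup.
    await asyncio.sleep(0)
    return doc_id * 2


async def get_doc_values(doc_id):
    # The tuple is now built inside a coroutine function.
    return doc_id, await fetch_value(doc_id)


async def demo():
    # An await inside a generator expression yields an *async* generator,
    # which a plain "for" loop or islice cannot iterate over.
    gen = ((doc_id, await fetch_value(doc_id)) for doc_id in [1, 2, 3])
    is_async_gen = inspect.isasyncgen(gen)
    # gather wraps each coroutine in a task and keeps the input order.
    pairs = await asyncio.gather(*[get_doc_values(d) for d in [1, 2, 3]])
    return is_async_gen, pairs


is_async_gen, pairs = asyncio.run(demo())
print(is_async_gen, pairs)  # True [(1, 2), (2, 4), (3, 6)]
```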

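Then we can use the asyncio machinery to schedule all the calls and feed update_page_quality_bulk as the results arrive. As said above, the results cannot be passed directly to a non-async function: the asyncio loop would never regain control. Instead, we keep picking up task results in the main thread and call the sync function in another thread, using a queue to hand over the fetched results. Finally, so that update_page_quality_bulk can consume the results as they become available, we create a small wrapper class around queue.Queue (the queue class lives in the queue module, not in threading) so it can be used like an iterator. This is transparent to code that consumes an iterator.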


# example code: untested
# get_value, update_page_quality_bulk and doc_ids come from the question

import asyncio
import logging
import queue
import threading


async def get_doc_values(doc_id):
    loop = asyncio.get_running_loop()
    # run_in_executor runs the synchronous function in a thread pool.
    # Check the docs: you might want to pass a custom executor with more
    # than the default number of workers, instead of None.
    return doc_id, await loop.run_in_executor(None, get_value, doc_id)


def update_es(iterator):
    # This function runs in a separate thread.
    total_success = 0
    for success, item in update_page_quality_bulk(iterator):
        total_success += success
        if not success:
            logging.error(item)


sentinel = Ellipsis  # ... : Python's Ellipsis, a nice sentinel that also works with multiprocessing


class Iterator:
    """Lets the queue, fed in the main thread by the tasks as they complete,
    behave like an ordinary iterator that "update_page_quality_bulk" can
    consume in another thread.
    """
    def __init__(self, source_queue):
        self.source = source_queue

    def __iter__(self):
        # Needed so a "for" loop accepts this object.
        return self

    def __next__(self):
        value = self.source.get()
        if value is sentinel:
            raise StopIteration()
        return value


async def main(doc_ids):
    # "await" is only valid inside a coroutine, so the driver code lives here.
    results = queue.Queue()
    iterator = Iterator(results)
    es_worker = threading.Thread(target=update_es, args=(iterator,))
    es_worker.start()
    # Pass a list: some Python versions reject a generator object here.
    for doc_value_task in asyncio.as_completed([get_doc_values(doc_id) for doc_id in doc_ids]):
        doc_value = await doc_value_task
        results.put(doc_value)
    results.put(sentinel)  # tell the worker thread there is nothing more to read
    es_worker.join()
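For anyone who wants to try the pattern end to end, here is a compact, runnable sketch. It is an illustration under assumptions, not the answer's tested code: `get_value` and `update_page_quality_bulk` are hypothetical stand-ins for the functions in the question, and the built-in two-argument form iter(callable, sentinel) is swapped in for the wrapper class.

```python
import asyncio
import queue
import threading

SENTINEL = Ellipsis


def get_value(doc_id):
    # Hypothetical stand-in for the blocking lookup in the question.
    return doc_id * 10


def update_page_quality_bulk(pairs):
    # Hypothetical stand-in: lazily consumes an iterator and yields
    # (success, item) tuples, the shape the update_es loop expects.
    for pair in pairs:
        yield True, pair


async def get_doc_values(doc_id):
    loop = asyncio.get_running_loop()
    return doc_id, await loop.run_in_executor(None, get_value, doc_id)


def update_es(iterator, processed):
    # Runs in the worker thread and records every successful item.
    for success, item in update_page_quality_bulk(iterator):
        if success:
            processed.append(item)


async def run_pipeline(doc_ids):
    results = queue.Queue()
    processed = []
    # iter(callable, sentinel) keeps calling results.get() until it
    # returns a value equal to the sentinel.
    worker = threading.Thread(
        target=update_es, args=(iter(results.get, SENTINEL), processed)
    )
    worker.start()
    for finished in asyncio.as_completed([get_doc_values(d) for d in doc_ids]):
        results.put(await finished)
    results.put(SENTINEL)  # signal the end of the stream
    worker.join()
    return processed


processed = asyncio.run(run_pipeline([1, 2, 3]))
print(sorted(processed))  # [(1, 10), (2, 20), (3, 30)]
```

Note that iter(callable, sentinel) compares with ==, while the wrapper class compares with "is"; for a sentinel like Ellipsis and tuple results the two behave the same.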
