Python Ray: How to share variables between workers?

@ray.remote
def f(x):
    # create inputs from x
    # do work
    unknown_y1 = []
    obtained_y1 = []
    for index, y in enumerate(y1):
        key = '|'.join([str(x) for x in y.values()])
        if key in cached:
            obtained_y1.append(cached[key])
        else:
            obtained_y1.append(np.inf)
            unknown_y1.append(promo)
    unknown_y2 = []
    obtained_y2 = []
    for index, y in enumerate(y2):
        key = '|'.join([str(x) for x in y.values()])
        if key in cached:
            obtained_y2.append(cached[key])
        else:
            obtained_y2.append(np.inf)
            unknown_y2.append(baseline)
    known_y1, known_y2 = predictor.predict(unknown_y1,unknown_y2)
    unknown_index = 0
    for index in range(len(y1)):
        if(obtained_y1[index] == np.inf):
            obtained_y1[index] = known_y1[unknown_index]
            key = '|'.join([str(x) for x in y1[index].values()])
            if not(key in cached):
                cached[key] = obtained_y1[index]
            unknown_index = unknown_index+1
    unknown_index = 0
    for index in range(len(y2)):
        if(obtained_y2[index] == np.inf):
            obtained_y2[index] = known_y2[unknown_index]
            key = '|'.join([str(x) for x in y2[index].values()])
            if not(key in cached):
                cached[key] = obtained_y2[index]
            unknown_index = unknown_index+1

2 Answers

User

Answer 1 · edited 2024-04-19 22:00:49

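Unfortunately, you did not provide a minimal, reproducible example, so I cannot see how you are doing the multiprocessing. For the sake of discussion, I will assume you are using the Pool class from the multiprocessing module (concurrent.futures.ProcessPoolExecutor is a similar facility). In that case you want a managed, sharable dict, as follows: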

from multiprocessing import Pool, Manager


def init_pool(the_cache):
    # initialize each process in the pool with the following global variable:
    global cached
    cached = the_cache

def main():
    with Manager() as manager:
        cached = manager.dict()
        with Pool(initializer=init_pool, initargs=(cached,)) as pool:
            ... # code that creates tasks

# required by Windows:
if __name__ == '__main__':
    main()

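This creates a sharable dictionary and gives every process in the pool a global variable named cached that refers to a proxy for that dictionary. Every dictionary access is therefore closer to a remote procedure call and runs much more slowly than a "normal" dictionary access. Just be aware of that.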
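Because each proxy access is a round trip to the manager process, one way to reduce the cost is to read and write in batches. The following is a minimal sketch of that idea, not part of the original answer: lookup_batch and predict are hypothetical names, and predict merely stands in for the expensive computation.

```python
from multiprocessing import Manager


def predict(x):
    # Hypothetical stand-in for the expensive computation.
    return x * 10


def lookup_batch(cached, keys):
    # One round trip: copy() returns an ordinary dict with the current contents.
    local = cached.copy()
    results = {}
    new_entries = {}
    for key in keys:
        if key in local:
            results[key] = local[key]
        else:
            value = predict(key)
            results[key] = value
            new_entries[key] = value
    if new_entries:
        # One more round trip publishes everything computed in this batch.
        cached.update(new_entries)
    return results


if __name__ == '__main__':
    with Manager() as manager:
        cached = manager.dict({1: 10})
        print(lookup_batch(cached, [1, 2, 3]))
        print(cached.copy())
```

The trade-off is that the local snapshot can be stale, so two workers may occasionally compute the same key, and copying the whole dictionary becomes costly once the cache is large.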

If some other mechanism creates the workers (the @ray.remote decorator?), then the cached variable can instead be passed as an argument to the function f.
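To illustrate passing the proxy as an argument rather than installing it as a global, here is a small sketch that still uses multiprocessing.Pool so that it needs only the standard library. The names expensive and f, and the choice of Pool.starmap, are assumptions made for the example; whether a Manager proxy can be handed to a Ray task in the same way is not verified here.

```python
from multiprocessing import Manager, Pool


def expensive(x):
    # Hypothetical stand-in for the real computation.
    return x * x


def f(cached, x):
    # `cached` can be any dict-like object; in main() it is a Manager proxy.
    results = []
    for i in range(x):
        key = str(i)
        value = cached.get(key)   # one round trip to the manager process
        if value is None:
            value = expensive(i)
            cached[key] = value   # now visible to the other workers
        results.append(value)
    return results


def main():
    with Manager() as manager:
        cached = manager.dict()
        with Pool(2) as pool:
            # The proxy is pickled and sent along with each task's arguments.
            outputs = pool.starmap(f, [(cached, 3), (cached, 5)])
        return outputs, dict(cached)


if __name__ == '__main__':
    outputs, snapshot = main()
    print(outputs)
    print(snapshot)
```

Since f only relies on get and item assignment, it can be exercised with a plain dict in a single process before running it under a pool.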

User

Answer 2 · edited 2024-04-19 22:00:49

You may be interested in this question and answer about writing a function cache for Ray: Implementing cache for Ray actor function

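Your idea is right, but I think the key detail you are missing is that you should let Ray hold the global state, either in an actor or, if the state is immutable, in the object store.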

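In your case, it looks like you are trying to cache part of a remote function rather than the whole function. You probably want something like the following.

Here is a simplified version of how you might write your function.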

import ray

@ray.remote
class Cache:
  def __init__(self):
    self.cache = {}

  def put(self, x, y):
    self.cache[x] = y

  def get(self, x):
    return self.cache.get(x)

global_cache = Cache.remote()

@ray.remote
def f(x):
  all_inputs = list(range(x)) # A simplified set of generated inputs based on x
  # Actor methods are invoked with .remote(); ray.get resolves the returned references
  obtained_output = ray.get([global_cache.get.remote(i) for i in all_inputs])

  unknown_indices = []
  for i, output in enumerate(obtained_output):
    if output is None:
      unknown_indices.append(i)

  # Now go through and calculate all the unknown inputs
  for i in unknown_indices:
    output = predict(all_inputs[i]) # calculate the output (predict is a placeholder for your own function)
    global_cache.put.remote(all_inputs[i], output) # Cache it under its input so it's available next time
    obtained_output[i] = output

  return obtained_output
