Distributed TensorFlow: at a low level, how do the workers and the PS interact during training?

Posted 2024-04-23 23:46:01


I am studying how distributed TensorFlow handles its distributed computation, with the goal of replicating its architecture. I need to understand, at a low level, what operations the workers and the PS perform; I can't simply rely on the Python API. Here is my previous question on SO.

A PS (parameter server) keeps in memory the weights (i.e. the parameters) and receives gradients, running the update step I wrote in the code above. It does this every time it receives gradients from a worker.

A worker, on the other hand, looks up the current value of the weights on the PS, makes a local copy of them, runs a forward and a backward pass of the network on a batch of data to obtain new gradients, which it then sends back to the PS.
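
To make this description concrete, here is a minimal Python sketch of the exchange as I picture it (the ParameterServer class, the pull/push methods and the compute_gradients callback are illustrative names, not TensorFlow internals):

class ParameterServer:
    """Holds the weights and applies every gradient update it receives."""

    def __init__(self, initial_weights, learning_rate=0.01):
        self.weights = dict(initial_weights)  # parameters live only here
        self.lr = learning_rate

    def pull(self):
        # A worker asks for the current parameter values.
        return dict(self.weights)

    def push(self, gradients):
        # Apply an SGD-style update as soon as gradients arrive,
        # without waiting for the other workers (asynchronous training).
        for name, grad in gradients.items():
            self.weights[name] -= self.lr * grad


def worker_step(ps, batch, compute_gradients):
    local_weights = ps.pull()                        # copy current weights locally
    grads = compute_gradients(local_weights, batch)  # forward + backward pass
    ps.push(grads)                                   # send gradients, not weights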

So it seems that the workers compute the gradients and then send them to the PS, which applies them to update the weights. But if I look at the code found in the Distributed TensorFlow Doc, I see that the worker code contains a call to the minimize() method:

if FLAGS.job_name == "ps":
    server.join()
elif FLAGS.job_name == "worker":

    # Assigns ops to the local worker by default.
    with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:%d" % FLAGS.task_index,
        cluster=cluster)):

      # Build model...
      loss = ...
      global_step = tf.contrib.framework.get_or_create_global_step()

      train_op = tf.train.AdagradOptimizer(0.01).minimize(
          loss, global_step=global_step) # < - - - - - - - - - HERE 
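
If I understand correctly, replica_device_setter is what decides where each op is placed: variables end up on the ps job, while the other ops stay on the worker device. A small sketch to check the placement (assuming TF 1.x as in the question; the hostnames are placeholders):

import tensorflow as tf  # TF 1.x

# Placeholder hostnames; only the resulting device strings matter here.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0:2222"],
    "worker": ["worker0:2222"],
})

with tf.device(tf.train.replica_device_setter(
        worker_device="/job:worker/task:0", cluster=cluster)):
    w = tf.Variable(tf.zeros([10]), name="w")  # variable -> ps job
    loss = tf.reduce_sum(tf.square(w - 1.0))   # compute op -> worker

print(w.device)     # expected: /job:ps/task:0
print(loss.device)  # expected: /job:worker/task:0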

If we look at the source code of the minimize() method in the Python API, we see that it calls compute_gradients() and apply_gradients():

def minimize(self, loss, global_step=None, var_list=None,
               gate_gradients=GATE_OP, aggregation_method=None,
               colocate_gradients_with_ops=False, name=None,
               grad_loss=None):

    grads_and_vars = self.compute_gradients(
        loss, var_list=var_list, gate_gradients=gate_gradients,
        aggregation_method=aggregation_method,
        colocate_gradients_with_ops=colocate_gradients_with_ops,
        grad_loss=grad_loss)

    vars_with_grad = [v for g, v in grads_and_vars if g is not None]
    if not vars_with_grad:
      raise ValueError(
          "No gradients provided for any variable, check your graph for ops"
          " that do not support gradients, between variables %s and loss %s." %
          ([str(v) for _, v in grads_and_vars], loss))

    return self.apply_gradients(grads_and_vars, global_step=global_step,
                                name=name)
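
In other words, the minimize() call in the worker snippet above is just shorthand for the two explicit calls (a sketch reusing loss and global_step as in the doc example):

import tensorflow as tf  # TF 1.x

loss = ...          # the model loss, built under replica_device_setter as above
global_step = tf.contrib.framework.get_or_create_global_step()

optimizer = tf.train.AdagradOptimizer(0.01)

# The two steps that minimize() performs internally:
grads_and_vars = optimizer.compute_gradients(loss)   # build the gradient ops
train_op = optimizer.apply_gradients(grads_and_vars, # build the update ops
                                     global_step=global_step)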

So it seems the worker is doing both the compute and the apply steps. What information, then, do the workers send to the PS? Do they perhaps send the weights that have already been updated by applying the gradients? And if the PS receives all those weights, how does it merge them?

