How to accumulate gradients in TensorFlow 2.0?

I am training a model with TensorFlow 2.0. The images in my training set have different resolutions. The model I have built can handle variable resolutions (conv layers followed by global average pooling). My training set is very small, and I want to use the full training set in a single batch.
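
For reference, such a model can be sketched roughly as follows (an illustrative sketch only; the layer sizes, channel count and number of classes are placeholders, not my actual model):

import tensorflow as tf

# Spatial dimensions are left as None, so inputs of any resolution are accepted;
# GlobalAveragePooling2D reduces them to a fixed-size vector before the classifier.
inputs = tf.keras.Input(shape=(None, None, 3))
x = tf.keras.layers.Conv2D(32, 3, activation='relu')(inputs)
x = tf.keras.layers.Conv2D(64, 3, activation='relu')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)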

Since my images have different resolutions, I can't use model.fit(). Instead, I plan to pass each sample through the network individually, accumulate the errors/gradients, and then apply a single optimizer step. I am able to compute the loss values, but I don't know how to accumulate the losses/gradients. How can I accumulate the losses/gradients and then apply a single optimizer step?

Code

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0
    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        gradients = tape.gradient(loss_value, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        total_loss += loss_value

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')

2 Answers

If I understood correctly from this statement:

How can I accumulate the losses/gradients and then apply a single optimizer step?

@Nagabhushan is trying to accumulate gradients and then apply the optimization on the (mean) accumulated gradients. The answer provided by @TensorflowSupport does not answer this question. In order to perform the optimization only once, and accumulate the gradients from several tapes, you can do the following:

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0

    # get trainable variables
    train_vars = self.model.trainable_variables
    # Create empty gradient list (not a tf.Variable list)
    accum_gradient = [tf.zeros_like(this_var) for this_var in train_vars]

    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        total_loss += loss_value

        # get gradients of this tape
        gradients = tape.gradient(loss_value, train_vars)
        # Accumulate the gradients
        accum_gradient = [(acum_grad+grad) for acum_grad, grad in zip(accum_gradient, gradients)]


    # Now, after executing all the tapes you needed, we apply the optimization step
    # (but first we take the average of the gradients)
    accum_gradient = [this_grad/num_samples for this_grad in accum_gradient]
    # apply optimization step
    self.optimizer.apply_gradients(zip(accum_gradient,train_vars))
        

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')

You should avoid using tf.Variable() inside the training loop, since it will produce errors when trying to execute the code as a graph. If you use tf.Variable() inside your training function and then decorate it with "@tf.function", or apply "tf.function(my_train_fcn)" to obtain a graph function (i.e. for improved performance), the execution will raise an error. This happens because the tracing of the tf.Variable function results in a different behaviour than the one observed in eager execution (re-utilization or creation, respectively). You can find more info on this in the tensorflow help page.
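
As an illustrative sketch of that pitfall (the names here are made up, not taken from the question):

import tensorflow as tf

# Creating tf.Variable objects inside a tf.function-wrapped step works eagerly,
# but under tracing it raises a ValueError, because tf.function only supports
# variables that are created once, on the first trace.
@tf.function
def accumulate_step_bad(grads):
    accum_vars = [tf.Variable(tf.zeros_like(g)) for g in grads]  # created on every call
    return [v.assign_add(g) for v, g in zip(accum_vars, grads)]

# Safer pattern: create the accumulators once, outside the traced function,
# and only assign to them inside it.
grad_shapes = [(3, 3), (3,)]
accum_vars = [tf.Variable(tf.zeros(s), trainable=False) for s in grad_shapes]

@tf.function
def accumulate_step_ok(grads):
    return [v.assign_add(g) for v, g in zip(accum_vars, grads)]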

In accordance with the explanation provided in the Stack Overflow Answer and on the Tensorflow Website, mentioned below is the code for accumulating gradients in Tensorflow Version 2.0:

def train(epochs):
  for epoch in range(epochs):
    for (batch, (images, labels)) in enumerate(dataset):
      with tf.GradientTape() as tape:
        logits = mnist_model(images, training=True)
        tvs = mnist_model.trainable_variables
        accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
        zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
        loss_value = loss_object(labels, logits)

      loss_history.append(loss_value.numpy().mean())
      grads = tape.gradient(loss_value, tvs)
      # print(grads[0].shape)
      # print(accum_vars[0].shape)
      accum_ops = [accum_vars[i].assign_add(grad) for i, grad in enumerate(grads)]

    optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))
    print('Epoch {} finished'.format(epoch))

# Call the above function
train(epochs=3)

The complete code can be found in this Github Gist.
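
For completeness, a rough variant of the above with the accumulators created once outside the loop and the update applied from the accumulated gradients might look like this (an untested sketch; mnist_model, dataset, loss_object and optimizer are assumed to be defined as in the gist):

import tensorflow as tf

# Create the accumulator variables once, outside the training loop.
tvs = mnist_model.trainable_variables
accum_vars = [tf.Variable(tf.zeros_like(tv), trainable=False) for tv in tvs]

def train(epochs):
  for epoch in range(epochs):
    # Reset the accumulators at the start of every epoch.
    for av in accum_vars:
      av.assign(tf.zeros_like(av))

    num_batches = 0
    for (batch, (images, labels)) in enumerate(dataset):
      with tf.GradientTape() as tape:
        logits = mnist_model(images, training=True)
        loss_value = loss_object(labels, logits)
      grads = tape.gradient(loss_value, tvs)
      for av, grad in zip(accum_vars, grads):
        av.assign_add(grad)
      num_batches += 1

    # Single optimizer step per epoch, using the averaged accumulated gradients.
    optimizer.apply_gradients(zip([av / num_batches for av in accum_vars], tvs))
    print('Epoch {} finished'.format(epoch))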
