What explains the loss values that TensorFlow's fit reports?

Posted 2024-04-29 04:31:37


I am trying to understand the loss values that TensorFlow reports in different places. Here is what I have learned so far:

  • The "loss:" that current TensorFlow (2.4.1) writes to the console by default is meaningless malarkey (fixed in tf-nightly >= 2.6.0.dev20210421). Use verbose=0 or verbose=2 instead.
  • If there is more than one batch, the loss value returned by model.fit will not match model.evaluate on the same data, because each batch is computed with its own version of the weights.
  • Even if you eliminate all randomness, the results are still non-deterministic, but only by tiny fractions.
  • The loss value returned by model.fit and model.evaluate is not just the result of the chosen loss function; it also includes other contributions, e.g. from regularizers (see the small sketch after this list).
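
With a plain Keras layer and an explicit regularizer, that last point is easy to verify. Here is a minimal toy sketch (a made-up model, separate from the Bert code below), showing that model.losses carries the penalty term and that model.evaluate reports the MSE plus that penalty:

import numpy as np
import tensorflow as tf

# Toy model with an explicit L2 penalty on the kernel weights.
toy = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(3,),
                          kernel_regularizer=tf.keras.regularizers.l2(0.01))
])
toy.compile(loss=tf.keras.losses.MSE, optimizer="sgd")

x = np.random.rand(8, 3).astype("float32")
y = np.zeros((8, 1), dtype="float32")

mse = tf.reduce_mean(tf.keras.losses.MSE(y, toy.predict(x, verbose=0)))
penalty = tf.add_n(toy.losses)         # the regularizer's contribution
total = toy.evaluate(x, y, verbose=0)  # reported loss = mse + penalty
print(float(mse), float(penalty), total)

In that simple case the extra contribution is visible in model.losses and the arithmetic adds up.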

But here... which contributions? And how can I identify them? For example, this script sets up a complicated, pre-trained Bert model:

import os
import numpy as np
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"
import tensorflow as tf
from transformers import BertConfig, BertTokenizer, TFBertModel

tf.random.set_seed(42)

model_name = 'bert-base-multilingual-cased'
max_length = 2
bert_config = BertConfig.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
batch_encoding = tokenizer(text=["truth", "lies"],
                           max_length=max_length,
                           padding="max_length",
                           truncation=True,
                           return_attention_mask=False,
                           return_token_type_ids=False,
                           return_tensors="tf")
x_true = batch_encoding.data
y_true = tf.constant([[1], [2]], dtype=tf.float32)

bert_input = tf.keras.Input(name="input_ids",
                            shape=(max_length, ),
                            dtype=tf.int32)
bert_model = TFBertModel(config=bert_config)(bert_input)

model = tf.keras.Model(inputs=[bert_input], outputs=[bert_model.pooler_output])
bert = model.layers[-1]  # the TFBertModel layer
bert.trainable = False   # freeze Bert
model.compile(loss=tf.keras.losses.MSE, optimizer=tf.keras.optimizers.SGD())
print("Trainable variables:", len(model.trainable_variables))

initial_weights_bert = bert.get_weights().copy()

initial_y_pred = model.predict(x_true, verbose=0)
y_true = np.zeros_like(initial_y_pred)


def inspect(epoch, logs):
    # Assert that the frozen Bert weights have not changed.
    for i, w in enumerate(bert.get_weights()):
        assert (w == initial_weights_bert[i]).all()
    loss_logged = logs["loss"]
    loss_evaluated = model.evaluate(x_true, y_true, verbose=0)
    # Assert that the predictions have not changed either.
    y_pred = model.predict(x_true, verbose=0)
    assert (y_pred == initial_y_pred).all()
    # Recompute the loss by hand with the compiled loss function.
    loss_computed = tf.math.reduce_mean(model.loss(y_true, y_pred)).numpy()
    print(f"\tloss logged:    {loss_logged:.4f}")
    print(f"\tloss evaluated: {loss_evaluated:.4f}")
    print(f"\tloss computed:  {loss_computed:.4f}")
    print("\tmodel.losses:", ", ".join((str(loss) for loss in model.losses)))


h = model.fit(
    x_true,
    y_true,
    epochs=5,
    shuffle=False,
    callbacks=[tf.keras.callbacks.LambdaCallback(on_epoch_end=inspect)],
    verbose=2)
print(h.history["loss"])

We freeze the Bert layer, which I suspect doesn't mean much, since Bert is not a regular Keras layer, so there may well be moving parts under the surface. In any case, we assert that all weights stay the same and all predictions stay the same, yet the loss value varies considerably:

Trainable variables: 0
Epoch 1/5
1/1 - 7s - loss: 0.1582
        loss logged:    0.1582
        loss evaluated: 0.1491
        loss computed:  0.1491
        model.losses:
Epoch 2/5
1/1 - 0s - loss: 0.1546
        loss logged:    0.1546
        loss evaluated: 0.1491
        loss computed:  0.1491
        model.losses:
Epoch 3/5
1/1 - 0s - loss: 0.1534
        loss logged:    0.1534
        loss evaluated: 0.1491
        loss computed:  0.1491
        model.losses:
Epoch 4/5
1/1 - 0s - loss: 0.1532
        loss logged:    0.1532
        loss evaluated: 0.1491
        loss computed:  0.1491
        model.losses:
Epoch 5/5
1/1 - 0s - loss: 0.1557
        loss logged:    0.1557
        loss evaluated: 0.1491
        loss computed:  0.1491
        model.losses:
[0.1582154631614685, 0.15455231070518494, 0.15339043736457825, 0.15316224098205566, 0.15569636225700378]

The model.losses attribute shows nothing, and the fancy TensorBoard graphs (if you add a TensorBoard callback) don't reveal anything either.
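
The only probe I can think of (just a guess: assuming something like dropout inside the Bert layers stays active in training mode even though the layer is frozen) is to call the model directly with training=True versus training=False and compare the outputs:

# Run after the script above. If dropout (or similar) inside Bert is
# active in training mode, repeated training=True calls should disagree,
# while training=False calls should be reproducible.
p_eval_1 = model(x_true, training=False).numpy()
p_eval_2 = model(x_true, training=False).numpy()
p_train_1 = model(x_true, training=True).numpy()
p_train_2 = model(x_true, training=True).numpy()
print("inference deterministic:", np.allclose(p_eval_1, p_eval_2))
print("training deterministic: ", np.allclose(p_train_1, p_train_2))

If the training-mode calls disagree with each other, that would at least explain where the per-epoch variation in the logged loss is coming from.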


Tags: true, verbose, model, tf, length, max, keras, bert