Keras TensorFlow Probability model is not learning the spread of the distribution

Question

I have built and trained a Keras TensorFlow Probability model. It is essentially a fully connected neural network whose output layer is a DistributionLambda. Here is the code for the last layer:

tfp.layers.DistributionLambda(
            lambda t: tfd.Independent(tfd.Normal(loc=t[..., :n], scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
                                      reinterpreted_batch_ndims=1))

During training I used mean squared error as the loss function. Training went well and was numerically stable.

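After training, I first remove the last layer of the model and then run a forward pass on the test set. This gives me the "expected" loc and scale that the model has learned for each data point in the test set. However, because the DistributionLambda applies a softplus correction, I also have to apply the same correction to the scale predicted by the truncated model.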
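The post-processing described above can be sketched as follows. This is only an illustration in plain Python: `split_loc_scale` is a hypothetical helper name, `row` stands for one row of raw output from the truncated model, and `n` and `c` are the same constants as in the layer definition. Their values are not given in the question, so the default for `c` is a placeholder.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)), matching what tf.nn.softplus computes.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def split_loc_scale(row, n, c=0.0, eps=1e-5):
    # Hypothetical helper: mirrors the lambda inside the DistributionLambda.
    # The first n values are loc; the remaining values are the raw scale,
    # which gets the same shift (c), softplus, and 1e-5 floor as in the layer.
    loc = [float(v) for v in row[:n]]
    scale = [eps + softplus(c + float(v)) for v in row[n:]]
    return loc, scale

# Example with one output dimension (n = 1): raw output row [loc, raw_scale]
loc, scale = split_loc_scale([1.0, 0.0], n=1)
print(loc, scale)
```

If the last layer is left in place, calling the full model on a batch of inputs should return the distribution object itself, so `model(x).mean()` and `model(x).stddev()` may give the same numbers without repeating the softplus by hand. That is worth checking against the installed TFP version.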

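I want to verify that the model has learned an appropriate distribution conditioned on the input. Using these predictions of loc (mean) and scale (standard deviation), I can create calibration plots to see how well the model has learned the underlying distribution. The calibration plot for the mean looks good. I am also creating a calibration plot for the scale (standard deviation) parameter, with code roughly like this: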

from typing import Optional

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def create_stdev_calibration_plot(df: pd.DataFrame,
                              y_true: str = 'y_true',
                              y_pred_mean: str = 'y_pred_mean',
                              y_pred_std: str = 'y_pred_std',
                              title: Optional[str] = None,
                              save_path: Optional[str] = None):

    # Compute the residuals
    df['residual'] = df[y_true] - df[y_pred_mean]

    # Bin data based on predicted standard deviation
    bins = np.linspace(df[y_pred_std].min(), df[y_pred_std].max(), 10)
    df['bin'] = np.digitize(df[y_pred_std], bins)

    # For each bin, compute mean predicted std and actual std of residuals
    df['y_pred_variance'] = df[y_pred_std] ** 2
    bin_means_variance = df.groupby('bin')['y_pred_variance'].mean()

    # Convert back to standard deviation
    bin_means = np.sqrt(bin_means_variance)
    bin_residual_stds = df.groupby('bin')['residual'].std()

    # Create the calibration plot
    plt.figure(figsize=(8, 8))
    plt.plot(bin_means, bin_residual_stds, 'o-')

    xrange = plt.xlim()
    yrange = plt.ylim()
    max_val = max(xrange[1], yrange[1])
    min_val = min(xrange[0], yrange[0])
    plt.axline((min_val, min_val), (max_val, max_val), linestyle='--', color='k', linewidth=2)

    plt.xlabel('Mean Predicted Standard Deviation')
    plt.ylabel('Actual Standard Deviation of Residuals')
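    # Use the caller's title if one was given
    plt.title(title if title is not None else 'Spread Calibration Plot')
    plt.grid(True)
    # Save the figure before showing it, if a path was given
    if save_path is not None:
        plt.savefig(save_path)
    plt.show()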

To show that this standard deviation calibration plot works as expected, I generated some synthetic data:

# Number of samples
n_samples = 1000

# Input feature
x = np.random.uniform(-10, 10, size=n_samples)

# True mean and standard deviation as functions of the input feature
true_mean = 2 * x + 3
true_std = 0.5 * np.abs(x) + 1

# Generate synthetic data
y_true = np.random.normal(loc=true_mean, scale=true_std)

# Simulate model predictions (with some error)
y_pred_mean = true_mean + np.random.normal(loc=0, scale=1, size=n_samples)
y_pred_std = true_std + np.random.normal(loc=0, scale=0.5, size=n_samples)

# Ensure standard deviations are positive
y_pred_std = np.abs(y_pred_std)

df = pd.DataFrame({
    'y_true': y_true,
    'y_pred_mean': y_pred_mean,
    'y_pred_std': y_pred_std
})

create_stdev_calibration_plot(df)

This is the calibration plot generated with the synthetic data:

When I run the same function on my model's output, the plot looks like this:

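Judging by the calibration plot, the model does not seem to have learned the spread of the data. It has only learned the mean and keeps the spread very small in order to minimize the loss. How can I adjust the training to encourage the model to learn the spread accurately?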
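A small numerical sketch may help explain why the spread collapses under mean squared error. It assumes that the `y_pred` seen by the loss is a single sample from the predicted Normal, i.e. `mu + sigma * eps` with `eps ~ N(0, 1)`. Under that assumption the expected squared error is `(y - mu)**2 + sigma**2`, so for any fixed mean the loss only goes down as `sigma` shrinks. The function names below are hypothetical and the check uses only the standard library.

```python
import random

def expected_sampled_mse(residual, sigma):
    # Closed form of E[(y - (mu + sigma * eps))^2] for eps ~ N(0, 1),
    # where residual = y - mu.
    return residual ** 2 + sigma ** 2

def simulated_sampled_mse(residual, sigma, n_draws=100_000, seed=0):
    # Monte Carlo estimate of the same quantity, as a sanity check.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        err = residual - sigma * rng.gauss(0.0, 1.0)
        total += err * err
    return total / n_draws

# Same residual, increasing predicted spread: the loss only gets worse.
for sigma in (0.0, 0.5, 2.0):
    print(sigma, expected_sampled_mse(1.5, sigma), simulated_sampled_mse(1.5, sigma))
```

Under this assumption, MSE carries no signal that rewards a correct spread; it penalizes any spread at all, which would be consistent with the calibration plot described above.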

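Update:

One idea I had was to create a custom loss function based on the average expected calibration error of the mean and the spread. However, the inputs to the loss function are the y_true tensor and the model's y_pred tensor. y_pred is just a sample from the currently learned distribution, and I have no access to the distribution parameters (loc and scale), which makes spread calibration impossible. In addition, the expected calibration error is not differentiable because it requires binning, which also rules out learning via backpropagation.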

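Update 2:

I am now considering changing the loss function to the negative log-likelihood (NLL). That way I can get the "learned" distribution parameters and compute the NLL loss of each data point against the "learned" distribution. However, I am not very confident in this approach: with only one data point (one per row and distribution), NLL might end up doing the same thing as mean squared error, because the likelihood is highest (and the NLL lowest) when the single data point equals the distribution mean.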
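To probe that concern, here is a minimal sketch of the Gaussian NLL as a function of both loc and scale. It is plain Python with hypothetical function names, not TFP code. For a fixed residual `r = y - loc`, the per-point NLL `0.5 * log(2 * pi * scale**2) + r**2 / (2 * scale**2)` is minimized at `scale = |r|`, and over several points sharing one scale the minimizer is the root-mean-square residual. So, unlike MSE, the loss does depend on the predicted spread.

```python
import math

def gaussian_nll(y, loc, scale):
    # Negative log-density of Normal(loc, scale) evaluated at y.
    return 0.5 * math.log(2.0 * math.pi * scale ** 2) + (y - loc) ** 2 / (2.0 * scale ** 2)

def best_scale(residuals, candidates):
    # Grid search: which candidate scale gives the lowest mean NLL
    # for the given residuals (y - loc)?
    def mean_nll(scale):
        return sum(gaussian_nll(r, 0.0, scale) for r in residuals) / len(residuals)
    return min(candidates, key=mean_nll)

candidates = [0.1 * k for k in range(1, 101)]  # 0.1, 0.2, ..., 10.0

# One point with residual 2: the best scale sits near 2, not near 0.
print(best_scale([2.0], candidates))

# Two points sharing one scale: the best scale is near their RMS residual.
print(best_scale([1.0, -3.0], candidates))
```

A single point that lands exactly on the predicted mean would indeed push the scale toward zero, but a network that shares weights across many inputs cannot do that for every point at once. In TFP, the commonly shown way to train with this loss is to keep the DistributionLambda as the output and pass something like `lambda y, rv_y: -rv_y.log_prob(y)` as the Keras loss; whether that fits this particular model is an assumption to verify.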

keras tensorflow distributionlambda mean_squared_error calibration_plot negative_log_likelihood neural_network uncertainty_estimation

