Model does not train and the loss is negative when the input is normalized

Posted 2024-03-29 10:08:05


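I am working on segmentation and my dataset is rather small (1,840 images), so I want to use data augmentation. I am using the generator provided in the Keras documentation, which yields a tuple containing a batch of images and the corresponding masks, both augmented in the same way.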

data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=30,
                     width_shift_range=0.2,
                     height_shift_range=0.2,
                     zoom_range=0.2,
                     fill_mode='nearest',
                     horizontal_flip=True)

image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
image_datagen.fit(X_train, augment=True, seed=seed, rounds=2)
mask_datagen.fit(Y_train, augment=True, seed=seed, rounds=2)

image_generator = image_datagen.flow(X_train,
                                    batch_size=BATCH_SIZE,
                                    seed=seed)

mask_generator = mask_datagen.flow(Y_train,
                                   batch_size=BATCH_SIZE,
                                   seed=seed)

# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)

Then I train my model with this generator:

model.fit_generator(
    generator=train_generator,
    steps_per_epoch=m.ceil(len(X_train)/BATCH_SIZE),
    validation_data=(X_val, Y_val),
    epochs=EPOCHS,
    callbacks=callbacks,
    workers=4,
    use_multiprocessing=True,
    verbose=2)

But with this setup I get a negative loss and the model does not train:

Epoch 2/5000
 - 4s - loss: -2.5572e+00 - iou: 0.0138 - acc: 0.0000e+00 - val_loss: 11.8256 - val_iou: 0.0000e+00 - val_acc: 0.1551

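I would also add that the model does train if I leave out featurewise_center and featurewise_std_normalization. However, my model uses batch normalization, which performs better when the input is normalized, so I really want to keep the featurewise parameters.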

I hope I have explained my problem clearly and that someone can help, because I really do not understand what is going on.

Edit: My model is a U-Net with custom Conv2D and Conv2DTranspose blocks:

def Conv2D_BN(x, filters, kernel_size, strides=(1,1), padding='same', activation='relu', kernel_initializer='glorot_normal', kernel_regularizer=None):
    x = Conv2D(filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_initializer=kernel_initializer, kernel_regularizer=kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activation)(x)
    return x

def Conv2DTranspose_BN(x, filters, kernel_size, strides=(1,1), padding='same', activation='relu', kernel_initializer='glorot_normal', kernel_regularizer=None):
    x = Conv2DTranspose(filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_initializer=kernel_initializer, kernel_regularizer=kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Activation(activation)(x)
    return x
def build_unet_bn(input_layer = Input((128,128,3)), start_depth=16, activation='relu', initializer='glorot_normal'):
    # 128 -> 64
    conv1 = Conv2D_BN(input_layer, start_depth * 1, (3, 3), activation=activation, kernel_initializer=initializer)
    conv1 = Conv2D_BN(conv1, start_depth * 1, (3, 3), activation=activation, kernel_initializer=initializer)
    pool1 = MaxPooling2D((2, 2))(conv1)

    # 64 -> 32
    conv2 = Conv2D_BN(pool1, start_depth * 2, (3, 3), activation=activation, kernel_initializer=initializer)
    conv2 = Conv2D_BN(conv2, start_depth * 2, (3, 3), activation=activation, kernel_initializer=initializer)
    pool2 = MaxPooling2D((2, 2))(conv2)

    # 32 -> 16
    conv3 = Conv2D_BN(pool2, start_depth * 4, (3, 3), activation=activation, kernel_initializer=initializer)
    conv3 = Conv2D_BN(conv3, start_depth * 4, (3, 3), activation=activation, kernel_initializer=initializer)
    pool3 = MaxPooling2D((2, 2))(conv3)

    # 16 -> 8
    conv4 = Conv2D_BN(pool3, start_depth * 8, (3, 3), activation=activation, kernel_initializer=initializer)
    conv4 = Conv2D_BN(conv4, start_depth * 8, (3, 3), activation=activation, kernel_initializer=initializer)
    pool4 = MaxPooling2D((2, 2))(conv4)

    # Middle
    convm = Conv2D_BN(pool4, start_depth * 16, (3, 3), activation=activation, kernel_initializer=initializer)
    convm = Conv2D_BN(convm, start_depth * 16, (3, 3), activation=activation, kernel_initializer=initializer)

    # 8 -> 16
    deconv4 = Conv2DTranspose_BN(convm, start_depth * 8, (3, 3), strides=(2, 2), activation=activation, kernel_initializer=initializer)
    uconv4 = concatenate([deconv4, conv4])
    uconv4 = Conv2D_BN(uconv4, start_depth * 8, (3, 3), activation=activation, kernel_initializer=initializer)
    uconv4 = Conv2D_BN(uconv4, start_depth * 8, (3, 3), activation=activation, kernel_initializer=initializer)

    # 16 -> 32
    deconv3 = Conv2DTranspose_BN(uconv4, start_depth * 4, (3, 3), strides=(2, 2), activation=activation, kernel_initializer=initializer)
    uconv3 = concatenate([deconv3, conv3])
    uconv3 = Conv2D_BN(uconv3, start_depth * 4, (3, 3), activation=activation, kernel_initializer=initializer)
    uconv3 = Conv2D_BN(uconv3, start_depth * 4, (3, 3), activation=activation, kernel_initializer=initializer)

    # 32 -> 64
    deconv2 = Conv2DTranspose_BN(uconv3, start_depth * 2, (3, 3), strides=(2, 2), activation=activation, kernel_initializer=initializer)
    uconv2 = concatenate([deconv2, conv2])
    uconv2 = Conv2D_BN(uconv2, start_depth * 2, (3, 3), activation=activation, kernel_initializer=initializer)
    uconv2 = Conv2D_BN(uconv2, start_depth * 2, (3, 3), activation=activation, kernel_initializer=initializer)

    # 64 -> 128
    deconv1 = Conv2DTranspose_BN(uconv2, start_depth * 1, (3, 3), strides=(2, 2), activation=activation, kernel_initializer=initializer)
    uconv1 = concatenate([deconv1, conv1])
    uconv1 = Conv2D_BN(uconv1, start_depth * 1, (3, 3), activation=activation, kernel_initializer=initializer)
    uconv1 = Conv2D_BN(uconv1, start_depth * 1, (3, 3), activation=activation, kernel_initializer=initializer)

    output_layer = Conv2D(1, (1,1), padding="same", activation="sigmoid")(uconv1)

    return output_layer

I create my model and compile it with the following code:

input_layer=Input((size,size,3))
output_layer = build_unet_bn(input_layer, 16)

model = Model(inputs=input_layer, outputs=output_layer)

model.compile(optimizer=Adam(lr=1e-3), loss='binary_crossentropy', metrics=metrics)

1 Answer
User
#1 · Posted 2024-03-29 10:08:05

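To understand why your model is not learning, you should consider two things. First, because the activation of the last layer is sigmoid, the model always outputs values in the range (0, 1). But because of featurewise_center and featurewise_std_normalization, which you also apply to the masks, the target values are shifted and rescaled to roughly the range [-1, 1], so they can be negative. This means the domain of the target variable differs from the domain of the network output.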
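The effect on the masks can be illustrated without Keras. The following is a minimal sketch in plain Python, assuming that featurewise centering and standardization amount to subtracting the dataset mean and dividing by the dataset standard deviation; the helper `featurewise_standardize`, its `epsilon` default, and the four-pixel mask are hypothetical and only serve the illustration.

```python
from statistics import mean, pstdev

def featurewise_standardize(values, epsilon=1e-6):
    """Subtract the dataset mean and divide by the dataset standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + epsilon) for v in values]

# Hypothetical flattened binary mask: three background pixels, one foreground pixel.
mask_pixels = [0.0, 0.0, 0.0, 1.0]
standardized = featurewise_standardize(mask_pixels)

# The background pixels end up below zero, so the targets are no longer in [0, 1].
print(min(standardized), max(standardized))
```

How far the values move depends on the ratio of foreground to background pixels, so the exact range will differ from dataset to dataset.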

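Second, binary cross-entropy loss is built on the assumption that the target variable is in the range [0, 1] and the network output is in the range (0, 1). The binary cross-entropy equation is

$$L(y, p) = -\left(y \log(p) + (1 - y)\log(1 - p)\right)$$

where y is the target value and p is the network output. Because the target variable (y) is in the range [-1, 1], the loss can come out negative. For example, if the target value is y = -0.5 and the network output is 0.01, the loss is -2.2875.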
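The arithmetic in this example can be reproduced in a few lines. This is a sketch in plain Python using the natural logarithm; the function name `binary_crossentropy` is chosen here for illustration and is not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_pred):
    """Per-element binary cross-entropy with the natural logarithm."""
    return -(y_true * math.log(y_pred) + (1.0 - y_true) * math.log(1.0 - y_pred))

# A target inside [0, 1] gives a non-negative loss.
valid = binary_crossentropy(1.0, 0.01)

# A target outside [0, 1], as in the example above, gives a negative loss.
invalid = binary_crossentropy(-0.5, 0.01)

print(round(valid, 4), round(invalid, 4))
```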

Solutions

Solution 1

Remove featurewise_center and featurewise_std_normalization from the data augmentation.
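As a sketch of Solution 1, the featurewise options can be filtered out of the argument dictionary from the question before the generators are built. The name `augment_only_args` is hypothetical, and the `ImageDataGenerator` calls are left as comments so the snippet runs without Keras.

```python
# Augmentation arguments as given in the question.
data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=30,
                     width_shift_range=0.2,
                     height_shift_range=0.2,
                     zoom_range=0.2,
                     fill_mode='nearest',
                     horizontal_flip=True)

# Keep only the geometric augmentations; drop every featurewise_* option.
augment_only_args = {key: value for key, value in data_gen_args.items()
                     if not key.startswith('featurewise_')}

# The generators would then be created as in the question, for example:
# image_datagen = ImageDataGenerator(**augment_only_args)
# mask_datagen = ImageDataGenerator(**augment_only_args)
print(sorted(augment_only_args))
```

A variation the answer does not discuss, and which is untested here, would be to pass the filtered arguments only to the mask generator, so that the images stay normalized while the targets remain in [0, 1].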

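Solution 2

Change the activation of the last layer and the loss function to ones that suit your problem better. For example, the tanh function outputs values in the range (-1, 1). With a slight modification of binary cross-entropy, tanh can be used to train your model.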

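Conclusion

In my opinion, Solution 1 is better because it is very simple and direct. But if you really want to use featurewise_center and featurewise_std_normalization, I think you should use Solution 2.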

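Since the tanh function is a rescaled version of the sigmoid function, binary cross-entropy needs only a slight modification for a tanh activation (found in this answer). Up to an additive constant of log 2, which does not affect the gradients, it becomes

$$L(y, p) = -\frac{1}{2}\left((1 - y)\log(1 - p) + (1 + y)\log(1 + p)\right)$$

Note the leading minus sign: without it, minimizing the expression would push the predictions away from the targets. This can be implemented in Keras as follows

def bce_modified(y_true, y_pred):
    # Keep predictions strictly inside (-1, 1) so neither log receives zero.
    y_pred = K.clip(y_pred, -1.0 + K.epsilon(), 1.0 - K.epsilon())
    return -(1.0/2.0) * ((1-y_true) * K.log(1-y_pred) + (1+y_true) * K.log(1+y_pred))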


def build_unet_bn(input_layer = Input((128,128,3)), start_depth=16, activation='relu', initializer='glorot_normal'):
    # part of the method without the last layer
    output_layer = Conv2D(1, (1,1), padding="same", activation="tanh")(uconv1)

    return output_layer

model.compile(optimizer=Adam(lr=1e-3), loss=bce_modified, metrics=metrics)
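The relation between the tanh form of the loss and ordinary binary cross-entropy can be checked numerically. The sketch below is plain Python and independent of Keras; it assumes the tanh form carries a leading minus sign and maps a target y and a tanh output p from (-1, 1) to (0, 1) via (1 + y)/2 and (1 + p)/2. The function names and the sample values are hypothetical.

```python
import math

def binary_crossentropy(t, s):
    """Ordinary binary cross-entropy for a target t in [0, 1] and an output s in (0, 1)."""
    return -(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))

def bce_tanh(y, p):
    """Binary cross-entropy rewritten for a target y in [-1, 1] and a tanh output p in (-1, 1)."""
    return -0.5 * ((1.0 - y) * math.log(1.0 - p) + (1.0 + y) * math.log(1.0 + p))

# Sample target and tanh output, rescaled to the sigmoid domain.
y, p = 0.5, 0.2
t, s = (1.0 + y) / 2.0, (1.0 + p) / 2.0

# The two forms differ only by the constant log(2).
lhs = binary_crossentropy(t, s)
rhs = bce_tanh(y, p) + math.log(2.0)
print(lhs, rhs)

# A prediction close to the target gives a lower loss than one far from it.
print(bce_tanh(1.0, 0.99) < bce_tanh(1.0, -0.99))
```

Because the constant is dropped, this form of the loss bottoms out at -log 2 rather than 0 for targets in [-1, 1], which is harmless for training since only the gradients matter.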
