How do I avoid augmenting data in the validation split of a Keras ImageDataGenerator?

Published 2024-06-02 08:32:05

I am using the following generator:

datagen = ImageDataGenerator(
    fill_mode='nearest',
    cval=0,
    rescale=1. / 255,
    rotation_range=90,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.5,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split = 0.5,
)

train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="training",
    batch_size=8,
    seed=123,
    shuffle=True,
    class_mode="other",
    target_size=(64,64))


STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size

valid_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="validation",
    batch_size=8,
    seed=123,
    shuffle=True,
    class_mode="raw",
    target_size=(64, 64))

STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size

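The problem is that the validation data is augmented as well, which I assume is not what you want during training. How can I avoid this? I do not have two separate directories for training and validation. I want to train the network from a single dataframe. Any suggestions?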


3 answers

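The solution my friend found is to use a separate generator for validation, with the same validation split and with shuffling turned off: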

datagen = ImageDataGenerator(
    #featurewise_center=True,
    #featurewise_std_normalization=True,
    rescale=1. / 255,
    rotation_range=90,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.5,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split = 0.15,
)

# Validation generator: rescaling only, no augmentation, same split fraction
valid_datagen = ImageDataGenerator(rescale=1. / 255, validation_split=0.15)

You can then define the two generators as follows:

train_generator = datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="training",
    batch_size=64,
    seed=123,
    shuffle=False,
    class_mode="raw",
    target_size=(224,224))


STEP_SIZE_TRAIN = train_generator.n // train_generator.batch_size

valid_generator = valid_datagen.flow_from_dataframe(
    dataframe=traindf,
    directory=train_path,
    x_col="id",
    y_col=classes,
    subset="validation",
    batch_size=64,
    seed=123,
    shuffle=False,
    class_mode="raw",
    target_size=(224, 224))

STEP_SIZE_VALID = valid_generator.n // valid_generator.batch_size
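
Why two generators with the same validation_split should not overlap can be sketched in plain Python. The sketch assumes that flow_from_dataframe selects each subset by row position, taking the first validation_split fraction of the dataframe for "validation" and the remainder for "training"; that matches my reading of keras_preprocessing, but it is an assumption to verify against your installed version. The helper name subset_bounds is hypothetical and is not part of Keras.

```python
def subset_bounds(n_rows, validation_split, subset):
    """Return the [start, stop) row range ASSUMED to be used for a subset.

    Hypothetical helper mirroring a positional split: validation takes the
    first fraction of rows, training takes everything after it.
    """
    if subset == "validation":
        lo, hi = 0.0, validation_split
    elif subset == "training":
        lo, hi = validation_split, 1.0
    else:
        raise ValueError("subset must be 'training' or 'validation'")
    return int(lo * n_rows), int(hi * n_rows)


# Both ranges depend only on the row count and the split fraction,
# not on which ImageDataGenerator object asks for them.
train_rows = subset_bounds(1000, 0.15, "training")
valid_rows = subset_bounds(1000, 0.15, "validation")
print("training rows:", train_rows)
print("validation rows:", valid_rows)
```

If that assumption holds, shuffle only changes the order in which batches are drawn inside each subset, so the training generator could probably keep shuffle=True. Comparing the file lists of train_generator and valid_generator (for example via their filenames attribute, if your version exposes it) is a direct way to confirm the two subsets are disjoint.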

You should look at the answer to this related question: When using Data augmentation is it ok to validate only with the original images?

It suggests using an ImageDataGenerator with no arguments when loading the validation data, for example:

# aug_params is a dict of augmentation keyword arguments, so unpack it with **
train_gen = ImageDataGenerator(**aug_params).flow_from_directory(train_dir)
valid_gen = ImageDataGenerator().flow_from_directory(valid_dir)

model.fit_generator(train_gen, validation_data=valid_gen)
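
One caveat with a completely empty ImageDataGenerator() is that it also drops rescale, so validation images would be fed at a different scale than the training images. A minimal sketch of one way around that is to derive the validation arguments from the training ones, keeping only the keys that do not augment. The helper validation_kwargs and the key list are my own illustration and are not part of Keras; preprocessing_function is kept on the assumption that it is used for deterministic preprocessing rather than for random augmentation.

```python
# Keyword arguments that are assumed NOT to augment images and therefore
# should stay identical for training and validation.
NON_AUGMENTING_KEYS = (
    "rescale",
    "preprocessing_function",
    "validation_split",
    "data_format",
)


def validation_kwargs(train_kwargs):
    """Return a copy of train_kwargs with only the non-augmenting keys."""
    return {k: v for k, v in train_kwargs.items() if k in NON_AUGMENTING_KEYS}


aug_params = dict(
    rescale=1. / 255,
    rotation_range=90,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.5,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.15,
)
valid_params = validation_kwargs(aug_params)
print(valid_params)

# With Keras installed, the two dicts would be used like this:
#   train_datagen = ImageDataGenerator(**aug_params)
#   valid_datagen = ImageDataGenerator(**valid_params)
```

Deriving one dict from the other keeps rescale and validation_split in sync if either value is changed later.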

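You can solve this with a small change to your code. Add a second ImageDataGenerator object named test_datagen and pass it only the rescaling parameter, with no augmentation options. The augmentation options then live in a separate object, which you use as the training datagen. You also have to split the data into training and testing directories before passing them to the training and testing data generators. Here is sample code from TensorFlow that you can refer to: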

# For training data: rescaling plus augmentation
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
# For testing data: rescaling only, no augmentation
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
        'data/validation',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
model.fit_generator(
        train_generator,
        steps_per_epoch=2000,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=800)
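
Since the question starts from a single dataframe rather than two directories, the same idea can be applied by splitting the dataframe rows instead of the folders. The following is a minimal sketch using only the standard library; split_row_positions is a hypothetical helper, and the default seed of 123 is simply borrowed from the question's code. It returns row positions that could then be handed to pandas.

```python
import random


def split_row_positions(n_rows, validation_fraction, seed=123):
    """Return (train_positions, valid_positions) as two disjoint sorted lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    positions = list(range(n_rows))
    # A seeded shuffle makes the split reproducible between runs.
    random.Random(seed).shuffle(positions)
    n_valid = int(n_rows * validation_fraction)
    valid_positions = sorted(positions[:n_valid])
    train_positions = sorted(positions[n_valid:])
    return train_positions, valid_positions


train_pos, valid_pos = split_row_positions(n_rows=20, validation_fraction=0.25)
print(len(train_pos), "training rows,", len(valid_pos), "validation rows")

# With pandas and Keras installed, and using the names from the question
# (traindf, train_path, classes), the positions would be used like this:
#   train_df = traindf.iloc[train_pos]
#   valid_df = traindf.iloc[valid_pos]
#   train_generator = train_datagen.flow_from_dataframe(dataframe=train_df, ...)
#   validation_generator = test_datagen.flow_from_dataframe(dataframe=valid_df, ...)
```

With two separate dataframes, neither generator needs validation_split or subset, and only train_datagen carries the augmentation options.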
