Fine-tuning a Keras autoencoder on cat images

Posted 2024-04-19 12:45:21


I want to use an autoencoder on real-life images (rather than simple digits). I took the cats and dogs dataset and trained with it. My parameters are:

  1. I stick to grayscale images downscaled to 128x128 px, and do some preprocessing in ImageDataGenerator for data augmentation.
  2. I train with a set of about 2,000 different images of cats and dogs. I could take 10,000, but that takes too long.
  3. I chose a convolutional network with basic downsampling and upsampling, and played with the parameters until I ended up with an encoded state of 8x8x8 = 512 values (1/32 of the original 128x128 px image).

Here is the Python code:

from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import metrics
from keras.callbacks import EarlyStopping
import os

root_dir = '/opt/data/pets'
epochs = 400 # epochs of training, the more the better
batch_size = 64 # number of images to be yielded from the generator per batch
seed = 4321 # constant seed for constant conditions
# keras image input type definition
img_channel = 1 # 1 for grayscale, 3 for color
# dimension of the input image for the network; the bigger, the more CPU and RAM are used
img_x, img_y = 128, 128
input_img = Input(shape = (img_x, img_y, img_channel))

# this is the augmentation configuration we use for training
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

# this is the augmentation configuration we will use for testing
test_datagen = ImageDataGenerator(rescale=1./255)

# this is a generator that will read pictures found in
# subfolders of root_dir + '/train', and indefinitely
# generate batches of augmented image data
train_generator = train_datagen.flow_from_directory(
        root_dir + '/train',  # this is the target directory
        target_size=(img_x, img_y), # all images will be resized
        batch_size=batch_size,
        color_mode='grayscale',
        class_mode='input', # necessary for autoencoder
        shuffle=False, # keep order so filenames line up with batches below
        seed=seed)

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
        root_dir + '/validation',
        target_size=(img_x, img_y),
        batch_size=batch_size,
        color_mode='grayscale',
        class_mode='input',  # necessary for autoencoder
        shuffle=False,  # keep order so filenames line up with batches below
        seed=seed)

# create convolutional autoencoder inspired from https://blog.keras.io/building-autoencoders-in-keras.html
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu',padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(img_channel, (3, 3), activation='sigmoid', padding='same')(x) # as in the Keras documentation example

autoencoder = Model(input_img, decoded)
autoencoder.summary() # show model data

autoencoder.compile(optimizer='sgd',loss='mean_squared_error',metrics=[metrics.mae, metrics.categorical_accuracy])

# do not run forever but stop if model does not get better
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, mode='auto', verbose=1)

# do the actual fitting
autoencoder_train = autoencoder.fit_generator(
        train_generator,
        validation_data=validation_generator,
        epochs=epochs,
        shuffle=False,
        callbacks=[stopper])

# create an encoder for debugging purposes later
encoder = Model(input_img, encoded)

# save the model parameters to a file
autoencoder.save(os.path.basename(__file__) + '_model.hdf')

## PLOTS ####################################
import matplotlib.pyplot as plt
# Plot loss over epochs    
print(autoencoder_train.history.keys())
plt.plot(autoencoder_train.history['loss'])
plt.plot(autoencoder_train.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'])
plt.show()


# Plot original, encoded and predicted image
import numpy as np
images_show_start = 1
images_show_stop = 20
images_show_number = images_show_stop - images_show_start +1

images,_ = train_generator.next()
plt.figure(figsize=(30, 5))
for i in range(images_show_start, images_show_stop + 1):
    col = i - images_show_start + 1  # 1-based subplot column
    # original image
    ax = plt.subplot(3, images_show_number, col)
    image = images[i,:,:,0]
    image_reshaped = np.reshape(image, [1, 128, 128, 1])
    plt.imshow(image,cmap='gray')

    # label (only correct because shuffle=False in the generator)
    image_label = os.path.dirname(train_generator.filenames[i])
    plt.title(image_label)

    # encoded image
    ax = plt.subplot(3, images_show_number, col + images_show_number)
    image_encoded = encoder.predict(image_reshaped)
    # adjust shape if the network parameters are adjusted
    image_encoded_reshaped = np.reshape(image_encoded, [16, 32])
    plt.imshow(image_encoded_reshaped,cmap='gray')

    # predicted (reconstructed) image
    ax = plt.subplot(3, images_show_number, col + 2*images_show_number)
    image_pred = autoencoder.predict(image_reshaped)
    image_pred_reshaped = np.reshape(image_pred, [128, 128])
    plt.imshow(image_pred_reshaped,cmap='gray')
plt.show()

In the network configuration you can see the layers. What do you think? Is it deep enough, or too simple? What adjustments could be made?

[image: network configuration]

The loss decreases over the epochs, as it should.

[image: loss plot]

Here each column contains three images:

  1. the original (downscaled) image,
  2. the encoded image, and
  3. the predicted (reconstructed) image.

[image: original, encoded and predicted images]

So I wonder why the encoded images look so similar in their features (apart from all being cats), with lots of vertical lines. The encoded state is quite large at 8x8x8 = 512 values, which I plot as a 16x32 pixel image, i.e. 1/32 of the original image's pixel count. Is the quality of the decoded image good enough? Can it be improved? Can I make the autoencoder's bottleneck smaller? When I try a smaller bottleneck, the loss gets stuck at 0.06 and the predicted images are very bad.


1 Answer

#1 · Posted 2024-04-19 12:45:21

Your model contains only very few parameters (~32,000). These may not be enough to process the data and to gain insight into the probability distribution that generates it.

The convolutions always reduce the image size by a factor of 2, but the number of filters does not grow. This means your convolutions are not volume-preserving but strongly contracting, and that contraction may simply be too strong.

First, I would try increasing the number of parameters and check whether this makes the images less blurry. Then, if the images do get better with more parameters (they should, since the compression level is now lower than before), you can reduce the number of parameters, i.e. the size of the encoded state, again. This approach can also help you discover other problems in your code.
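
As a first concrete step, here is a minimal sketch (my own illustration, not code from the question) of an encoder whose filter count grows as the spatial size shrinks, so each stage is closer to volume-preserving; the filter counts 16/32/64/128 are assumptions, not tuned values:

from keras.layers import Input, Conv2D, MaxPooling2D

input_img = Input(shape=(128, 128, 1))

# grow the number of filters while pooling shrinks the spatial size
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # 128x128x16
x = MaxPooling2D((2, 2), padding='same')(x)                           # 64x64x16
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)          # 64x64x32
x = MaxPooling2D((2, 2), padding='same')(x)                           # 32x32x32
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)          # 32x32x64
x = MaxPooling2D((2, 2), padding='same')(x)                           # 16x16x64
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)         # 16x16x128
encoded = MaxPooling2D((2, 2), padding='same')(x)                     # 8x8x128

The decoder would mirror these stages with UpSampling2D; once the reconstructions look sharp, you can shrink the deepest filter count again to tighten the bottleneck.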

You could also look at existing autoencoder implementations in Keras that work on other datasets (including more complex data), such as this one using CIFAR10.

The black lines in the image of the encoded state probably just come from how you plot the data. Since the data in this layer has a depth of 8 rather than 1, you have to reshape it. If the original cube has low values at its boundaries (which is plausible, since there is probably not much important information there), you are rearranging the dark/black faces of the cube and projecting them onto a 2D surface; that can then look like recurring black lines.
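
To rule this plotting artefact in or out, you could draw each of the 8 feature maps on its own instead of flattening the 8x8x8 code into one 16x32 image. A sketch, reusing encoder and image_reshaped from the question's code:

import matplotlib.pyplot as plt

image_encoded = encoder.predict(image_reshaped)  # shape (1, 8, 8, 8)
n_maps = image_encoded.shape[-1]
plt.figure(figsize=(12, 2))
for ch in range(n_maps):
    plt.subplot(1, n_maps, ch + 1)
    plt.imshow(image_encoded[0, :, :, ch], cmap='gray')  # one 8x8 feature map
    plt.axis('off')
plt.show()

If the vertical lines disappear in this view, they were indeed a projection artefact and not a property of the learned code.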

Also, judging from the network's loss plot, it may be that training has not converged yet. So if you keep training, the quality of the images may still improve.
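
One likely reason training stops before convergence is the EarlyStopping callback in the question, which gives up after only 2 stagnant epochs. A sketch of a more patient setting (the value 10 is an illustrative assumption, not a tuned number):

from keras.callbacks import EarlyStopping

# wait longer before stopping so slow but steady improvements still count
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0001,
                        patience=10,  # was 2 in the question's code
                        mode='auto', verbose=1)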

Finally, you should use all available training images, not just a small subset. This will (of course) increase the time needed for training, but the encoder's results will be better, because the network will be more resistant to overfitting and will most likely generalize better.

Shuffling the training data could also improve training performance.
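
In the question's code the generators use shuffle=False so that filenames line up with batches for the plots. For the actual training run, the change is one flag (a sketch reusing root_dir, img_x, img_y, batch_size and seed from above); note that the filename-based plot titles then no longer match the batches:

train_generator = train_datagen.flow_from_directory(
        root_dir + '/train',
        target_size=(img_x, img_y),
        batch_size=batch_size,
        color_mode='grayscale',
        class_mode='input',
        shuffle=True,  # shuffle sample order each epoch during training
        seed=seed)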
