ImageDataGenerator data augmentation for videos (4D tensors) in Keras

Posted 2024-03-28 22:13:39


I have an ImageDataGenerator in Keras that I would like to apply during training to every frame of short video clips. Each clip is represented as a 4D numpy array of shape (num_frames, width, height, 3).

For a standard dataset in which every image has shape (width, height, 3), I would normally do the following:

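import tensorflow as tf

aug = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,
    zoom_range=0.15)

# Note: fit_generator is deprecated since TensorFlow 2.1;
# model.fit accepts generators directly.
model.fit_generator(
    aug.flow(X_train, y_train),
    epochs=100)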

How can I apply these same data augmentations to a dataset of 4D numpy arrays, each of which represents a sequence of images?


1 Answer
User
#1 · Posted 2024-03-28 22:13:39

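I figured it out. I created a custom class that inherits from tensorflow.keras.utils.Sequence and uses scipy to apply the augmentation to every image. The rotation angle is drawn once per clip, so all frames of a clip receive the same transform.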

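import random

import numpy as np
import tensorflow as tf
from scipy import ndimage
from sklearn.preprocessing import LabelBinarizer


class CustomDataset(tf.keras.utils.Sequence):
    def __init__(self, batch_size, X_train, Y_train):
        super().__init__()
        self.batch_size = batch_size
        self.X_train = X_train  # clips, shape (num_clips, num_frames, width, height, 3)
        self.Y_train = Y_train  # one label per clip
        # Fit the encoder once on all labels so that every batch uses the same
        # class-to-column mapping, even when a batch lacks some classes.
        self.encoder = LabelBinarizer().fit(Y_train)

    def __len__(self):
        # Number of batches per epoch.
        return self.X_train.shape[0] // self.batch_size

    def __getitem__(self, index):
        # Returns one batch. Clips are drawn at random, so `index` is ignored.
        X = []
        y = []
        for _ in range(self.batch_size):
            r = random.randint(0, self.X_train.shape[0] - 1)
            next_x = self.X_train[r]
            next_y = self.Y_train[r]

            # Augmentation parameters for this clip: one angle (in degrees)
            # shared by all of its frames.
            rotation_amt = random.randint(-45, 45)

            augmented_next_x = []
            for j in range(next_x.shape[0]):
                # cval=255 fills the exposed corners with white. The original
                # version set every zero-valued pixel to 255 after rotating,
                # which also overwrote genuinely black pixels.
                transformed_img = ndimage.rotate(
                    next_x[j], rotation_amt, reshape=False, cval=255)
                augmented_next_x.append(transformed_img)

            X.append(augmented_next_x)
            y.append(next_y)

        X = np.array(X).astype('uint8')
        y = self.encoder.transform(np.array(y))

        return X, y

    def on_epoch_end(self):
        # Optional hook for logic at the end of each epoch, e.g. reshuffling.
        pass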
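
A side note on the per-frame loop above: `scipy.ndimage.rotate` also accepts arrays with more than two dimensions and rotates every plane spanned by its `axes` argument, so a whole clip can probably be rotated in a single call. The following is a minimal sketch under the assumption that a clip has shape (num_frames, width, height, 3). The helper name `rotate_clip`, the white fill value, and the choice of bilinear interpolation are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from scipy import ndimage


def rotate_clip(clip, angle_deg, fill_value=255):
    """Rotate every frame of a (num_frames, width, height, channels) clip
    by the same angle. `rotate_clip` is a hypothetical helper name."""
    return ndimage.rotate(
        clip,
        angle_deg,
        axes=(1, 2),      # rotate in the (width, height) plane of each frame
        reshape=False,    # keep the original frame size
        order=1,          # bilinear interpolation (assumption; default is cubic)
        mode='constant',
        cval=fill_value,  # value used for the exposed corners
    )


# Example: a random 8-frame clip of 32x32 RGB images.
rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
rotated = rotate_clip(clip, angle_deg=15)
print(rotated.shape, rotated.dtype)
```

Because the angle is passed once for the whole clip, every frame gets the same rotation, which matches the intent of drawing `rotation_amt` once per clip.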

Then pass it to the fit_generator method:

training_data_augmentation = CustomDataset(BS, X_train_L, y_train_L)
model.fit_generator(
    training_data_augmentation, 
    epochs=300)
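
Note that the `CustomDataset` class above ignores `index` and draws clips at random with replacement, so within one epoch some clips may be repeated and others skipped. If each clip should be visited once per epoch, one option is to keep a shuffled index array and reshuffle it in `on_epoch_end`. Below is a minimal sketch of just that bookkeeping in plain NumPy. The class name `EpochIndexer` and its method `batch_indices` are hypothetical, and the logic would need to be folded into the Sequence subclass.

```python
import numpy as np


class EpochIndexer:
    """Hypothetical helper: maps a batch index to clip indices so that each
    clip is visited at most once per epoch."""

    def __init__(self, num_clips, batch_size, seed=None):
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)
        self.order = self.rng.permutation(num_clips)

    def __len__(self):
        # Number of full batches; a trailing partial batch is dropped.
        return len(self.order) // self.batch_size

    def batch_indices(self, index):
        # Clip indices belonging to batch number `index`.
        start = index * self.batch_size
        return self.order[start:start + self.batch_size]

    def on_epoch_end(self):
        # Reshuffle so the next epoch sees the clips in a new order.
        self.order = self.rng.permutation(len(self.order))


indexer = EpochIndexer(num_clips=10, batch_size=4, seed=0)
seen = np.concatenate([indexer.batch_indices(i) for i in range(len(indexer))])
print(len(indexer), len(set(seen.tolist())))
```

Inside `__getitem__`, the indices returned by `batch_indices(index)` would then replace the `random.randint` draw, while the per-clip augmentation stays unchanged.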
