Keras multi-input model with flow_from_dataframe

Problem

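I am trying to build a multi-input model in Keras with two inputs: an image and text. I use the flow_from_dataframe method and pass it a pandas DataFrame that contains the image file names together with, for each image, the corresponding text (as a vectorized feature representation) and the target label/class. The DataFrame therefore looks like this: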

    ID    path            text-features          label
    111   'cat001.jpg'    [0.0, 1.0, 0.0,...]    cat
    112   'dog001.jpg'    [1.0, 0.0, 1.0,...]    dog
    113   'bunny001.jpg'  [0.0, 1.0, 1.0,...]    bunny
    ...
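Because each text-features cell holds a whole vector, pandas stores that column with object dtype (one Python list per row) rather than as a 2-D float array. A minimal sketch of turning it into a dense matrix, assuming every row's vector has the same length; the three-element vectors below are toy data mirroring the table, not the real features:

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the table above; the real feature vectors are much longer.
df = pd.DataFrame({
    "ID": [111, 112, 113],
    "path": ["cat001.jpg", "dog001.jpg", "bunny001.jpg"],
    "text-features": [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
    "label": ["cat", "dog", "bunny"],
})

# Each cell is a Python list, so the column has object dtype.
print(df["text-features"].dtype)  # object

# np.stack turns the equal-length lists into one (n_samples, n_features) array.
text_matrix = np.stack(df["text-features"].to_numpy()).astype(np.float32)
print(text_matrix.shape)  # (3, 3)
```

Row i of text_matrix lines up with row i of the DataFrame, so the same ordering can be reused when pairing text with images.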

After building the model with the Keras functional API, I feed the inputs into the model as follows:

    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1./255, validation_split=0.15)

    train_generator = datagen.flow_from_dataframe(
        dataframe=df,
        directory=data_dir,
        x_col="path",        # column holding the image file names
        y_col="label",
        has_ext=True,        # deprecated in newer Keras versions
        class_mode="categorical",
        target_size=(224, 224),
        batch_size=batch_size,
        subset="training")

    validation_generator = datagen.flow_from_dataframe(
        dataframe=df,
        directory=data_dir,
        x_col="path",
        y_col="label",
        has_ext=True,
        class_mode="categorical",
        target_size=(224, 224),
        batch_size=batch_size,
        subset="validation")

So far so good, but now I am stuck on how to also feed the text features from my DataFrame to the model during training.

Question

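How can I modify the flow_from_dataframe generator so that it handles the text-feature data from the DataFrame as well as the images during training? Also, since I could not find any example of such a modification of flow_from_dataframe, I wonder whether I am approaching this problem the wrong way, i.e., is there a better way to achieve this?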
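One way to keep flow_from_dataframe for the images is to wrap its iterator in a plain Python generator that attaches the matching text vectors to each batch. This is a workaround sketch, not a Keras feature: with_text_features is a hypothetical helper, and it assumes the image iterator was created with shuffle=False, so that batch k covers filenames[k*batch_size:(k+1)*batch_size] in the order of the iterator's filenames attribute. With shuffling on, the wrapper has no way of knowing which rows a batch came from. A fake iterator stands in for the real DataFrameIterator so the demo runs without TensorFlow.

```python
import math

import numpy as np


def with_text_features(image_iter, filenames, text_lookup, batch_size):
    """Yield ([image_batch, text_batch], label_batch) for a two-input model.

    image_iter:  endless iterator of (images, labels), e.g. the result of
                 flow_from_dataframe(..., shuffle=False)
    filenames:   file names in the exact order the iterator visits them
    text_lookup: dict mapping file name to a 1-D text-feature vector
    """
    n_batches = math.ceil(len(filenames) / batch_size)
    k = 0
    for images, labels in image_iter:
        names = filenames[k * batch_size:(k + 1) * batch_size]
        texts = np.stack([text_lookup[name] for name in names])
        # Multiple inputs go in a list, in the same order as the model's inputs.
        yield [images, texts], labels
        k = (k + 1) % n_batches  # the iterator starts over after its last batch


# Demo: a fake iterator standing in for flow_from_dataframe(..., shuffle=False, batch_size=2).
files = ["cat001.jpg", "dog001.jpg", "bunny001.jpg"]
lookup = {name: np.full(4, i, dtype=np.float32) for i, name in enumerate(files)}


def fake_image_iter():
    while True:
        yield np.zeros((2, 8, 8, 3)), np.eye(3)[:2]  # images for files[0:2]
        yield np.zeros((1, 8, 8, 3)), np.eye(3)[2:]  # images for files[2:3]


gen = with_text_features(fake_image_iter(), files, lookup, batch_size=2)
(imgs, txt), y = next(gen)
print(imgs.shape, txt.shape, y.shape)  # (2, 8, 8, 3) (2, 4) (2, 3)
```

With a real iterator one would pass train_generator.filenames (which already reflects the training or validation subset) and steps_per_epoch=len(train_generator). The cost of shuffle=False is a fixed sample order; shuffling the DataFrame once before creating the generators partly compensates, and a keras.utils.Sequence such as the one in the update avoids the limitation altogether.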

Update

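In the meantime, I have been trying to write my own generator, following the guide I found here (https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly) and adapting it to my needs. This is what I came up with: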

    import numpy as np
    import keras
    from matplotlib.image import imread

    class DataGenerator(keras.utils.Sequence):
        def __init__(self, list_IDs, labels, batch_size=32, dim=(32, 32, 32), n_channels=1,
                     n_classes=10, shuffle=True):
            # Initialization
            self.dim = dim
            self.batch_size = batch_size
            self.labels = labels
            self.list_IDs = list_IDs
            self.n_channels = n_channels
            self.n_classes = n_classes
            self.shuffle = shuffle
            self.on_epoch_end()

        def on_epoch_end(self):
            # Updates indexes after each epoch
            self.indexes = np.arange(len(self.list_IDs))
            if self.shuffle == True:
                np.random.shuffle(self.indexes)

        # Method for producing batches of data;
        # takes as argument the list of IDs of the target batch
        def __data_generation(self, list_IDs_temp):
            # Generates data containing batch_size samples
            # X : (n_samples, *dim, n_channels)
            # Initialization
            X = np.empty((self.batch_size, *self.dim, self.n_channels))
            Xtext = np.empty((self.batch_size, 7576))
            y = np.empty((self.batch_size), dtype=int)

            # Generate data
            for i, ID in enumerate(list_IDs_temp):
                # Store sample
                X[i,] = imread('C:/Users/aaron/Desktop/training/' + str(ID))  # <--- all files are in the same DIR
                # Look up the text features by using the ID as a filter on the path column.
                # This line throws the error.
                Xtext[i,] = np.array(total_data[df.path == str(ID)]["text-features"].values)
                # Store class
                y[i] = self.labels[ID]

            return X, Xtext, keras.utils.to_categorical(y, num_classes=self.n_classes)

        def __len__(self):
            # Denotes the number of batches per epoch
            return int(np.floor(len(self.list_IDs) / self.batch_size))

        # Now, when the batch corresponding to a given index is called,
        # the generator executes the __getitem__ method to generate it.
        def __getitem__(self, index):
            # Generate one batch of data
            # Generate indexes of the batch
            indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
            # Find list of IDs
            list_IDs_temp = [self.list_IDs[k] for k in indexes]
            # Generate data
            X, Xtext, y = self.__data_generation(list_IDs_temp)
            return X, Xtext, y
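Independently of the lookup error, two details in this generator would still trip Keras once the lookup works. First, __getitem__ returns X, Xtext, y; Keras reads a 3-tuple from a generator or Sequence as (inputs, targets, sample_weights), so the two inputs have to be grouped as ([X, Xtext], y). Second, matplotlib's imread returns each image at its stored size, so X[i,] = imread(...) only fits if every file is already 224x224 with 3 channels. A minimal numpy-only sketch of the corrected batch logic: it folds __data_generation into __getitem__, takes precomputed text vectors from a dict, and uses a hypothetical load_image callable in place of imread. In real code the class would subclass keras.utils.Sequence; it is a plain class here only so it runs without TensorFlow.

```python
import numpy as np


class TwoInputBatches:
    """Batch logic for a two-input keras.utils.Sequence (plain class for the demo).

    load_image is a hypothetical callable returning an array of shape (*dim, n_channels).
    """

    def __init__(self, list_IDs, labels, text_lookup, load_image, batch_size=32,
                 dim=(224, 224), n_channels=3, n_classes=5, shuffle=True):
        self.list_IDs = list(list_IDs)
        self.labels = labels            # dict: ID -> integer class
        self.text_lookup = text_lookup  # dict: ID -> 1-D feature vector
        self.load_image = load_image
        self.batch_size = batch_size
        self.dim = dim
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def on_epoch_end(self):
        # Reshuffle the sample order after each epoch.
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def __len__(self):
        # Number of full batches per epoch.
        return len(self.list_IDs) // self.batch_size

    def __getitem__(self, index):
        batch = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        ids = [self.list_IDs[k] for k in batch]
        X = np.empty((len(ids), *self.dim, self.n_channels), dtype=np.float32)
        for j, ID in enumerate(ids):
            X[j] = self.load_image(ID)
        # Convert each stored vector to float before stacking into (batch, n_features).
        Xtext = np.stack([np.asarray(self.text_lookup[i], dtype=np.float32) for i in ids])
        y = np.array([self.labels[i] for i in ids])
        # One-hot encoding, same result as keras.utils.to_categorical(y, n_classes).
        y_onehot = np.eye(self.n_classes, dtype=np.float32)[y]
        # Both inputs go in one list; a bare 3-tuple would be read as
        # (inputs, targets, sample_weights).
        return [X, Xtext], y_onehot


ids = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
seq = TwoInputBatches(ids, {i: n % 2 for n, i in enumerate(ids)},
                      {i: [float(n)] * 6 for n, i in enumerate(ids)},
                      load_image=lambda ID: np.zeros((4, 4, 3)),
                      batch_size=2, dim=(4, 4), n_classes=2, shuffle=False)
(x_img, x_txt), y = seq[0]
print(x_img.shape, x_txt.shape, y.shape)  # (2, 4, 4, 3) (2, 6) (2, 2)
```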

I initialize the generators as follows:

    import pandas as pd
    from sklearn import preprocessing

    partition = {}
    partition['train'] = X_train.path.values
    partition['validation'] = X_test.path.values

    # Encode the class names as integers and map each image path to its class
    le = preprocessing.LabelEncoder()
    encoded_labels = le.fit_transform(df.label)
    labels = pd.Series(encoded_labels, index=df.path).to_dict()

    # Parameters
    params = {'dim': (224, 224),
              'batch_size': 64,
              'n_classes': 5,
              'n_channels': 3,
              'shuffle': True}

    # Generators
    training_generator = DataGenerator(partition['train'], labels, **params)
    validation_generator = DataGenerator(partition['validation'], labels, **params)
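Since labels is already a dict keyed by path, the text features can be precomputed the same way, which removes the DataFrame filter from the per-sample loop entirely. A sketch on toy data; text_lookup is a hypothetical name, and the generator would need an extra parameter to receive it. np.unique with return_inverse=True is used here instead of LabelEncoder only to avoid the scikit-learn dependency; it yields the same codes because both sort the class names.

```python
import numpy as np
import pandas as pd

# Toy frame; in the question this is df with the real 7576-dimensional features.
df = pd.DataFrame({
    "path": ["cat001.jpg", "dog001.jpg", "bunny001.jpg"],
    "text-features": [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]],
    "label": ["cat", "dog", "bunny"],
})

# Integer codes over the sorted class names, as LabelEncoder would assign them.
classes, encoded = np.unique(df["label"].to_numpy(), return_inverse=True)
labels = dict(zip(df["path"], encoded))

# Text features keyed by path, converted once to float arrays so the
# generator never filters the DataFrame inside the training loop.
text_lookup = {path: np.asarray(vec, dtype=np.float32)
               for path, vec in zip(df["path"], df["text-features"])}

print(labels["dog001.jpg"])       # 2
print(text_lookup["cat001.jpg"])  # [0. 1. 0.]
```

Building both dicts once before training also makes the per-batch cost a plain dict lookup instead of a boolean scan over the whole DataFrame for every sample.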

But using this generator gives me an error:

ValueError: setting an array element with a sequence.

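It is raised by the line Xtext[i,] = np.array(total_data[df.path == str(ID)]["text-features"].values) in the code above (the one marked "This line throws the error"). Any suggestions on how to fix this?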
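The error comes from the type of the selected value rather than its contents: df[mask]["text-features"].values is a one-element object array whose single element is a Python list, and NumPy would have to convert that whole list into a single float for each position of the row Xtext[i,], which fails. (The line also builds the mask from df but indexes total_data, which only selects the intended row if both frames share the same index.) A minimal reproduction on toy three-element vectors, followed by one possible fix: take the list itself with .iloc[0] and convert it to a float array before assigning.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "path": ["cat001.jpg", "dog001.jpg"],
    "text-features": [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]],
})
Xtext = np.empty((2, 3))

# Original pattern: .values yields an object array holding one list.
bad = np.array(df[df.path == "cat001.jpg"]["text-features"].values)
print(bad.dtype, bad.shape)  # object (1,)
try:
    Xtext[0,] = bad
except (ValueError, TypeError) as err:
    print(type(err).__name__, err)

# Fix: take the list itself and convert it to a float vector before assigning.
row = df.loc[df.path == "cat001.jpg", "text-features"].iloc[0]
Xtext[0,] = np.asarray(row, dtype=np.float64)
print(Xtext[0])  # [0. 1. 0.]
```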

0 answers (no answers yet).
