Keras shows anomalous behavior during training

Posted 2024-05-29 02:24:56

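I am using Keras with a TensorFlow backend in a multi-GPU setup on 2 GPUs. I load the data in batches (batch size 64) with a generator (keras.utils.Sequence), so I call fit_generator and pass it the training and validation data and step counts. I noticed strange behavior starting from the second epoch: the first 3 steps of each epoch show an ETA of only 8 to 9 seconds, and then the network starts to take longer and longer (as it should). The log is as follows: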

Epoch 00001: val_acc improved from -inf to 0.46875, saving model to data/subs_best_model.h5
Epoch 2/32
 1/29 [>.............................] - ETA: 8s - loss: 1.0664 - acc: 0.5000
 2/29 [=>............................] - ETA: 8s - loss: 1.1384 - acc: 0.4531
 3/29 [==>...........................] - ETA: 9s - loss: 1.0915 - acc: 0.5052
 4/29 [===>..........................] - ETA: 42:03 - loss: 1.1064 - acc: 0.5117
 5/29 [====>.........................] - ETA: 56:02 - loss: 1.1173 - acc: 0.4969
 6/29 [=====>........................] - ETA: 1:03:13 - loss: 1.0964 - acc: 0.4974
 7/29 [======>.......................] - ETA: 1:06:45 - loss: 1.0740 - acc: 0.5067
 8/29 [=======>......................] - ETA: 1:08:35 - loss: 1.0592 - acc: 0.5195
 9/29 [========>.....................] - ETA: 1:08:53 - loss: 1.0580 - acc: 0.5191

Do you know what could cause this anomalous/strange behavior?
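One way to read the log is to ask what a running-average ETA would imply about individual step times. The sketch below assumes the progress bar computes its ETA as (elapsed time / finished steps) × (remaining steps); that formula is an assumption about the display, not something stated in the question, and the helper names are hypothetical. The displayed ETAs are rounded, so the recovered times are approximate.

```python
def running_average_eta(elapsed_seconds, steps_done, total_steps):
    """Estimate remaining time as (average time per finished step) * (steps left)."""
    if steps_done <= 0:
        raise ValueError("steps_done must be positive")
    average_step_time = elapsed_seconds / steps_done
    return average_step_time * (total_steps - steps_done)


def elapsed_from_eta(eta_seconds, steps_done, total_steps):
    """Invert the estimate above: recover the elapsed time from a displayed ETA."""
    steps_left = total_steps - steps_done
    return eta_seconds * steps_done / steps_left


# ETA values read from the log above (epoch 2, 29 steps in total).
eta_after_step_3 = 9             # "ETA: 9s"
eta_after_step_4 = 42 * 60 + 3   # "ETA: 42:03"

elapsed_3 = elapsed_from_eta(eta_after_step_3, 3, 29)
elapsed_4 = elapsed_from_eta(eta_after_step_4, 4, 29)

print(f"elapsed after step 3: {elapsed_3:.1f} s")
print(f"elapsed after step 4: {elapsed_4:.1f} s")
print(f"implied duration of step 4 alone: {elapsed_4 - elapsed_3:.1f} s")
```

Under that assumption, the jump in the ETA would come from a single step (the fourth) taking far longer than the first three, rather than from every step slowing down gradually. The sketch says nothing about why that step is slow.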

Edit:

My DataGenerator is inspired by this implementation
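For readers who have not seen the pattern, here is a minimal sketch of a batch generator that follows the keras.utils.Sequence protocol (`__len__`, `__getitem__`, `on_epoch_end`). It is not the DataGenerator from the question: the class name `SketchDataGenerator`, its constructor arguments, and the plain-list batches are all assumptions made for illustration, and a real generator would return NumPy arrays shaped for the model.

```python
import math
import random

try:
    from tensorflow.keras.utils import Sequence
except ImportError:
    Sequence = object  # fallback so the sketch also runs without TensorFlow


class SketchDataGenerator(Sequence):
    """Hypothetical batch loader; not the DataGenerator from the question."""

    def __init__(self, samples, labels, batch_size=64, shuffle=True):
        super().__init__()
        self.samples = samples
        self.labels = labels
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.indexes = list(range(len(samples)))
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch, counting a trailing partial batch.
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, batch_index):
        # Return the batch at position batch_index as (inputs, targets).
        start = batch_index * self.batch_size
        batch_ids = self.indexes[start:start + self.batch_size]
        x = [self.samples[i] for i in batch_ids]
        y = [self.labels[i] for i in batch_ids]
        return x, y

    def on_epoch_end(self):
        # Reshuffle the sample order between epochs if requested.
        if self.shuffle:
            random.shuffle(self.indexes)


# Toy data: 130 samples with batch size 64 give 2 full batches and 1 partial one.
gen = SketchDataGenerator(list(range(130)), [i % 3 for i in range(130)],
                          batch_size=64, shuffle=False)
print(len(gen))
last_x, last_y = gen[2]
print(len(last_x))
```

Any slow work placed in `__getitem__` (disk reads, decoding, augmentation) runs once per step, so that method is a natural place to add timing when individual steps look slow.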

The code I use for fit_generator is as follows:

params = {'batch_size': TrainConfig.BATCH_SIZE,
          'dim': (TrainConfig.BATCH_SIZE, 1, TrainConfig.SAMPLES),
          'labels_dim': (TrainConfig.BATCH_SIZE,),
          'n_classes': TrainConfig.OUTPUT_DIM}

training_generator = DataGenerator(train_set, **params)
validation_generator = DataGenerator(val_set, **params)

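# Whole batches per epoch (floor division, same result as int(1. * n / batch_size)).
# Assumes batch_size is the same value as TrainConfig.BATCH_SIZE used in params.
training_steps_per_epoch = len(train_set) // TrainConfig.BATCH_SIZE
validation_steps_per_epoch = len(val_set) // TrainConfig.BATCH_SIZE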

history = model.fit_generator(generator=training_generator,
                              verbose=1,
                              use_multiprocessing=False,
                              workers=1,
                              steps_per_epoch=training_steps_per_epoch,
                              epochs=epochs,
                              validation_data=validation_generator,
                              validation_steps=validation_steps_per_epoch,
                              callbacks=callbacks)
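The step counts passed to fit_generator above are rounded down, so a trailing partial batch is not counted. The sketch below shows the arithmetic for a hypothetical dataset size; `n_samples = 1900` is an assumption chosen only because, with a batch size of 64, any size from 1856 to 1919 yields the 29 steps seen in the log.

```python
import math


def steps_floor(n_samples, batch_size):
    """Count only full batches (what rounding down with int() does)."""
    return n_samples // batch_size


def steps_ceil(n_samples, batch_size):
    """Count a trailing partial batch as one extra step."""
    return math.ceil(n_samples / batch_size)


n_samples = 1900   # hypothetical dataset size, not taken from the question
batch_size = 64    # batch size stated in the question

print(steps_floor(n_samples, batch_size))
print(steps_ceil(n_samples, batch_size))
```

If the generator's own `__len__` rounds up while the step count rounds down, the two disagree by one whenever the dataset size is not a multiple of the batch size. With a Sequence, leaving out steps_per_epoch lets Keras fall back to `len(generator)`, which keeps the two consistent.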

Tags: model, training, val, steps, generator, fit, validation, eta