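I am using Keras with the TensorFlow backend on 2 GPUs. I load the data in batches (BS=64) with a generator (keras.utils.Sequence), so I train with the fit_generator
method, passing it the training and validation generators and the number of steps for each.
I have noticed a strange behavior starting from the second epoch. The first 3 steps of each epoch report an ETA of only 8 to 9 seconds, and then the steps start taking far longer (which is the duration I would actually expect). The log is as follows: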
Epoch 00001: val_acc improved from -inf to 0.46875, saving model to data/subs_best_model.h5
Epoch 2/32
1/29 [>.............................] - ETA: 8s - loss: 1.0664 - acc: 0.5000
2/29 [=>............................] - ETA: 8s - loss: 1.1384 - acc: 0.4531
3/29 [==>...........................] - ETA: 9s - loss: 1.0915 - acc: 0.5052
4/29 [===>..........................] - ETA: 42:03 - loss: 1.1064 - acc: 0.5117
5/29 [====>.........................] - ETA: 56:02 - loss: 1.1173 - acc: 0.4969
6/29 [=====>........................] - ETA: 1:03:13 - loss: 1.0964 - acc: 0.4974
7/29 [======>.......................] - ETA: 1:06:45 - loss: 1.0740 - acc: 0.5067
8/29 [=======>......................] - ETA: 1:08:35 - loss: 1.0592 - acc: 0.5195
9/29 [========>.....................] - ETA: 1:08:53 - loss: 1.0580 - acc: 0.5191
Do you know what could be causing this anomalous behavior?
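As a rough sanity check on the numbers in the log, here is a small sketch that works backwards from the logged ETAs to the elapsed time. It assumes the progress bar computes the ETA as the running average time per step multiplied by the number of remaining steps, which is my understanding of how the Keras progress bar worked in versions of this era; the helper names are my own and are not part of Keras.

```python
def eta_seconds(elapsed, current_step, total_steps):
    """ETA as (average time per completed step) * (remaining steps). Assumed formula."""
    time_per_step = elapsed / current_step
    return time_per_step * (total_steps - current_step)


def elapsed_from_eta(eta, current_step, total_steps):
    """Invert the assumed formula to recover total elapsed time from a logged ETA."""
    return eta * current_step / (total_steps - current_step)


# ETAs read from the Epoch 2 log, converted to seconds (steps 3 to 6 of 29).
logged_eta = {3: 9, 4: 42 * 60 + 3, 5: 56 * 60 + 2, 6: 3600 + 3 * 60 + 13}

elapsed = {step: elapsed_from_eta(eta, step, 29) for step, eta in logged_eta.items()}
for step in sorted(elapsed):
    print(f"after step {step}: about {elapsed[step]:.1f} s elapsed in this epoch")
```

If that assumption holds, the first three steps together took roughly one second, while step 4 alone took several minutes. So the jump in the ETA would reflect a real jump in per-step time rather than a display glitch.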
Edit:
My DataGenerator
is inspired by this implementation.
The code I use for fit_generator is as follows:
# batch_size, epochs, callbacks, train_set, val_set and model are assumed to be defined earlier
params = {'batch_size': TrainConfig.BATCH_SIZE,
          'dim': (TrainConfig.BATCH_SIZE, 1, TrainConfig.SAMPLES),
          'labels_dim': (TrainConfig.BATCH_SIZE,),
          'n_classes': TrainConfig.OUTPUT_DIM}

training_generator = DataGenerator(train_set, **params)
validation_generator = DataGenerator(val_set, **params)

# Number of full batches per epoch (integer division drops a final partial batch)
training_steps_per_epoch = len(train_set) // batch_size
validation_steps_per_epoch = len(val_set) // batch_size

history = model.fit_generator(generator=training_generator,
                              verbose=1,
                              use_multiprocessing=False,
                              workers=1,
                              steps_per_epoch=training_steps_per_epoch,
                              epochs=epochs,
                              validation_data=validation_generator,
                              validation_steps=validation_steps_per_epoch,
                              callbacks=callbacks)
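One way to narrow this down would be to time the generator on its own, outside fit_generator, to see whether the slow steps come from data loading or from the training step itself. Below is a minimal sketch of that idea. DummySequence is a hypothetical stand-in that only mimics the __len__/__getitem__ interface of keras.utils.Sequence, and time_batches is a helper name I made up. In practice the real training_generator would be passed in instead.

```python
import time


class DummySequence:
    """Hypothetical stand-in with the same __len__/__getitem__ interface as keras.utils.Sequence."""

    def __init__(self, n_batches, batch_size):
        self.n_batches = n_batches
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch
        return self.n_batches

    def __getitem__(self, index):
        # Build one fake batch of inputs and labels
        x = [[float(index)] * 4 for _ in range(self.batch_size)]
        y = [index % 2] * self.batch_size
        return x, y


def time_batches(sequence, max_batches=None):
    """Fetch batches by index and return the load time of each one in seconds."""
    n = len(sequence) if max_batches is None else min(max_batches, len(sequence))
    timings = []
    for i in range(n):
        start = time.perf_counter()
        sequence[i]  # triggers __getitem__, i.e. loading one batch
        timings.append(time.perf_counter() - start)
    return timings


timings = time_batches(DummySequence(n_batches=5, batch_size=64))
for i, t in enumerate(timings):
    print(f"batch {i}: {t:.6f} s")
```

If the per-batch load times measured this way are on the order of minutes, the bottleneck is in the generator. If they are small, the time is being spent elsewhere, for example in the multi-GPU training step.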