我正在训练一个关于SemEval 2017 task 4A dataset(3类分类问题)的LSTM模型。我观察到,第一次验证损失减少,但随后突然显著增加,然后再次减少。它表现出正弦特性,可以从以下训练时期观察到
这是我的模型的代码
model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(Activation('tanh'))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(32)))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))
model.summary()
下面是模型摘要
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_2 (Embedding) (None, 300, 30) 60000
_________________________________________________________________
batch_normalization_3 (Batch (None, 300, 30) 120
_________________________________________________________________
activation_3 (Activation) (None, 300, 30) 0
_________________________________________________________________
dropout_3 (Dropout) (None, 300, 30) 0
_________________________________________________________________
bidirectional_2 (Bidirection (None, 64) 16128
_________________________________________________________________
batch_normalization_4 (Batch (None, 64) 256
_________________________________________________________________
activation_4 (Activation) (None, 64) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 64) 0
_________________________________________________________________
dense_2 (Dense) (None, 1) 65
=================================================================
Total params: 76,569
Trainable params: 76,381
Non-trainable params: 188
我使用手套进行单词嵌入,Adam优化器,分类交叉熵损失函数
在改变损失函数和密集层后,这里是训练阶段
Train on 16711 samples, validate on 1857 samples
Epoch 1/5
16711/16711 [==============================] - 55s 3ms/step - loss: 0.5976 - accuracy: 0.7456 - val_loss: 0.9060 - val_accuracy: 0.6182
Epoch 2/5
16711/16711 [==============================] - 54s 3ms/step - loss: 0.5872 - accuracy: 0.7521 - val_loss: 0.8919 - val_accuracy: 0.6144
Epoch 3/5
16711/16711 [==============================] - 54s 3ms/step - loss: 0.5839 - accuracy: 0.7518 - val_loss: 0.9067 - val_accuracy: 0.6187
Epoch 4/5
16711/16711 [==============================] - 54s 3ms/step - loss: 0.5766 - accuracy: 0.7554 - val_loss: 0.9437 - val_accuracy: 0.6268
Epoch 5/5
16711/16711 [==============================] - 54s 3ms/step - loss: 0.5742 - accuracy: 0.7544 - val_loss: 0.9272 - val_accuracy: 0.6166
测试阶段
accr = model.evaluate(test_sequences_matrix, Y_test)
2064/2064 [==============================] - 2s 1ms/step
print('Test set\n Loss: {:0.3f}\n Accuracy: {:0.3f}'.format(accr[0],accr[1]))
Test set
Loss: 0.863
Accuracy: 0.649
混淆矩阵
Confusion Matrix :
[[517 357 165]
[379 246 108]
[161 88 43]]
Accuracy Score : 0.3905038759689923
分类报告
precision recall f1-score support
0 0.49 0.50 0.49 1039
1 0.36 0.34 0.35 733
2 0.14 0.15 0.14 292
accuracy 0.39 2064
macro avg 0.33 0.33 0.33 2064
weighted avg 0.39 0.39 0.39 2064
混淆矩阵代码(我已从sklearn.metrics导入。导入混淆矩阵、准确性评分、分类报告)
results = confusion_matrix(doc_test.response, Y_test)
print('Confusion Matrix :')
print(results)
print('Accuracy Score :',accuracy_score(doc_test.response, Y_test))
当有两个以上的类时,不能使用二进制交叉熵。将损失函数更改为分类交叉熵,并将输出层设置为有三个神经元(每个类一个)
不管怎样,从你的训练曲线上,我可以看出网络是过度拟合的。这可能是因为您的数据或您的网络。查看此post了解有关深度学习模型中过度拟合的更多信息
这是您的模型的学习曲线图。它表现出典型的过度拟合行为。
相关问题 更多 >
编程相关推荐