我是tensorflow的新手,非常感谢您的回答。 我尝试使用transformer模型作为嵌入层,并将数据提供给自定义模型
from transformers import TFAutoModel
from tensorflow.keras import layers
def build_model():
transformer_model = TFAutoModel.from_pretrained(MODEL_NAME, config=config)
input_ids_in = layers.Input(shape=(MAX_LEN,), name='input_ids', dtype='int32')
input_masks_in = layers.Input(shape=(MAX_LEN,), name='attention_mask', dtype='int32')
embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]
X = layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(embedding_layer)
X = layers.GlobalMaxPool1D()(X)
X = layers.Dense(64, activation='relu')(X)
X = layers.Dropout(0.2)(X)
X = layers.Dense(30, activation='softmax')(X)
model = tf.keras.Model(inputs=[input_ids_in, input_masks_in], outputs = X)
for layer in model.layers[:3]:
layer.trainable = False
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
return model
model = build_model()
model.summary()
r = model.fit(
train_ds,
steps_per_epoch=train_steps,
epochs=EPOCHS,
verbose=3)
我有30个类,标签不是一个热编码的,所以我使用稀疏分类交叉熵作为我的损失函数,但我一直得到以下错误
ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (10, 30)).
我怎样才能解决这个问题? 为什么需要(10,30)形状?我知道30是因为最后一个密度层有30个单位,但为什么是10个呢?是因为最大长度是10吗
我的模型摘要:
Model: "model_16"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_ids (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
attention_mask (InputLayer) [(None, 10)] 0
__________________________________________________________________________________________________
tf_bert_model_21 (TFBertModel) TFBaseModelOutputWit 162841344 input_ids[0][0]
attention_mask[0][0]
__________________________________________________________________________________________________
bidirectional_17 (Bidirectional (None, 10, 100) 327600 tf_bert_model_21[0][0]
__________________________________________________________________________________________________
global_max_pooling1d_15 (Global (None, 100) 0 bidirectional_17[0][0]
__________________________________________________________________________________________________
dense_32 (Dense) (None, 64) 6464 global_max_pooling1d_15[0][0]
__________________________________________________________________________________________________
dropout_867 (Dropout) (None, 64) 0 dense_32[0][0]
__________________________________________________________________________________________________
dense_33 (Dense) (None, 30) 1950 dropout_867[0][0]
==================================================================================================
Total params: 163,177,358
Trainable params: 336,014
Non-trainable params: 162,841,344
10是一批中的多个序列。我怀疑这是数据集中的一系列序列
您的模型充当序列分类器。所以每个序列都应该有一个标签
相关问题 更多 >
编程相关推荐