ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (10, 30))

Posted 2024-06-08 05:22:51


I am new to TensorFlow and would really appreciate your help. I am trying to use a transformer model as an embedding layer and feed its output into a custom model:

import tensorflow as tf
from tensorflow.keras import layers
from transformers import TFAutoModel

# MODEL_NAME, config, MAX_LEN, train_ds, train_steps and EPOCHS are defined elsewhere.
def build_model():
    transformer_model = TFAutoModel.from_pretrained(MODEL_NAME, config=config)

    input_ids_in = layers.Input(shape=(MAX_LEN,), name='input_ids', dtype='int32')
    input_masks_in = layers.Input(shape=(MAX_LEN,), name='attention_mask', dtype='int32')

    # Last hidden state of the transformer, used as a contextual embedding layer.
    embedding_layer = transformer_model(input_ids_in, attention_mask=input_masks_in)[0]

    X = layers.Bidirectional(layers.LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(embedding_layer)
    X = layers.GlobalMaxPool1D()(X)
    X = layers.Dense(64, activation='relu')(X)
    X = layers.Dropout(0.2)(X)
    X = layers.Dense(30, activation='softmax')(X)  # 30 classes

    model = tf.keras.Model(inputs=[input_ids_in, input_masks_in], outputs=X)

    # Freeze the two input layers and the transformer; only the head is trained.
    for layer in model.layers[:3]:
        layer.trainable = False

    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model


model = build_model()
model.summary()
r = model.fit(
    train_ds,
    steps_per_epoch=train_steps,
    epochs=EPOCHS,
    verbose=2)  # Keras accepts verbose 0, 1 or 2
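One sanity check worth doing before training, assuming train_ds is a tf.data.Dataset, is to print its element spec and confirm the shapes the pipeline actually yields:

# Inspect what one dataset element looks like (assumes a tf.data pipeline).
print(train_ds.element_spec)
# For this model you would want something like:
# ({'input_ids': TensorSpec(shape=(None, 10), dtype=tf.int32, ...),
#   'attention_mask': TensorSpec(shape=(None, 10), dtype=tf.int32, ...)},
#  TensorSpec(shape=(None,), dtype=tf.int64, ...))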

I have 30 classes and my labels are not one-hot encoded, so I am using sparse categorical cross-entropy as my loss function, but I keep getting the following error:

ValueError: Shape mismatch: The shape of labels (received (1,)) should equal the shape of logits except for the last dimension (received (10, 30)).

How can I fix this? And why does it expect the shape (10, 30)? I understand the 30 comes from the final Dense layer having 30 units, but where does the 10 come from? Is it because MAX_LEN is 10?
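For reference, a minimal sketch (with made-up shapes, not the actual data) of what sparse_categorical_crossentropy expects: integer class ids whose shape equals the predictions' shape minus the last (class) dimension:

import tensorflow as tf

labels = tf.constant([3, 7, 0])                    # shape (3,): one class id per example
probs = tf.nn.softmax(tf.random.normal((3, 30)))   # shape (3, 30): softmax model output
loss = tf.keras.losses.sparse_categorical_crossentropy(labels, probs)
print(loss.shape)  # (3,) -- one loss value per example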

My model summary:

Model: "model_16"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_ids (InputLayer)          [(None, 10)]         0                                            
__________________________________________________________________________________________________
attention_mask (InputLayer)     [(None, 10)]         0                                            
__________________________________________________________________________________________________
tf_bert_model_21 (TFBertModel)  TFBaseModelOutputWit 162841344   input_ids[0][0]                  
                                                                 attention_mask[0][0]             
__________________________________________________________________________________________________
bidirectional_17 (Bidirectional (None, 10, 100)      327600      tf_bert_model_21[0][0]           
__________________________________________________________________________________________________
global_max_pooling1d_15 (Global (None, 100)          0           bidirectional_17[0][0]           
__________________________________________________________________________________________________
dense_32 (Dense)                (None, 64)           6464        global_max_pooling1d_15[0][0]    
__________________________________________________________________________________________________
dropout_867 (Dropout)           (None, 64)           0           dense_32[0][0]                   
__________________________________________________________________________________________________
dense_33 (Dense)                (None, 30)           1950        dropout_867[0][0]                
==================================================================================================
Total params: 163,177,358
Trainable params: 336,014
Non-trainable params: 162,841,344

1 Answer
Forum user
#1 · Posted 2024-06-08 05:22:51

The 10 is the number of sequences in one batch. I suspect it comes from how the sequences are batched in your dataset.

Your model acts as a sequence classifier, so each sequence should have exactly one label.
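As a minimal sketch, assuming a tokenizer has already produced fixed-length arrays (the names encodings and labels here are placeholders), the dataset would need to pair each sequence with one integer label:

import tensorflow as tf

# `encodings` is assumed to hold (num_examples, MAX_LEN) int arrays and
# `labels` a 1-D array of num_examples class ids in [0, 30).
train_ds = (
    tf.data.Dataset.from_tensor_slices((
        {'input_ids': encodings['input_ids'],
         'attention_mask': encodings['attention_mask']},
        labels,
    ))
    .shuffle(1000)
    .batch(10)
)

# Each batch is then ({'input_ids': (10, MAX_LEN), 'attention_mask': (10, MAX_LEN)}, (10,)),
# so the (10,) labels line up with the (10, 30) predictions: one label per sequence.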
