用所有重量加载Roberta模型

2024-03-28 23:13:28 发布

您现在位置:Python中文网/ 问答频道 /正文

我通过TFRobertaModel.frompretrained('Roberta-base')加载Roberta模型,并使用Keras对其进行训练。我在Roberta上面还有其他层,我需要用所有参数初始化裸Roberta。我在Colab上运行代码,从几周前加载Roberta时起,我就收到了以下警告,但尽管没有初始化“lm_head”权重,但一切正常,模型训练正常:

Some weights of the model checkpoint at Roberta-base were not used when initializing ROBERTA: [‘lm_head’]

但现在,我认为colab上变压器的版本已经改变,因为我得到了带有相同代码的新警告,表明更多编码器和偏置层未初始化,这导致损耗函数没有减少:

Some layers from the model checkpoint at roberta-base were not used when initializing ROBERTA: ['lm_head', 'encoder/layer_._3/attention/self/value/bias:0', 'encoder/layer_._10/attention/self/value/bias:0', 'encoder/layer_._10/attention/self/key/kernel:0', 'pooler/dense/bias:0', 'encoder/layer_._9/attention/self/query/kernel:0', 'encoder/layer_._10/attention/self/query/kernel:0', 'encoder/layer_._7/attention/output/dense/bias:0', 'embeddings/position_embeddings/embeddings:0', 'encoder/layer_._6/intermediate/dense/kernel:0', 'encoder/layer_._11/intermediate/dense/kernel:0', 'encoder/layer_._8/intermediate/dense/bias:0', 'encoder/layer_._10/attention/self/value/kernel:0', 'encoder/layer_._7/output/dense/bias:0', 'encoder/layer_._6/attention/self/value/bias:0', 'encoder/layer_._8/attention/output/dense/kernel:0', 'encoder/layer_._10/intermediate/dense/kernel:0', 'encoder/layer_._5/attention/self/value/kernel:0', 'encoder/layer_._6/attention/output/LayerNorm/gamma:0', 'encoder/layer_._7/attention/self/query/kernel:0', 'encoder/layer_._6/attention/self/query/kernel:0', 'encoder/layer_._6/attention/self/key/bias:0', 'encoder/layer_._8/attention/output/LayerNorm/gamma:0', 'encoder/layer_._2/output/dense/kernel:0', 'encoder/layer_._11/intermediate/dense/bias:0', 'encoder/layer_._6/output/dense/kernel:0', 'encoder/layer_._2/intermediate/dense/kernel:0', 'encoder/layer_._3/intermediate/dense/kernel:0', 'encoder/layer_._10/output/LayerNorm/beta:0', 'encoder/layer_._6/attention/self/query/bias:0', 'encoder/layer_._6/attention/output/LayerNorm/beta:0', 'encoder/layer_._9/attention/self/value/bias:0', 'encoder/layer_._8/attention/self/query/kernel:0', 'encoder/layer_._0/output/LayerNorm/gamma:0', 'encoder/layer_._11/attention/output/dense/bias:0', 'encoder/layer_._7/attention/self/value/bias:0', 'encoder/layer_._0/attention/output/dense/kernel:0', 'encoder/layer_._9/intermediate/dense/bias:0', 'encoder/layer_._2/attention/self/query/kernel:0', 'encoder/layer_._0/attention/self/key/bias:0', 'encoder/layer_._8/attention/output/LayerNorm/beta:0', 'encoder/layer_._1/attention/self/value/kernel:0', 'encoder/layer_._6/output/LayerNorm/gamma:0', 'encoder/layer_._1/attention/output/dense/bias:0', 'encoder/layer_._3/attention/self/query/bias:0', 'encoder/layer_._3/output/dense/bias:0', 'encoder/layer_._1/attention/self/key/kernel:0', 'encoder/layer_._8/attention/self/key/kernel:0', 'encoder/layer_._9/intermediate/dense/kernel:0', 'encoder/layer_._3/output/dense/kernel:0', 'encoder/layer_._2/output/LayerNorm/beta:0', 'encoder/layer_._7/attention/self/key/bias:0', 'encoder/layer_._5/attention/self/key/kernel:0', 'encoder/layer_._5/attention/self/query/bias:0', 'encoder/layer_._2/attention/output/dense/bias:0', 'encoder/layer_._4/intermediate/dense/kernel:0', 'encoder/layer_._1/intermediate/dense/bias:0', 'encoder/layer_._4/attention/self/value/kernel:0', 'encoder/layer_._11/attention/self/key/bias:0', 'encoder/layer_._5/output/dense/kernel:0', 'encoder/layer_._1/output/dense/bias:0', 'encoder/layer_._0/attention/self/value/bias:0', 'encoder/layer_._6/attention/self/key/kernel:0', 'encoder/layer_._9/attention/self/key/bias:0', 'encoder/layer_._7/output/LayerNorm/gamma:0', 'encoder/layer_._8/attention/output/dense/bias:0', 'encoder/layer_._10/attention/output/dense/bias:0', 'encoder/layer_._0/intermediate/dense/kernel:0', 'encoder/layer_._5/intermediate/dense/kernel:0', 'encoder/layer_._11/attention/self/value/kernel:0', 'encoder/layer_._8/attention/self/key/bias:0', 'encoder/layer_._8/output/dense/bias:0', 'encoder/layer_._8/intermediate/dense/kernel:0', 'encoder/layer_._7/attention/output/LayerNorm/beta:0', 'encoder/layer_._2/output/dense/bias:0', 'encoder/layer_._3/attention/output/dense/bias:0', 'encoder/layer_._0/output/dense/bias:0', 'encoder/layer_._9/attention/self/key/kernel:0', 'encoder/layer_._11/output/dense/bias:0', 'encoder/layer_._7/attention/self/query/bias:0', 'encoder/layer_._10/attention/self/key/bias:0', 'encoder/layer_._2/attention/output/dense/kernel:0', 'encoder/layer_._2/attention/self/query/bias:0', 'encoder/layer_._9/attention/output/dense/kernel:0', 'encoder/layer_._9/attention/output/LayerNorm/gamma:0', 'encoder/layer_._9/output/LayerNorm/gamma:0', 'encoder/layer_._0/attention/output/LayerNorm/beta:0', 'encoder/layer_._1/intermediate/dense/kernel:0', 'encoder/layer_._1/output/dense/kernel:0', 'encoder/layer_._1/attention/self/key/bias:0', 'encoder/layer_._2/attention/self/value/kernel:0', 'encoder/layer_._9/attention/self/value/kernel:0', 'encoder/layer_._10/intermediate/dense/bias:0', 'encoder/layer_._4/intermediate/dense/bias:0', 'encoder/layer_._6/output/LayerNorm/beta:0', 'encoder/layer_._7/output/LayerNorm/beta:0', 'encoder/layer_._11/attention/self/query/bias:0', 'encoder/layer_._0/intermediate/dense/bias:0', 'encoder/layer_._11/attention/output/dense/kernel:0', 'encoder/layer_._5/attention/self/query/kernel:0', 'encoder/layer_._8/attention/self/value/kernel:0', 'encoder/layer_._11/output/LayerNorm/beta:0', 'encoder/layer_._9/output/dense/bias:0', 'encoder/layer_._4/output/dense/bias:0', 'encoder/layer_._2/attention/self/key/bias:0', 'encoder/layer_._3/attention/self/query/kernel:0', 'encoder/layer_._4/attention/output/LayerNorm/gamma:0', 'encoder/layer_._1/attention/output/LayerNorm/beta:0', 'encoder/layer_._1/output/LayerNorm/beta:0', 'encoder/layer_._10/attention/output/LayerNorm/beta:0', 'encoder/layer_._3/attention/self/value/kernel:0', 'encoder/layer_._10/attention/self/query/bias:0', 'encoder/layer_._3/attention/self/key/bias:0', 'pooler/dense/kernel:0', 'encoder/layer_._1/attention/self/value/bias:0', 'encoder/layer_._7/attention/self/key/kernel:0', 'encoder/layer_._1/attention/output/dense/kernel:0', 'encoder/layer_._4/attention/self/key/kernel:0', 'encoder/layer_._8/output/dense/kernel:0', 'encoder/layer_._3/attention/output/LayerNorm/gamma:0', 'encoder/layer_._0/attention/self/value/kernel:0', 'encoder/layer_._3/attention/self/key/kernel:0', 'encoder/layer_._0/attention/self/query/kernel:0', 'encoder/layer_._3/intermediate/dense/bias:0', 'encoder/layer_._7/output/dense/kernel:0', 'encoder/layer_._10/output/dense/kernel:0', 'encoder/layer_._7/intermediate/dense/bias:0', 'embeddings/word_embeddings/weight:0', 'encoder/layer_._3/attention/output/LayerNorm/beta:0', 'encoder/layer_._0/attention/self/key/kernel:0', 'encoder/layer_._4/output/dense/kernel:0', 'encoder/layer_._5/output/LayerNorm/gamma:0', 'encoder/layer_._9/attention/output/dense/bias:0', 'encoder/layer_._0/attention/output/dense/bias:0', 'encoder/layer_._5/attention/output/LayerNorm/gamma:0', 'encoder/layer_._9/attention/output/LayerNorm/beta:0', 'encoder/layer_._11/output/LayerNorm/gamma:0', 'encoder/layer_._11/attention/output/LayerNorm/gamma:0', 'encoder/layer_._6/intermediate/dense/bias:0', 'encoder/layer_._2/attention/output/LayerNorm/gamma:0', 'encoder/layer_._5/output/dense/bias:0', 'encoder/layer_._0/output/dense/kernel:0', 'encoder/layer_._6/attention/output/dense/kernel:0', 'encoder/layer_._6/attention/output/dense/bias:0', 'encoder/layer_._1/attention/self/query/kernel:0', 'encoder/layer_._0/attention/self/query/bias:0', 'encoder/layer_._11/attention/self/value/bias:0', 'encoder/layer_._2/intermediate/dense/bias:0', 'embeddings/LayerNorm/beta:0', 'encoder/layer_._4/attention/output/dense/kernel:0', 'encoder/layer_._3/output/LayerNorm/beta:0', 'encoder/layer_._8/output/LayerNorm/gamma:0', 'encoder/layer_._10/attention/output/dense/kernel:0', 'encoder/layer_._11/output/dense/kernel:0', 'encoder/layer_._2/attention/output/LayerNorm/beta:0', 'encoder/layer_._7/attention/output/dense/kernel:0', 'encoder/layer_._9/attention/self/query/bias:0', 'encoder/layer_._4/attention/self/key/bias:0', 'encoder/layer_._2/output/LayerNorm/gamma:0', 'encoder/layer_._0/attention/output/LayerNorm/gamma:0', 'encoder/layer_._1/attention/output/LayerNorm/gamma:0', 'encoder/layer_._1/attention/self/query/bias:0', 'encoder/layer_._5/attention/output/LayerNorm/beta:0', 'encoder/layer_._10/output/dense/bias:0', 'encoder/layer_._8/output/LayerNorm/beta:0', 'encoder/layer_._5/output/LayerNorm/beta:0', 'embeddings/token_type_embeddings/embeddings:0', 'encoder/layer_._5/attention/output/dense/bias:0', 'encoder/layer_._4/output/LayerNorm/beta:0', 'encoder/layer_._4/attention/self/query/kernel:0', 'encoder/layer_._5/attention/output/dense/kernel:0', 'encoder/layer_._7/attention/self/value/kernel:0', 'encoder/layer_._7/intermediate/dense/kernel:0', 'encoder/layer_._11/attention/self/key/kernel:0', 'encoder/layer_._3/output/LayerNorm/gamma:0', 'encoder/layer_._10/output/LayerNorm/gamma:0', 'encoder/layer_._8/attention/self/query/bias:0', 'encoder/layer_._3/attention/output/dense/kernel:0', 'encoder/layer_._4/output/LayerNorm/gamma:0', 'encoder/layer_._10/attention/output/LayerNorm/gamma:0', 'encoder/layer_._4/attention/self/value/bias:0', 'encoder/layer_._11/attention/self/query/kernel:0', 'encoder/layer_._4/attention/output/dense/bias:0', 'encoder/layer_._4/attention/output/LayerNorm/beta:0', 'encoder/layer_._5/attention/self/key/bias:0', 'encoder/layer_._6/attention/self/value/kernel:0', 'encoder/layer_._5/attention/self/value/bias:0', 'encoder/layer_._11/attention/output/LayerNorm/beta:0', 'encoder/layer_._1/output/LayerNorm/gamma:0', 'encoder/layer_._2/attention/self/value/bias:0', 'encoder/layer_._9/output/dense/kernel:0', 'encoder/layer_._2/attention/self/key/kernel:0', 'encoder/layer_._9/output/LayerNorm/beta:0', 'encoder/layer_._7/attention/output/LayerNorm/gamma:0', 'encoder/layer_._5/intermediate/dense/bias:0', 'embeddings/LayerNorm/gamma:0', 'encoder/layer_._0/output/LayerNorm/beta:0', 'encoder/layer_._6/output/dense/bias:0', 'encoder/layer_._8/attention/self/value/bias:0', 'encoder/layer_._4/attention/self/query/bias:0']

有人能帮我回答我的问题吗:我如何才能加载Roberta并正确初始化它的所有权重


Tags: keyselflayerencoderoutputvaluequerykernel