How do I load a fine-tuned BertForSequenceClassification model and use it to tokenize a sentence?

Posted 2024-03-29 14:48:39


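I followed this tutorial (https://mccormickml.com/2019/07/22/BERT-fine-tuning/#a1-saving--loading-fine-tuned-model) to fine-tune BertForSequenceClassification. After training the model, I want to load it and write a function classify_sentence(sentence) that takes a sentence and returns the predicted logit vector: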

import torch
from transformers import BertForSequenceClassification, BertTokenizer


class SentenceClassifier:
    def __init__(self, output_dir):
        # Load the fine-tuned model and tokenizer once, not on every call.
        self.model = BertForSequenceClassification.from_pretrained(output_dir)
        self.tokenizer = BertTokenizer.from_pretrained(output_dir)
        self.model.eval()  # Disable dropout for inference.

    def classify_sentence(self, sentence):
        encoded_dict = self.tokenizer.encode_plus(
            sentence,                     # Sentence to encode.
            add_special_tokens=True,      # Add '[CLS]' and '[SEP]'.
            max_length=64,                # Pad & truncate to 64 tokens.
            padding='max_length',         # Replaces deprecated pad_to_max_length=True.
            truncation=True,
            return_attention_mask=True,   # Construct attn. masks.
            return_tensors='pt',          # Return PyTorch tensors.
        )

        # Already a (1, 64) batch for one sentence, so no torch.cat is needed.
        input_ids = encoded_dict['input_ids']
        # The attention mask simply differentiates padding from non-padding.
        attention_mask = encoded_dict['attention_mask']

        with torch.no_grad():
            outputs = self.model(input_ids,
                                 token_type_ids=None,
                                 attention_mask=attention_mask)

        # The first element of the output is the logits tensor, shape (1, num_labels).
        logits = outputs[0]
        return logits
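classify_sentence returns raw logits, not a label. A minimal sketch of turning one row of logits into a predicted class index and probabilities is below; it is plain Python for illustration, and the helper name logits_to_prediction is hypothetical. With PyTorch tensors the same thing is usually done with torch.softmax(logits, dim=-1) followed by argmax.

```python
import math


def logits_to_prediction(logits):
    """Convert one row of raw logits into (predicted_index, probabilities)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # argmax: index of the largest probability (same as the largest logit).
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, probs


# Usage with the model output would be e.g. logits_to_prediction(logits[0].tolist()).
pred, probs = logits_to_prediction([1.0, 3.0])
print(pred, probs)
```

Softmax does not change which index is largest, so if only the label is needed, taking the argmax of the logits directly gives the same answer.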

output_dir is a directory containing the following files: config.json, pytorch_model.bin, special_tokens_map.json, tokenizer_config.json and vocab.txt.
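Before calling from_pretrained, a quick sanity check that output_dir really contains the files listed above can rule out a path problem. A minimal sketch, assuming the five file names from the question are the expected set; the helper missing_model_files is hypothetical.

```python
import os

# File names listed in the question for output_dir.
EXPECTED_FILES = (
    "config.json",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
)


def missing_model_files(output_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from output_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(output_dir, name))]


print(missing_model_files("."))
```

An empty list means every listed file is present; it says nothing about whether the files are compatible with the installed library version.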

When I run this function, I get an error:

AttributeError: 'BertTokenizer' object has no attribute 'encode_plus'
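One possible cause, not confirmed by the post, is that the environment running this function resolves BertTokenizer from the older pytorch_pretrained_bert package, whose tokenizer has no encode_plus, while training used the transformers library. The sketch below reports which of these packages are installed and where a given tokenizer class comes from; the helper names are hypothetical and the package list is an assumption.

```python
from importlib import metadata


def installed_versions(packages=("transformers",
                                 "pytorch-pretrained-bert",
                                 "pytorch-transformers")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def describe_tokenizer(tokenizer):
    """Report the module the tokenizer's class is defined in and whether it has encode_plus."""
    return {
        "module": type(tokenizer).__module__,
        "has_encode_plus": hasattr(tokenizer, "encode_plus"),
    }


print(installed_versions())
```

Calling describe_tokenizer(self.tokenizer) inside classify_sentence would show, for example, a module name starting with pytorch_pretrained_bert if the older package is the one in use.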

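However, I used this same method to encode the sentences during training. Is there another way to tokenize a sentence after loading the trained BERT model?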
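If encode_plus is unavailable, the same encoding can be rebuilt from tokenize() and convert_tokens_to_ids(), which BERT tokenizers in both pytorch_pretrained_bert and transformers provide. A minimal sketch, assuming a single sentence, the standard '[CLS]'/'[SEP]'/'[PAD]' tokens, and padding to max_length; encode_manually and FakeTokenizer are hypothetical, and FakeTokenizer only stands in for a real BertTokenizer so the example runs offline.

```python
def encode_manually(tokenizer, sentence, max_length=64):
    """Mimic encode_plus(add_special_tokens=True, pad and truncate to max_length)."""
    tokens = tokenizer.tokenize(sentence)
    # Reserve two positions for [CLS] and [SEP], truncate the rest.
    tokens = ["[CLS]"] + tokens[: max_length - 2] + ["[SEP]"]
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    # 1 for real tokens, 0 for padding.
    attention_mask = [1] * len(input_ids)
    pad_id = tokenizer.convert_tokens_to_ids(["[PAD]"])[0]
    padding = max_length - len(input_ids)
    input_ids = input_ids + [pad_id] * padding
    attention_mask = attention_mask + [0] * padding
    return input_ids, attention_mask


class FakeTokenizer:
    """Hypothetical stand-in with a tiny vocabulary, for illustration only."""
    vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "hello": 7592, "world": 2088}

    def tokenize(self, text):
        return text.lower().split()

    def convert_tokens_to_ids(self, tokens):
        return [self.vocab[t] for t in tokens]


ids, mask = encode_manually(FakeTokenizer(), "Hello world", max_length=6)
print(ids, mask)
```

To feed the result to the model, wrap each list in a batch dimension, e.g. torch.tensor([ids]) and torch.tensor([mask]), giving shape (1, max_length).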

