Generating text sentences with DistilBERT


Hi, I've been using the excellent Hugging Face Transformers library to generate text with GPT-2, and it works very well:

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# Encode the prompt and add a batch dimension
input_ids = torch.tensor(tokenizer.encode("Once upon a time there was")).unsqueeze(0)
model = GPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)
# Greedy decoding up to 50 tokens
greedy_output = model.generate(input_ids, max_length=50)
print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))

My question is that I now want to do the same thing, but with the smaller and simpler DistilmBERT model, which is also multilingual (104 languages), so that I can generate text in Spanish as well as English with this lighter model.

I tried this:

import torch
from transformers import DistilBertTokenizer, DistilBertForMaskedLM

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-multilingual-cased')
model = DistilBertForMaskedLM.from_pretrained('distilbert-base-multilingual-cased')
# Encode the sentence with special tokens and add a batch dimension (batch size 1)
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)
# Passing the inputs as labels makes the model also return the masked-LM loss
outputs = model(input_ids, labels=input_ids)
loss, prediction_scores = outputs[:2]

But I'm not sure this is the right way to use the model. Once I have the outputs, how do I get a continuation of the phrase from them?
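
For what it's worth, here is a minimal sketch of what those prediction scores are actually meant for: filling in a [MASK] token, not continuing a prompt. It assumes a recent transformers version where the model output exposes a .logits attribute.

import torch
from transformers import DistilBertTokenizer, DistilBertForMaskedLM

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-multilingual-cased')
model = DistilBertForMaskedLM.from_pretrained('distilbert-base-multilingual-cased')

# Predict the token hidden behind [MASK]
input_ids = tokenizer.encode("Hello, my dog is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, sequence_length, vocab_size)

# Locate the [MASK] position and take the most likely vocabulary entry there
mask_positions = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))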

After more testing, I can generate just fine with distilgpt2. The problem is that I want multilingual generation with the lightweight multilingual model DistilmBERT (distilbert-base-multilingual-cased). Any hints?

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# distilgpt2 shares GPT-2's vocabulary, so the gpt2 tokenizer works here
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
input_ids = torch.tensor(tokenizer.encode("Once upon a time")).unsqueeze(0)
model = GPT2LMHeadModel.from_pretrained("distilgpt2", pad_token_id=tokenizer.eos_token_id)
greedy_output = model.generate(input_ids, max_length=50)  # greedy search

# Top-k / nucleus sampling, returning three different continuations
sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=50,
    top_p=0.95,
    temperature=1.0,
    num_return_sequences=3
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

Thanks for your help :)


1 answer

I'm just copying the answer given by LysandreJik here:

Unfortunately DistilmBERT can't be used for generation. This is due to the way the original BERT models were pre-trained, using masked language modeling (MLM). It therefore attends to both the left and right contexts (tokens on the left and right of the token you're trying to generate), while for generation the model only has access to the left context.

GPT-2 was trained with causal language modeling (CLM), which is why it can generate such coherent sequences. We implement the generation method only for CLM models, as MLM models do not generate anything coherent.

In the documentation you can find the models suited to this task.

A quick example can be found in {a3}.
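
For illustration only, a minimal sketch of such a quick example with the text-generation pipeline is shown below. It uses distilgpt2 as a stand-in checkpoint, since the quoted answer only says to pick a causal-LM model; substitute whichever multilingual CLM checkpoint the documentation recommends.

from transformers import pipeline

# Stand-in checkpoint: swap in a multilingual causal LM from the docs if needed
generator = pipeline("text-generation", model="distilgpt2")
print(generator("Once upon a time", max_length=50, num_return_sequences=1))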
