Unexpected keyword argument 'use_dora' when trying to generate a summary from a fine-tuned Mistral 7B LLM
I fine-tuned the Mistral 7B large language model with LoRA, configured for 16-bit, using the samsum training set provided by Hugging Face. My goal is for the fine-tuned model to take a dialogue and generate a summary.
Here is my training script:
# set the train & validation data set
from datasets import load_dataset
train_dataset = load_dataset('json', data_files='/data/datasets/summarisation/samsum/train.json', split='train')
eval_dataset = load_dataset('json', data_files='/data/datasets/summarisation/samsum/validation.json', split='train')
# Set up the Accelerator. I'm not sure if we really need this for a QLoRA given its description (I have to read more about it) but it seems it can't hurt, and it's helpful to have the code for future reference. You can always comment out the accelerator if you want to try without.
from accelerate import FullyShardedDataParallelPlugin, Accelerator
from torch.distributed.fsdp.fully_sharded_data_parallel import FullOptimStateDictConfig, FullStateDictConfig
fsdp_plugin = FullyShardedDataParallelPlugin(
state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False),
optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False),
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
# Let's use Weights & Biases to track our training metrics. You'll need to provide an API key when prompted. Feel free to skip this if you'd like, and just comment out the `wandb` parameters in the `Trainer` definition below.
import wandb, os
wandb.login()
wandb_project = "mistral-samsun-finetune"
if len(wandb_project) > 0:
os.environ["WANDB_PROJECT"] = wandb_project
# Formatting prompts
# Then create a `formatting_func` to structure training examples as prompts.
def formatting_func(example):
text = f"### Dialog: {example['dialogue']}\n ### Summary: {example['summary']}"
return text
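# For example, a made-up samsum-style record (hypothetical, just to show the prompt shape):
# formatting_func({'dialogue': 'Amanda: Are you coming tonight?\nTom: Yes, see you at 8.', 'summary': 'Tom will meet Amanda at 8.'})
# -> '### Dialog: Amanda: Are you coming tonight?\nTom: Yes, see you at 8.\n ### Summary: Tom will meet Amanda at 8.'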
### 2. Load Base Model
# Let's now load Mistral - mistralai/Mistral-7B-v0.1 - using 4-bit quantization!
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoConfig
base_model_id = "mistralai/Mistral-7B-v0.1"
# 4 bit config
#bnb_config = BitsAndBytesConfig(
# load_in_4bit=True,
# bnb_4bit_use_double_quant=True,
# bnb_4bit_quant_type="nf4",
# bnb_4bit_compute_dtype=torch.bfloat16
#)
#model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map="auto")
#16 bit
config = AutoConfig.from_pretrained(base_model_id)
config.quantization = "int8" # Set quantization to int8
model = AutoModelForCausalLM.from_pretrained(base_model_id, config=config)
### 3. Tokenization
#Set up the tokenizer. Add padding on the left as it [makes training use less memory](https://ai.stackexchange.com/questions/41485/while-fine-tuning-a-decoder-only-llm-like-llama-on-chat-dataset-what-kind-of-pa).
#For `model_max_length`, it's helpful to get a distribution of your data lengths. Let's first tokenize without the truncation/padding, so we can get a length distribution.
tokenizer = AutoTokenizer.from_pretrained(
base_model_id,
padding_side="left",
add_eos_token=True,
add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token
def generate_and_tokenize_prompt(prompt):
return tokenizer(formatting_func(prompt))
# return tokenizer(prompt, padding="max_length", truncation=True)
# Reformat the prompt and tokenize each sample:
tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt)
tokenized_val_dataset = eval_dataset.map(generate_and_tokenize_prompt)
# Let's get a distribution of our dataset lengths, so we can determine the appropriate `max_length` for our input tensors.
import matplotlib.pyplot as plt
def plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset):
lengths = [len(x['input_ids']) for x in tokenized_train_dataset]
lengths += [len(x['input_ids']) for x in tokenized_val_dataset]
print(len(lengths))
# Plotting the histogram
plt.figure(figsize=(10, 6))
plt.hist(lengths, bins=20, alpha=0.7, color='blue')
plt.xlabel('Length of input_ids')
plt.ylabel('Frequency')
plt.title('Distribution of Lengths of input_ids')
plt.show()
plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset)
#From here, you can choose where you'd like to set the max_length to be. You can truncate and pad training examples to fit them to your chosen size. Be aware that choosing a larger max_length has its compute tradeoffs.
# I'm using my personal notes to train the model, and they vary greatly in length. I spent some time cleaning the dataset so the samples were about the same length, cutting up individual notes if needed, but being sure to not cut in the middle of a word or sentence.
# Now let's tokenize again with padding and truncation, and set up the tokenize function to make labels and input_ids the same. This is basically what self-supervised fine-tuning is.
#max_length = 512 # This was an appropriate max length for my dataset
#We have a dynamic max length
max_length = max(max(len(x['input_ids']) for x in tokenized_train_dataset), max(len(x['input_ids']) for x in tokenized_val_dataset))
print(f"Max length: {max_length}")
wandb.init()
wandb.log({"Max length": max_length})
def generate_and_tokenize_prompt2(prompt):
result = tokenizer(
formatting_func(prompt),
truncation=True,
max_length=max_length,
padding="max_length",
)
result["labels"] = result["input_ids"].copy()
return result
tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt2)
tokenized_val_dataset = eval_dataset.map(generate_and_tokenize_prompt2)
# Check that `input_ids` is padded on the left with the `eos_token` (2) and there is an `eos_token` 2 added to the end, and the prompt starts with a `bos_token` (1).
print(tokenized_train_dataset[1]['input_ids'])
# Now all the samples should be the same length, `max_length`.
plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset)
### 4. Set Up LoRA
# Now, to start our fine-tuning, we have to apply some preprocessing to the model to prepare it for training. For that use the prepare_model_for_kbit_training method from PEFT.
from peft import prepare_model_for_kbit_training
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
def print_trainable_parameters(model):
# """
# Prints the number of trainable parameters in the model.
# """
trainable_params = 0
all_param = 0
for _, param in model.named_parameters():
all_param += param.numel()
if param.requires_grad:
trainable_params += param.numel()
print(
f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
)
#Let's print the model to examine its layers, as we will apply QLoRA to all the linear layers of the model. Those layers are q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, and lm_head.
print(model)
#Here we define the LoRA config.
#`r` is the rank of the low-rank matrix used in the adapters, which thus controls the number of parameters trained. A higher rank will allow for more expressivity, but there is a compute tradeoff.
#`alpha` is the scaling factor for the learned weights. The weight matrix is scaled by `alpha/r`, and thus a higher value for `alpha` assigns more weight to the LoRA activations.
#The values used in the QLoRA paper were `r=64` and `lora_alpha=16`, and these are said to generalize well, but we will use `r=32` and `lora_alpha=64` so that we have more emphasis on the new fine-tuned data while also reducing computational complexity.
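#For example, with r=32 and lora_alpha=64 the adapter update is scaled by 64/32 = 2, whereas the paper's r=64, lora_alpha=16 gives 16/64 = 0.25, so the fine-tuned weights get relatively more weight here.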
from peft import LoraConfig, get_peft_model
config = LoraConfig(
r=32,
lora_alpha=64,
target_modules=[
# "q_proj",
# "k_proj",
# "v_proj",
# "o_proj",
# "gate_proj",
# "up_proj",
# "down_proj",
"lm_head",
],
bias="none",
lora_dropout=0.05, # Conventional
task_type="CAUSAL_LM",
)
#apply Lora
model = get_peft_model(model, config)
print_trainable_parameters(model)
#See how the model looks different now, with the LoRA adapters added:
print(model)
### 5. Run Training!
#Overfitting is when the validation loss goes up (bad) while the training loss goes down significantly, meaning the model is learning the training set really well, but is unable to generalize to new datapoints.
# In most cases, this is not desired, but since I am just playing around with a model to generate outputs like my journal entries, I was fine with a moderate amount of overfitting.
#With that said, a note on training: you can set the max_steps to be high initially, and examine at what step your model's performance starts to degrade.
#That is where you'll find the sweet spot for how many steps to perform. For example, say you start with 1000 steps, and find that at around 500 steps the model starts overfitting, as described above.
#Therefore, 500 steps would be your sweet spot, so you would use the checkpoint-500 model repo in your output dir (mistral-journal-finetune) as your final model in step 6 below.
#If you're just doing something for fun like I did and are OK with overfitting, you can try different checkpoint versions with different degrees of overfitting.
#You can interrupt the process via Kernel -> Interrupt Kernel in the top nav bar once you realize you didn't need to train anymore.
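#For example (hypothetical names), you could later load such a checkpoint with PeftModel.from_pretrained(base_model, f"{output_dir}/checkpoint-500") and compare its summaries against other checkpoints.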
if torch.cuda.device_count() > 1: # If more than 1 GPU
model.is_parallelizable = True
model.model_parallel = True
model = accelerator.prepare_model(model)
import transformers
from datetime import datetime
project = "finetune-lora"
base_model_name = "mistral"
run_name = base_model_name + "-" + project
output_dir = "mistral-finetune-lora16" # used by TrainingArguments below; value inferred from the checkpoint path loaded in the second script
trainer = transformers.Trainer(
model=model,
train_dataset=tokenized_train_dataset,
eval_dataset=tokenized_val_dataset,
args=transformers.TrainingArguments(
output_dir=output_dir,
warmup_steps=500,
per_device_train_batch_size=2,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
max_steps=10000,
learning_rate=2.5e-5, # Want a small lr for finetuning
bf16=True,
optim="paged_adamw_8bit",
logging_steps=25, # Log the training loss every 25 steps
logging_dir="./logs", # Directory for storing logs
save_strategy="steps", # Save the model checkpoint every logging step
save_steps=25, # Save checkpoints every 25 steps
evaluation_strategy="steps", # Evaluate the model every logging step
eval_steps=25, # Evaluate every 25 steps
do_eval=True, # Perform evaluation at the end of training
report_to="wandb", # Comment this out if you don't want to use Weights & Biases
run_name=f"{run_name}-{datetime.now().strftime('%Y-%m-%d-%H-%M')}" # Name of the W&B run (optional)
),
data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False # silence the warnings. Please re-enable for inference!
trainer.train()
The script above runs fine. Now I have a second script: I pass it the path to a file containing a dialogue, and it should return a summary. Here is my second script:
import sys
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
from peft import PeftModel
def read_file(file_path):
try:
with open(file_path, 'r') as file:
file_content = file.read()
return file_content
except FileNotFoundError:
print(f"The file at path '{file_path}' was not found.")
except Exception as e:
print(f"An error occurred: {e}")
return None
if __name__ == "__main__":
if len(sys.argv) != 2:
print("Usage: python script.py <file_path>")
sys.exit(1)
file_path = sys.argv[1]
content = read_file(file_path)
if content is None:
sys.exit(1)
print(f"File with dialogue '{file_path}':")
# Load Base Model
base_model_id = "mistralai/Mistral-7B-v0.1"
config = AutoConfig.from_pretrained(base_model_id)
#model = AutoModelForCausalLM.from_pretrained(base_model_id, config=config)
config.quantization = "int8" # Set quantization to int8
model = AutoModelForCausalLM.from_pretrained(base_model_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True)
# Load the LoRA adapter from the appropriate checkpoint directory
new_checkpoint_path = "mistral-finetune-lora16/checkpoint-10000"
# Since PeftModel.from_pretrained expects only one argument, we need to pass the model and the checkpoint path separately
ft_model = PeftModel.from_pretrained(base_model_id, new_checkpoint_path)
# Generate output using the loaded model and tokenizer
base_prompt = " ### Dialog: \n ### Summary: #"
eval_prompt = f"{base_prompt[:13]}{content}{base_prompt[13:]}"
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
ft_model.eval()
with torch.no_grad():
output = ft_model.generate(**model_input, max_new_tokens=100, repetition_penalty=1.15)[0]
decoded_output = tokenizer.decode(output, skip_special_tokens=True)
print(decoded_output)
When I run the second script, I get this error:
Traceback (most recent call last):
File "run-mistral-lora-samsun.py", line 43, in <module>
ft_model = PeftModel.from_pretrained(base_model_id, new_checkpoint_path)
File "/bigdata/usr/src/mistral-train-full-samsum/lib/python3.8/site-packages/peft/peft_model.py", line 325, in from_pretrained
config = PEFT_TYPE_TO_CONFIG_MAPPING[
File "/bigdata/usr/src/mistral-train-full-samsum/lib/python3.8/site-packages/peft/config.py", line 152, in from_pretrained
return cls.from_peft_type(**kwargs)
File "/bigdata/usr/src/mistral-train-full-samsum/lib/python3.8/site-packages/peft/config.py", line 119, in from_peft_type
return config_cls(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'use_dora'
Does anyone know why this is happening?
I should point out that I did the same thing with QLoRA (4-bit) and it worked fine.
Following ChatGPT's suggestion, I tried initializing ft_model like this:
ft_model = PeftModel.from_pretrained("lora", new_checkpoint_path, model=model)
But then I got a different exception:
Traceback (most recent call last):
File "run-mistral-lora-samsun.py", line 44, in <module>
ft_model = PeftModel.from_pretrained("lora", new_checkpoint_path, model=model)
TypeError: from_pretrained() got multiple values for argument 'model'
1 Answer
I ran into the same problem; simply upgrading the peft library to the latest version solved it for me.
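For context: the adapter_config.json saved by newer peft releases contains fields such as use_dora, which an older peft's LoraConfig does not accept, hence the TypeError; upgrading lets the saved config be parsed again. After upgrading, the usual pattern (a minimal sketch, assuming the checkpoint path from the question) is to pass the loaded base model object, not the model id string, as the first argument:
# pip install -U peft
from transformers import AutoModelForCausalLM
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
ft_model = PeftModel.from_pretrained(base_model, "mistral-finetune-lora16/checkpoint-10000")
This also explains the second traceback: the first positional parameter of PeftModel.from_pretrained is already model, so passing "lora" positionally together with model= as a keyword supplies model twice.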