How can I increase the width of the hidden linear layers in the Mistral 7B model?
First, install the required packages:
!pip install -U bitsandbytes
!pip install -U transformers
!pip install -U peft
!pip install -U accelerate
!pip install -U trl
Next, we need some boilerplate to load the Mistral model:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser, TrainingArguments, pipeline, logging
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from datasets import load_dataset
from trl import SFTTrainer
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

base_model = "mistralai/Mistral-7B-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False  # silence the warnings
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token
We can inspect the model's layer structure:
>>> model
[Output]:
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
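For reference, all of the widths in this printout come straight from fields of the model config; a quick check of the mapping (a small sketch, assuming the standard MistralConfig attribute names):

cfg = model.config
print(cfg.hidden_size)          # 4096 -> width of q_proj/o_proj, the embeddings and lm_head
print(cfg.intermediate_size)    # 14336 -> width of the MLP gate/up/down projections
print(cfg.num_attention_heads)  # 32, so head_dim = 4096 // 32 = 128
print(cfg.num_key_value_heads)  # 8, so k_proj/v_proj out_features = 8 * 128 = 1024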
Is there a way to increase the width of the Linear4bit layers?
For example, say we want each layer to take 800 additional hidden units, so that we get:
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4896)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4896, out_features=4896, bias=False)
          (k_proj): Linear4bit(in_features=4896, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4896, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4896, out_features=4896, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4896, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4896, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4896, out_features=32000, bias=False)
)
Note: it is fine if the extra hidden units in the Linear4bit layers are randomly initialized.
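For illustration, here is a minimal plain-PyTorch sketch of what I mean by "widening" a single linear layer: the original weights are kept in the top-left block and the extra rows/columns stay randomly initialized. (widen_linear is a made-up helper for illustration only; it would not work directly on the quantized Linear4bit modules inside the loaded model.)

import torch
import torch.nn as nn

def widen_linear(old: nn.Linear, new_in: int, new_out: int) -> nn.Linear:
    # Hypothetical helper: build a wider Linear whose top-left block reuses the
    # old weights; the extra rows/columns keep their random initialization.
    new = nn.Linear(new_in, new_out, bias=old.bias is not None)
    with torch.no_grad():
        new.weight[:old.out_features, :old.in_features] = old.weight
        if old.bias is not None:
            new.bias[:old.out_features] = old.bias
    return new

# Example: grow a 4096 -> 4096 projection to 4896 -> 4896 (4096 + 800).
old_proj = nn.Linear(4096, 4096, bias=False)
new_proj = widen_linear(old_proj, 4896, 4896)
print(new_proj)  # Linear(in_features=4896, out_features=4896, bias=False)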
1 Answer
You can change the hidden_size parameter in MistralConfig and pass the modified config in when loading the model (documentation link). Note that the model will then no longer load the pretrained weights that match the default configuration, so you will most likely have to retrain it from scratch. Here is a modified version of the basic code:
from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer, BitsAndBytesConfig, HfArgumentParser, TrainingArguments, pipeline, logging
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from datasets import load_dataset
from trl import SFTTrainer
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

base_model = "mistralai/Mistral-7B-v0.1"

# Override the hidden size in the config (4096 + 800 = 4896)
config = AutoConfig.from_pretrained(base_model, hidden_size=4896)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    config=config,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    ignore_mismatched_sizes=True  # Since we make changes to architecture sizes.
)
model.config.use_cache = False  # silence the warnings
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token
Note: whenever you change the architecture sizes, make sure to pass ignore_mismatched_sizes=True when calling AutoModelForCausalLM.from_pretrained().
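As a quick sanity check (a minimal sketch), you can confirm that the new width actually propagated into the instantiated layers:

print(model.config.hidden_size)      # expected: 4896
print(model.get_input_embeddings())  # Embedding(32000, 4896)
print(model.lm_head)                 # Linear(in_features=4896, out_features=32000, bias=False)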