MoE-LoRA: Mixture-of-Experts Adaptation of LLMs Using a Parameter-Efficient Method

This implementation adapts a LLaMA-like model (e.g., Mistral 7B) into a Mixture-of-Experts model (e.g., Mixtral 8x7B) using parameter-efficient fine-tuning (LoRA). LoRA adapters are injected into the MLP layers to mimic fine-tuning of Mixtral.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from lora_moe import LoraMoeConfig, LoraMoeModel

base_model_id = "mistralai/Mistral-7B-v0.1"  # e.g. the Mistral 7B base mentioned above

# optional 4-bit quantization of the frozen base weights
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
)

model_config = LoraMoeConfig.from_pretrained(base_model_id)
model_config.experts_rank = 8            # rank of the LoRA experts
model_config.experts_scale = 1           # LoRA scaling factor
model_config.num_experts_per_tok = 2     # number of experts used for each token
model_config.num_local_experts = 8       # number of LoRA experts to initialize
model_config.output_router_logits = True # expose router logits (e.g. for a load-balancing loss)

moe_model = LoraMoeModel(model, model_config)  # injects MoE-LoRA adapters into the MLP layers
moe_model.make_experts_trainable()             # freeze the base model; train only the adapters
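
Conceptually, the injected adapters turn each frozen MLP into a small mixture of experts: a router scores the num_local_experts LoRA experts for every token, the top num_experts_per_tok are selected, and their low-rank updates are combined with the frozen MLP output using the (normalized) router scores, in the spirit of Mixtral's routing. The sketch below illustrates that idea only; it is a simplified, hypothetical layer (LoraExpert and MoeLoraMLP are names made up here), not the repository's actual implementation.

# Illustrative sketch, NOT the repository's implementation: a frozen MLP
# augmented with a top-k router over low-rank (LoRA) experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoraExpert(nn.Module):
    """A single low-rank expert: delta(x) = scale * B(A(x))."""
    def __init__(self, hidden_size: int, rank: int, scale: float):
        super().__init__()
        self.A = nn.Linear(hidden_size, rank, bias=False)
        self.B = nn.Linear(rank, hidden_size, bias=False)
        self.scale = scale
        nn.init.zeros_(self.B.weight)  # start as a no-op, like standard LoRA

    def forward(self, x):
        return self.scale * self.B(self.A(x))

class MoeLoraMLP(nn.Module):
    """Frozen base MLP plus a router that mixes top-k LoRA experts."""
    def __init__(self, base_mlp: nn.Module, hidden_size: int,
                 num_experts: int = 8, top_k: int = 2,
                 rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.base_mlp = base_mlp  # frozen Mistral-style MLP
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            LoraExpert(hidden_size, rank, scale) for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                             # x: (batch, seq, hidden)
        base_out = self.base_mlp(x)
        router_logits = self.router(x)                # (batch, seq, num_experts)
        weights, indices = torch.topk(router_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over the selected experts
        delta = torch.zeros_like(base_out)
        for e, expert in enumerate(self.experts):
            expert_out = expert(x)                    # (batch, seq, hidden)
            for k in range(self.top_k):
                mask = (indices[..., k] == e).unsqueeze(-1)  # tokens routed to expert e
                delta = delta + mask * weights[..., k:k+1] * expert_out
        # a real implementation would dispatch only the routed tokens to each expert
        return base_out + delta, router_logits

Returning the router logits (cf. output_router_logits = True above) is what makes it possible to add a Mixtral-style load-balancing auxiliary loss during training.
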
Installation and training

git clone https://github.com/maidacundo/MoE-LoRA.git
cd MoE-LoRA/
pip install -r requirements.txt
wandb login              # authenticate with Weights & Biases for logging
huggingface-cli login    # authenticate with the Hugging Face Hub
accelerate launch train_openassistant.py   # launch the OpenAssistant fine-tuning script
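
Before launching training, it can be useful to confirm that the base weights are frozen and only the injected experts (and routers) will be updated. A minimal sanity-check sketch, assuming moe_model from the snippet above behaves like a standard torch.nn.Module:

# Illustrative sanity check: verify that only the injected MoE-LoRA
# parameters are trainable after make_experts_trainable().
trainable = sum(p.numel() for p in moe_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in moe_model.parameters())
print(f"trainable params: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")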
