ChatGLM Efficient Tuning


Fine-tuning 🤖ChatGLM-6B model with 🤗PEFT.

[ English | 中文 ]

Changelog

[23/04/12] We now support training from checkpoints! Use --checkpoint_dir to specify the checkpoint to fine-tune from.

[23/04/11] We now support training with combined datasets! Pass --dataset dataset1,dataset2 to train on multiple datasets.

Datasets

Our script currently supports a number of built-in datasets. Please refer to config_data.py for the full list and details.

Fine-Tuning Methods

Our script now supports the following fine-tuning methods:

  • LoRA
    • Fine-tuning the low-rank adapters of the model (a minimal PEFT configuration sketch follows this list).
  • P-Tuning V2
    • Fine-tuning the prefix encoder of the model.
  • Freeze
    • Fine-tuning the MLPs in the last n blocks of the model.
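
For reference, the snippet below shows how a rank-8 LoRA adapter can be attached to ChatGLM-6B with 🤗PEFT. It is a minimal sketch rather than the exact code in finetune_chatglm.py; the target module name query_key_value and the alpha/dropout values are assumptions based on the ChatGLM-6B architecture and common LoRA settings.

from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

# Load the base ChatGLM-6B model (trust_remote_code is needed for its custom modeling code).
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Inject rank-8 LoRA adapters into the fused QKV projection; only the adapters are trained.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # LoRA rank, matching the r=8 setting used below
    lora_alpha=32,                        # scaling factor (assumed value)
    lora_dropout=0.1,                     # adapter dropout (assumed value)
    target_modules=["query_key_value"],   # ChatGLM-6B's fused query/key/value projection
)
model = get_peft_model(model, lora_config)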

Requirements

  • Python 3.10 and PyTorch 2.0.0
  • 🤗Transformers, Datasets, and PEFT (0.3.0.dev0 is required)
  • protobuf, cpm_kernels, sentencepiece
  • jieba, rouge_chinese, nltk

And powerful GPUs!

Getting Started

Prepare Data (optional)

Please refer to data/example_dataset for details about the dataset format. You can use either a single .json file or a dataset loading script with multiple files to create a custom dataset.
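
As an illustration, a record in a custom .json file can follow the Alpaca-style instruction/input/output layout used by alpaca_gpt4_zh. The field names below are an assumption; data/example_dataset remains the authoritative reference for the accepted format.

import json

# A hypothetical instruction-following record (keys are assumptions; see data/example_dataset).
examples = [
    {
        "instruction": "计算下列数字的和。",  # task description
        "input": "1, 2, 3",                  # optional extra context
        "output": "这些数字的和是 6。",        # expected response
    }
]

with open("my_dataset.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)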

Preparation (optional)

git clone https://github.com/hiyouga/ChatGLM-Efficient-Tuning.git
conda create -n cet python=3.10
conda activate cet
pip install -r requirements.txt

Fine-tuning

CUDA_VISIBLE_DEVICES=0 python finetune_chatglm.py \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --output_dir output \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --max_train_samples 10000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --fp16

Distributed Fine-tuning with Multiple GPUs

accelerate launch finetune_chatglm.py # arguments (same as above)

Note: distributed fine-tuning currently appears to be incompatible with the LoRA method.

Evaluation (BLEU and ROUGE_CHINESE)

CUDA_VISIBLE_DEVICES=0 python finetune_chatglm.py \
    --do_eval \
    --dataset alpaca_gpt4_zh \
    --checkpoint_dir output \
    --output_dir eval \
    --per_device_eval_batch_size 1 \
    --max_eval_samples 50 \
    --predict_with_generate
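
For Chinese text, the BLEU-4 and ROUGE scores are computed over jieba-segmented tokens using the rouge_chinese and nltk packages listed in the requirements. The sketch below illustrates the idea; it is not necessarily identical to the metric code inside the evaluation script.

import jieba
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_chinese import Rouge

def score_pair(prediction: str, reference: str) -> dict:
    # Chinese has no whitespace boundaries, so segment with jieba first.
    hyp_tokens = list(jieba.cut(prediction))
    ref_tokens = list(jieba.cut(reference))

    # rouge_chinese expects whitespace-joined token strings.
    rouge_scores = Rouge().get_scores(" ".join(hyp_tokens), " ".join(ref_tokens))[0]

    # Smoothed sentence-level BLEU-4.
    bleu4 = sentence_bleu([ref_tokens], hyp_tokens,
                          smoothing_function=SmoothingFunction().method3)

    return {
        "bleu-4": round(bleu4 * 100, 2),
        "rouge-1": round(rouge_scores["rouge-1"]["f"] * 100, 2),
        "rouge-2": round(rouge_scores["rouge-2"]["f"] * 100, 2),
        "rouge-l": round(rouge_scores["rouge-l"]["f"] * 100, 2),
    }

print(score_pair("今天天气很好", "今天的天气非常好"))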

Inference

CUDA_VISIBLE_DEVICES=0 python infer_chatglm.py --checkpoint_dir output

Deploy the Fine-tuned Model

from utils import load_pretrained
from arguments import ModelArguments

# Load the base model and tokenizer together with the fine-tuned weights in checkpoint_dir.
model_args = ModelArguments(checkpoint_dir=path_to_checkpoint_dir)
model, tokenizer = load_pretrained(model_args)
model = model.half().cuda()  # cast to FP16 and move to the GPU for inference
# model.generate(), model.chat(), ...
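
Once loaded, the fine-tuned model can be queried through the conversational interface that ChatGLM-6B exposes via model.chat, for example:

# Single-turn query; `history` carries the conversation state across turns.
response, history = model.chat(tokenizer, "你好，请介绍一下你自己。", history=[])
print(response)

# Follow-up turn reusing the returned history.
response, history = model.chat(tokenizer, "你能用来做什么？", history=history)
print(response)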

Hardware Requirements

| Fine-tuning method | Batch size | Mode | GRAM | Speed  |
| ------------------ | ---------- | ---- | ---- | ------ |
| LoRA (r=8)         | 16         | FP16 | 28GB | 8 ex/s |
| LoRA (r=8)         | 8          | FP16 | 24GB | 8 ex/s |
| LoRA (r=8)         | 4          | FP16 | 20GB | 8 ex/s |
| P-Tuning (p=16)    | 4          | FP16 | 20GB | 8 ex/s |
| P-Tuning (p=16)    | 4          | int8 | 16GB | 8 ex/s |
| P-Tuning (p=16)    | 4          | int4 | 12GB | 8 ex/s |
| Freeze (l=3)       | 4          | FP16 | 24GB | 8 ex/s |

Note: r is the LoRA rank, p is the number of prefix tokens, l is the number of trainable layers, and ex/s is the number of training examples processed per second. gradient_accumulation_steps is set to 1. All values are measured on a single Tesla V100 (32GB) GPU; they are approximate and may vary across GPUs.

Fine-tuning ChatGLM: A Case

Training Results

We use the whole alpaca_gpt4_zh dataset to fine-tune the ChatGLM model with LoRA (r=8) for one epoch, using the default hyper-parameters. The loss curve during training is presented below.

[Figure: training loss curve]

Evaluation Results

We select 100 instances in the alpaca_gpt4_zh dataset to evaluate the fine-tuned ChatGLM model and compute the BLEU and ROUGE scores. The results are presented below.

| Score      | Original | FZ (l=2) | PT (p=16) | LoRA (r=8)    |
| ---------- | -------- | -------- | --------- | ------------- |
| BLEU-4     | 15.75    | 16.85    | 16.06     | 17.01 (+1.26) |
| ROUGE-1    | 34.51    | 36.62    | 34.80     | 36.77 (+2.26) |
| ROUGE-2    | 15.11    | 17.04    | 15.32     | 16.83 (+1.72) |
| ROUGE-L    | 26.18    | 28.17    | 26.35     | 28.86 (+2.68) |
| Params (%) | /        | 4.35%    | 0.06%     | 0.06%         |

FZ: freeze tuning, PT: P-Tuning V2 (we use pre_seq_len=16 for a fair comparison with LoRA), Params: the percentage of trainable parameters.
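
The Params (%) row can be reproduced by counting trainable versus total parameters on the wrapped model. The helper below is a hypothetical convenience function; PEFT models also report the same ratio via model.print_trainable_parameters().

# Count trainable vs. total parameters of a (PEFT-wrapped) model.
def trainable_ratio(model) -> float:
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return 100.0 * trainable / total

print(f"trainable params: {trainable_ratio(model):.2f}%")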

Compared with Existing Implementations

  • THUDM/ChatGLM-6B
    • Official implementation of fine-tuning ChatGLM with P-Tuning v2 on the ADGEN dataset.
    • Our fine-tuning script largely depends on it. We further implement the LoRA tuning method. Additionally, we dynamically pad the inputs to the longest sequence in the batch instead of to the maximum length, which accelerates fine-tuning.
  • mymusise/ChatGLM-Tuning
    • An unofficial implementation of fine-tuning ChatGLM with LoRA on the Stanford Alpaca dataset.
    • We borrowed some ideas from it. Our fine-tuning script integrates the data pre-processing into the training procedure, so there is no need to generate a pre-processed dataset before training.
  • ssbuild/chatglm_finetuning
  • lich99/ChatGLM-finetune-LoRA
  • liucongg/ChatGLM-Finetuning
    • An unofficial implementation of fine-tuning ChatGLM with several methods, including Freeze, LoRA, and P-Tuning, on an industrial dataset.
    • We aim to incorporate more instruction-following datasets for fine-tuning the ChatGLM model.
  • yanqiangmiffy/InstructGLM
    • An unofficial implementation of fine-tuning ChatGLM that explores ChatGLM's ability on instruction-following datasets.
    • Our fine-tuning script integrates the data pre-processing into the training procedure.

TODO

License

This repository is licensed under the Apache-2.0 License.

Citation

If this work is helpful, please cite as:

@Misc{chatglm-efficient-tuning,
  title = {ChatGLM Efficient Tuning},
  author = {hiyouga},
  howpublished = {\url{https://github.com/hiyouga/ChatGLM-Efficient-Tuning}},
  year = {2023}
}

Acknowledgement

This repo benefits from ChatGLM-6B, ChatGLM-Tuning and yuanzhoulvpi2017/zero_nlp. Thanks for their wonderful work.
