Commit

1. Fix the bug that occurred when enabling quantization during RM training, caused by a compliance requirement of the transformers library

2. Lower the minimum torch version to 1.13.1 to avoid an unnecessary installation
3. Add some useful notes to the README
4. Add a fine-tuning example that enables quantization and uses a local base model
codemayq committed May 3, 2023
1 parent 6ce2e3f commit 6d160c3
Showing 5 changed files with 70 additions and 2 deletions.
22 changes: 22 additions & 0 deletions README.md
@@ -93,6 +93,17 @@ cd ChatGLM-Efficient-Tuning
pip install -r requirements.txt
```

If you want to enable quantization for LoRA or Freeze fine-tuning on Windows, you need to additionally install the bitsandbytes library.
Since bitsandbytes does not officially support Windows yet, we use a pre-built wheel that currently only supports CUDA 11.6 and CUDA 11.7:
```
pip install https://github.com/acpopescu/bitsandbytes/releases/download/v0.37.2-win.1/bitsandbytes-0.37.2-py3-none-any.whl
```

For Linux users, just install it directly:
```
pip install bitsandbytes
```
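
After installation, a quick sanity check helps confirm that bitsandbytes imports correctly and that a CUDA device is visible before enabling INT8 quantization. This is an optional snippet added here for convenience; it is not required by the training scripts.

```python
# Optional sanity check: verify bitsandbytes and CUDA before enabling INT8 training.
import torch
import bitsandbytes as bnb  # raises ImportError if the wheel was not installed correctly

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
```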

### Fine-tuning with a Single GPU

```bash
@@ -140,6 +151,7 @@ CUDA_VISIBLE_DEVICES=0 python src/train_rm.py \
--fp16
```

> The current default version uses the difference between the scores at the EOS tokens of the accepted response and the rejected response as the training reward.
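
A minimal sketch of such a pairwise objective is shown below, assuming the value head produces a scalar score at the EOS position of each response; this is an illustrative snippet, not the repository's exact implementation.

```python
# Illustrative pairwise reward-model loss computed from EOS-position scores.
# Assumes accept_scores / reject_scores hold the value-head outputs at the EOS token.
import torch
import torch.nn.functional as F

def pairwise_rm_loss(accept_scores: torch.Tensor, reject_scores: torch.Tensor) -> torch.Tensor:
    # Push the accepted response's score above the rejected response's score.
    return -F.logsigmoid(accept_scores - reject_scores).mean()

# Dummy example with a batch of two preference pairs.
loss = pairwise_rm_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.5, 0.7]))
print(loss.item())
```
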
### Training with RLHF

```bash
@@ -224,6 +236,16 @@ model.eval()
| Freeze (l=3) | 4 | FP16 | 24GB | 8ex/s |
| Freeze (l=3) | 4 | INT8 | 12GB | 8ex/s |

| RM method | Batch size | Mode | GRAM | Speed |
|-----------------|------------| ---- |------|-------|
| LoRA (r=8) + rm | 1 | INT8 | 11GB | - |
| LoRA (r=8) + rm | 4 | FP16 | 22GB | - |

| RLHF method | Batch size | Mode | GRAM | Speed |
|------------------|------------| ---- |------|-------|
| LoRA (r=8) + ppo | 1 | INT8 | 12GB | - |
| LoRA (r=8) + ppo | 4 | FP16 | 23GB | - |

> Note: `r` is the LoRA rank, `p` is the number of prefix tokens, `l` is the number of trainable layers, and `ex/s` is the number of examples processed per second during training. `gradient_accumulation_steps` is set to `1`. All results are measured on a single Tesla V100 (32G) GPU; they are approximate values and may vary on different GPUs.
## Fine-tuning ChatGLM: A Case
24 changes: 24 additions & 0 deletions README_zh.md
@@ -97,6 +97,17 @@ cd ChatGLM-Efficient-Tuning
pip install -r requirements.txt
```

If you want to enable quantization for LoRA or Freeze fine-tuning on Windows, you need to additionally install the bitsandbytes library.
Since bitsandbytes does not currently support Windows directly, we use a pre-built package that only supports CUDA 11.6 and CUDA 11.7:
```
pip install https://github.com/acpopescu/bitsandbytes/releases/download/v0.37.2-win.1/bitsandbytes-0.37.2-py3-none-any.whl
```

For Linux users, just install it directly:
```
pip install bitsandbytes
```

### 单 GPU 微调训练

```bash
Expand Down Expand Up @@ -144,6 +155,8 @@ CUDA_VISIBLE_DEVICES=0 python src/train_rm.py \
--fp16
```

> The current default version uses the difference between the scores at the EOS tokens of the accepted response and the rejected response as the training reward.
### RLHF 训练

```bash
@@ -229,6 +242,17 @@ model.eval()
| Freeze (l=3) | 4 | FP16 | 24GB | 8ex/s |
| Freeze (l=3) | 4 | INT8 | 12GB | 8ex/s |

| RM method | Batch size | Mode | GRAM | Speed |
|-----------------|------------| ---- |------|-------|
| LoRA (r=8) + rm | 1 | INT8 | 11GB | - |
| LoRA (r=8) + rm | 4 | FP16 | 22GB | - |

| RLHF method | Batch size | Mode | GRAM | Speed |
|------------------|------------| ---- |------|-------|
| LoRA (r=8) + ppo | 1 | INT8 | 12GB | - |
| LoRA (r=8) + ppo | 4 | FP16 | 23GB | - |


> Note: `r` is the LoRA rank, `p` is the number of prefix tokens, `l` is the number of trainable layers, and `ex/s` is the number of examples processed per second during training. `gradient_accumulation_steps` is set to `1`. All of the above results are measured on a single Tesla V100 GPU and are for reference only.
## Fine-tuning ChatGLM: A Case
19 changes: 19 additions & 0 deletions examples/finetune_with_quant_and_local_model.sh
@@ -0,0 +1,19 @@
#!/bin/bash

CUDA_VISIBLE_DEVICES=0 python ../src/finetune.py \
--do_train \
--dataset alpaca_gpt4_zh \
--dataset_dir ../data \
--finetuning_type lora \
--output_dir path_to_checkpoint \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 5e-5 \
--num_train_epochs 1.0 \
--fp16 \
--quantization_bit 8 \
--model_name_or_path path_to_base_model
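
Once the script finishes, the LoRA weights saved in `path_to_checkpoint` can be loaded onto the local base model for a quick interactive test. The sketch below is a minimal, hypothetical example using `peft` and ChatGLM's `chat` interface; the placeholder paths mirror the ones in the script above, and the repository's own inference scripts remain the recommended way to chat with the model.

```python
# Hypothetical quick test of the fine-tuned LoRA adapter; paths are placeholders.
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

base_model_path = "path_to_base_model"   # the same local base model used for training
adapter_path = "path_to_checkpoint"      # the LoRA checkpoint produced by the script above

tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(base_model_path, trust_remote_code=True).half().cuda()
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

# ChatGLM exposes a chat() helper through its remote code.
response, _ = model.chat(tokenizer, "你好", history=[])
print(response)
```
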
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,4 +1,4 @@
-torch>=2.0.0
+torch>=1.13.1
protobuf
cpm_kernels
sentencepiece
5 changes: 4 additions & 1 deletion src/utils/common.py
@@ -224,7 +224,10 @@ def load_pretrained(
if stage == "ppo": # load reward model
model.pretrained_model.load_adapter(model_args.reward_model, "reward", is_trainable=False)
load_valuehead_params(model, model_args.reward_model)

# Set the parameter _is_int8_training_enabled for the AutoModelForCausalLMWithValueHead model
# To meet the compliance requirements of the transformers library
if quantization == "hf" and model_args.quantization_bit == 8:
model._is_int8_training_enabled = True
print_trainable_params(model)

return model, tokenizer
