Fine-tuning best practices for qwen2.5-72b-instruct and qwen2-vl-72b-instruct. #2064

Jintao-Huang opened this issue Sep 18, 2024 · 5 comments
Labels: good first issue

Jintao-Huang commented Sep 18, 2024

More docs:

qwen2-vl: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md

qwen1.5: https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md

We use ms-swift to perform self-cognition fine-tuning on qwen2.5 and image-OCR fine-tuning on qwen2-vl, and then run inference with the fine-tuned models.

Before starting the fine-tuning, make sure your environment is set up correctly:

# Install ms-swift
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .[llm]

# qwen2-vl
# https://github.com/QwenLM/Qwen2-VL/issues/96
pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
# vLLM for inference acceleration
pip install 'vllm>=0.6.1'
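
To confirm the environment before launching a job, a quick version check can help. This is a minimal sketch; it simply prints whatever versions the commands above resolved:

# Print the versions actually installed by the commands above.
from importlib.metadata import version

for pkg in ("ms-swift", "transformers", "vllm"):
    print(pkg, version(pkg))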

In practice, large-model fine-tuning usually uses a custom dataset. Here we show demos that can be run directly.

qwen2.5-72b-instruct

We perform self-cognition fine-tuning on Qwen2.5-72B-Instruct.

Self-cognition dataset: https://www.modelscope.cn/datasets/swift/self-cognition

General-purpose mixed datasets: qwen2-pro-en and qwen2-pro-zh (500 samples of each are used in the script below)

Fine-tuning script:

# Experimental environment: 4 * A100
# GPU memory usage: 4 * 70GB
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type qwen2_5-72b-instruct \
    --model_id_or_path qwen/Qwen2.5-72B-Instruct \
    --dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
    --logging_steps 5 \
    --learning_rate 1e-4 \
    --output_dir output \
    --lora_target_modules ALL \
    --model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope \
    --system "You are a helpful assistant." \
    --deepspeed default-zero3

# Example that runs on a single A10/3090 (Qwen2.5-7B-Instruct)
# GPU memory usage: 24GB
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen2_5-7b-instruct \
    --model_id_or_path qwen/Qwen2.5-7B-Instruct \
    --dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
    --logging_steps 5 \
    --max_length 2048 \
    --learning_rate 1e-4 \
    --output_dir output \
    --lora_target_modules ALL \
    --model_name 小黄 'Xiao Huang' \
    --model_author 魔搭 ModelScope \
    --system "You are a helpful assistant."

For the custom dataset documentation, see: https://github.com/modelscope/ms-swift/blob/main/docs/source/Instruction/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md
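
As a quick illustration of that format, a plain JSONL file with query/response (and optional history) fields can be used directly. The sketch below writes one; the file name my_data.jsonl and its contents are illustrative placeholders:

# Sketch: build a minimal custom dataset in the query/response JSONL format
# described in the custom-dataset doc above. Contents are placeholders.
import json

samples = [
    {"query": "Who are you?",
     "response": "I am Xiao Huang, an assistant developed by ModelScope."},
    {"query": "What can you do?",
     "response": "I can answer questions and chat with you.",
     "history": [["Hello", "Hi, how can I help you?"]]},
]

with open("my_data.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")

The resulting file can then be passed the same way as the built-in datasets, e.g. --dataset my_data.jsonl, as shown in the qwen2-vl section below.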

GPU memory usage during fine-tuning: (screenshot)

Loss curve during fine-tuning: (screenshot)

The inference script after fine-tuning is shown below; ckpt_dir needs to be changed to the last checkpoint folder produced by training. We can use vLLM to accelerate inference on the merged checkpoint:

# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx

# merge-lora and use vLLM to accelerate inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx-merged \
    --infer_backend vllm --max_model_len 8192 \
    --tensor_parallel_size 4
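
Besides the CLI, inference can also be run from Python. Below is a minimal sketch assuming the swift.llm Python API of the ms-swift version installed above (get_model_tokenizer / get_template / inference); the checkpoint path is a placeholder in the same style as the commands above, and it uses the single-GPU 7B example for simplicity:

# Sketch: programmatic inference with a (non-merged) LoRA checkpoint.
# The checkpoint path is a placeholder; adjust model_type and path to your run.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (get_model_tokenizer, get_default_template_type,
                       get_template, inference)
from swift.tuners import Swift

model_type = 'qwen2_5-7b-instruct'
ckpt_dir = 'output/qwen2_5-7b-instruct/vx-xxx/checkpoint-xxx'  # placeholder

model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'})
model = Swift.from_pretrained(model, ckpt_dir, inference_mode=True)
template = get_template(get_default_template_type(model_type), tokenizer)

response, history = inference(model, template, 'Who are you?')
print(response)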

Example of the fine-tuned model running inference on the validation set: (screenshot)

qwen2-vl-72b-instruct

We perform OCR fine-tuning on Qwen2-VL-72B-Instruct. For examples of grounding tasks and video fine-tuning, see the ms-swift documentation: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md

Fine-tuning dataset: https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR
Fine-tuning script:

# Experimental environment: 8 * A100
SIZE_FACTOR=8 MAX_PIXELS=602112 \
NPROC_PER_NODE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift sft \
  --model_type qwen2-vl-72b-instruct \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct \
  --sft_type lora \
  --dataset latex-ocr-print#20000 \
  --deepspeed default-zero3

To use a custom dataset, simply specify it as follows:

# val_dataset is optional; if not specified, part of dataset is split off as the validation set
  --dataset train.jsonl \
  --val_dataset val.jsonl \

Custom dataset format:

{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "<image><image>eeeee", "response": "fffff", "history": [], "audios": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}

GPU memory usage during fine-tuning: (screenshot)

Loss curve during fine-tuning (only 250 steps were run due to time constraints): (screenshot)

The inference script after fine-tuning is shown below; ckpt_dir needs to be changed to the last checkpoint folder produced by training. We can use vLLM to accelerate inference on the merged checkpoint:

# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
    --ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
    --load_dataset_config true

# merge-lora and use vLLM to accelerate inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
    --ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
    --ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx-merged \
    --load_dataset_config true --infer_backend vllm \
    --tensor_parallel_size 4 --max_model_len 16384

Example of the fine-tuned model running inference on the validation set: (screenshot)


llp1992 commented Sep 19, 2024

Does qwen2-vl support training with multi-image, multi-turn dialogues?


etemiz commented Sep 19, 2024

can I train 72b with 2 * A6000? (2 * 48GB)

Jintao-Huang (Collaborator, Author) commented:

Does qwen2-vl support training with multi-image, multi-turn dialogues?

Yes, it is supported.

Jintao-Huang (Collaborator, Author) commented:

can I train 72b with 2 * A6000? (2 * 48GB)

Maybe QLoRA:

# GPU Memory: 2 * 28GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
  --model_type qwen2-vl-72b-instruct-gptq-int4 \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 \
  --sft_type lora \
  --dataset latex-ocr-print#20000

Jintao-Huang (Collaborator, Author) commented:

LoRA & device_map:

# GPU Memory: 2 * 75GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
  --model_type qwen2-vl-72b-instruct \
  --model_id_or_path qwen/Qwen2-VL-72B-Instruct \
  --sft_type lora \
  --dataset latex-ocr-print#20000
