Support freeze_vit (modelscope#1880)

Jintao-Huang committed Aug 31, 2024
1 parent 018bd8d commit 9f3f65d
Showing 13 changed files with 108 additions and 43 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -182,7 +182,7 @@ You can contact us and communicate with us by adding our group:
- 2023.12.18: Support VLLM for inference acceleration.
- 2023.12.15: Support deepseek, deepseek-coder series: deepseek-7b, deepseek-7b-chat, deepseek-67b, deepseek-67b-chat, openbuddy-deepseek-67b-chat, deepseek-coder-1_3b, deepseek-coder-1_3b-instruct, deepseek-coder-6_7b, deepseek-coder-6_7b-instruct, deepseek-coder-33b, deepseek-coder-33b-instruct.
- 2023.12.13: Support mistral-7b-instruct-v2, [mixtral-moe-7b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/mixtral_7b_moe), [mixtral-moe-7b-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/mixtral_7b_moe_instruct).
- 2023.12.09: Support `freeze_parameters` parameter as a compromise between lora and full-parameter training. Corresponding sh can be found in [full_freeze_ddp](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_7b_chat/full_freeze_ddp). Support `disable_tqdm`, `lazy_tokenize`, `preprocess_num_proc` parameters, see [command line arguments](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0.md) for details.
- 2023.12.09: Support `freeze_parameters_ratio` parameter as a compromise between lora and full-parameter training. Corresponding sh can be found in [full_freeze_ddp](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_7b_chat/full_freeze_ddp). Support `disable_tqdm`, `lazy_tokenize`, `preprocess_num_proc` parameters, see [command line arguments](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0.md) for details.
- 2023.12.08: Support [sus-34b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/sus_34b_chat), support yi-6b-200k, yi-34b-200k.
- 2023.12.07: Support [Multi-Node DDP training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.md#%E4%BD%BF%E7%94%A8cli).
- 2023.12.05: Support models: zephyr-7b-beta-chat, openbuddy-zephyr-7b-chat. Support datasets: hc3-zh, hc3-en.
2 changes: 1 addition & 1 deletion README_CN.md
@@ -182,7 +182,7 @@ SWIFT has rich and comprehensive documentation; please visit our documentation site:
- 2023.12.18: Support VLLM for inference acceleration.
- 2023.12.15: Support deepseek, deepseek-coder series: deepseek-7b, deepseek-7b-chat, deepseek-67b, deepseek-67b-chat, openbuddy-deepseek-67b-chat, deepseek-coder-1_3b, deepseek-coder-1_3b-instruct, deepseek-coder-6_7b, deepseek-coder-6_7b-instruct, deepseek-coder-33b, deepseek-coder-33b-instruct.
- 2023.12.13: Support mistral-7b-instruct-v2, [mixtral-moe-7b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/mixtral_7b_moe), [mixtral-moe-7b-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/mixtral_7b_moe_instruct).
- 2023.12.09: Support the `freeze_parameters` parameter as a compromise between lora and full-parameter training. The corresponding sh scripts can be found in [full_freeze_ddp](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_7b_chat/full_freeze_ddp). Support the `disable_tqdm`, `lazy_tokenize`, `preprocess_num_proc` parameters; see [command line arguments](https://github.com/modelscope/swift/blob/main/docs/source/LLM/命令行参数.md) for details.
- 2023.12.09: Support the `freeze_parameters_ratio` parameter as a compromise between lora and full-parameter training. The corresponding sh scripts can be found in [full_freeze_ddp](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_7b_chat/full_freeze_ddp). Support the `disable_tqdm`, `lazy_tokenize`, `preprocess_num_proc` parameters; see [command line arguments](https://github.com/modelscope/swift/blob/main/docs/source/LLM/命令行参数.md) for details.
- 2023.12.08: Support [sus-34b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/sus_34b_chat); support yi-6b-200k, yi-34b-200k.
- 2023.12.07: Support [Multi-Node DDP training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.md#%E4%BD%BF%E7%94%A8cli).
- 2023.12.05: Support models: zephyr-7b-beta-chat, openbuddy-zephyr-7b-chat. Support datasets: hc3-zh, hc3-en.
Binary file added docs/resources/qwen2-vl/ocr_result.png
6 changes: 4 additions & 2 deletions docs/source/LLM/命令行参数.md
@@ -26,8 +26,10 @@
- `--full_determinism`: Fix all sources of randomness, default `False`.
- `--auto_find_batch_size`: Automatically find a suitable batch_size based on GPU memory, default `False`.
- `--streaming`: Whether to use streaming data processing, default `False`.
- `--freeze_parameters`: When sft_type is set to 'full', freeze the bottommost parameters of the model. Range: 0. ~ 1., default `0.`. This parameter provides a compromise between lora and full-parameter fine-tuning.
- `--additional_trainable_parameters`: A complement to freeze_parameters, only allowed when sft_type is 'full', default `[]`. For example, if you want to train the embedding layer in addition to 50% of the parameters, you can set `--freeze_parameters 0.5 --additional_trainable_parameters transformer.wte`; all parameters whose names start with `transformer.wte` will be activated. You can also set `--freeze_parameters 1 --additional_trainable_parameters xxx` to customize the trainable layers.
- `--freeze_parameters`: When sft_type is set to 'full', freeze the layers whose names start with a prefix given in freeze_parameters. Default `[]`. Example: `--freeze_parameters visual`.
- `--freeze_vit`: When sft_type is set to 'full' and a multimodal model is being trained, the ViT parameters can be frozen by setting this parameter to `True`. Default `False`.
- `--freeze_parameters_ratio`: When sft_type is set to 'full', freeze the bottommost parameters of the model. Range: 0. ~ 1., default `0.`. This parameter provides a compromise between lora and full-parameter fine-tuning.
- `--additional_trainable_parameters`: A complement to freeze_parameters, only allowed when sft_type is 'full', default `[]`. For example, if you want to train the embedding layer in addition to 50% of the parameters, you can set `--freeze_parameters_ratio 0.5 --additional_trainable_parameters transformer.wte`; all parameters whose names start with `transformer.wte` will be activated. You can also set `--freeze_parameters_ratio 1 --additional_trainable_parameters xxx` to customize the trainable layers.
- `--tuner_backend`: The backend for lora and qlora, default `'peft'`. Options: 'swift', 'peft', 'unsloth'.
- `--template_type`: The type of dialogue template to use, default `'AUTO'`, i.e. look up `template` in `MODEL_MAPPING` based on `model_type`. The available `template_type` options can be found in `TEMPLATE_MAPPING.keys()`.
- `--output_dir`: The directory where checkpoints are stored, default `'output'`. The `model_type` and a fine-tuning version number are appended to this directory, making it convenient to run multiple comparative experiments on different models without changing the `output_dir` argument. If you do not want this suffix appended, specify `--add_output_dir_suffix false`.
18 changes: 16 additions & 2 deletions docs/source/Multi-Modal/qwen2-vl最佳实践.md
@@ -164,8 +164,22 @@ SIZE_FACTOR=8 MAX_PIXELS=602112 CUDA_VISIBLE_DEVICES=0 swift sft \
--model_id_or_path qwen/Qwen2-VL-7B-Instruct \
--sft_type lora \
--dataset latex-ocr-print#20000

# Full-parameter training with the ViT frozen
# GPU Memory: 4 * 60GB
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
--model_type qwen2-vl-7b-instruct \
--model_id_or_path qwen/Qwen2-VL-7B-Instruct \
--sft_type full \
--freeze_vit true \
--deepspeed default-zero2 \
--dataset latex-ocr-print#20000
```

Example of the fine-tuned model performing inference on the validation set (only 200 steps were trained):

![inference result](../../resources/qwen2-vl/ocr_result.png)

### Image Description Fine-tuning

We use the coco-en-mini dataset for fine-tuning; its task is to describe the content of images. You can find the dataset on ModelScope: [https://modelscope.cn/datasets/modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption)
@@ -198,7 +212,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
![GPU memory usage](../../resources/qwen2-vl/1.png)


Training loss plot (only 200 steps were trained due to time constraints):
Training loss plot (only 200 steps were trained):

![training loss](../../resources/qwen2-vl/loss.png)

@@ -265,5 +279,5 @@ NFRAMES=24 MAX_PIXELS=100352 CUDA_VISIBLE_DEVICES=0 swift infer \
--load_dataset_config true --merge_lora true
```

Example of the fine-tuned model performing inference on the validation set (only 50 steps were trained due to time constraints):
Example of the fine-tuned model performing inference on the validation set (only 50 steps were trained):
![inference result](../../resources/qwen2-vl/4.png)
6 changes: 4 additions & 2 deletions docs/source_en/LLM/Command-line-parameters.md
@@ -25,8 +25,10 @@
- `--full_determinism`: Fix all sources of randomness during training, default `False`.
- `--auto_find_batch_size`: Automatically find a suitable batch size based on GPU memory, default `False`.
- `--streaming`: Whether to use streaming (iterable) dataset processing, default `False`.
- `--freeze_parameters`: When sft_type is set to 'full', freeze the bottommost parameters of the model. Range is 0. ~ 1., default is `0.`. This provides a compromise between lora and full fine-tuning.
- `--additional_trainable_parameters`: In addition to freeze_parameters, only allowed when sft_type is 'full', default is `[]`. For example, if you want to train embedding layer in addition to 50% of parameters, you can set `--freeze_parameters 0.5 --additional_trainable_parameters transformer.wte`, all parameters starting with `transformer.wte` will be activated. You can also set `--freeze_parameters 1 --additional_trainable_parameters xxx` to customize the trainable layers.
- `--freeze_parameters`: When sft_type is specified as 'full', the layers prefixed with freeze_parameters will be frozen. The default value is `[]`. For example: `--freeze_parameters visual`.
- `--freeze_vit`: When sft_type is set to 'full' and the model being trained is multimodal, the parameters of the ViT can be frozen by setting this parameter to `True`. The default value is `False`.
- `--freeze_parameters_ratio`: When sft_type is set to 'full', freeze the bottommost parameters of the model. Range is 0. ~ 1., default is `0.`. This provides a compromise between lora and full fine-tuning.
- `--additional_trainable_parameters`: In addition to freeze_parameters, only allowed when sft_type is 'full', default is `[]`. For example, if you want to train embedding layer in addition to 50% of parameters, you can set `--freeze_parameters_ratio 0.5 --additional_trainable_parameters transformer.wte`, all parameters starting with `transformer.wte` will be activated. You can also set `--freeze_parameters_ratio 1 --additional_trainable_parameters xxx` to customize the trainable layers.
- `--tuner_backend`: Backend support for lora, qlora, default is `'peft'`. Options include: 'swift', 'peft', 'unsloth'.
- `--template_type`: Type of dialogue template used, default is `'AUTO'`, i.e. look up `template` in `MODEL_MAPPING` based on `model_type`. Available `template_type` options can be found in `TEMPLATE_MAPPING.keys()`.
- `--output_dir`: Directory to store ckpt, default is `'output'`. We will append `model_type` and fine-tuning version number to this directory, allowing users to do multiple comparative experiments on different models without changing the `output_dir` command line argument. If you don't want to append this content, specify `--add_output_dir_suffix false`.
18 changes: 16 additions & 2 deletions docs/source_en/Multi-Modal/qwen2-vl-best-practice.md
@@ -152,6 +152,10 @@ SIZE_FACTOR=8 MAX_PIXELS=602112 CUDA_VISIBLE_DEVICES=0 swift sft \
--dataset latex-ocr-print#20000
```

Example of the model performing inference on the validation set after fine-tuning (only 200 steps were trained):

![inference result](../../resources/qwen2-vl/ocr_result.png)

### Image Description Fine-tuning

We fine-tune using the coco-en-mini dataset, which aims to describe the content of images. You can find this dataset on ModelScope: [https://modelscope.cn/datasets/modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption)
@@ -164,6 +168,16 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
--sft_type lora \
--dataset coco-en-mini#20000 \
--deepspeed default-zero2

# Full parameter training and freezing ViT
# GPU Memory: 4 * 60GB
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
--model_type qwen2-vl-7b-instruct \
--model_id_or_path qwen/Qwen2-VL-7B-Instruct \
--sft_type full \
--freeze_vit true \
--deepspeed default-zero2 \
--dataset latex-ocr-print#20000
```
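After training with `--freeze_vit true`, it can be worth confirming that the vision tower really was frozen. A minimal sketch in Python (assumptions: `model` is the module returned by swift's model loading, and the Qwen2-VL vision tower parameters share the `visual` prefix):

```python
# Sketch: summarize trainable vs. frozen parameters after model preparation.
def summarize_trainable(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    vit_trainable = any(p.requires_grad for name, p in model.named_parameters()
                        if name.startswith('visual'))
    print(f'trainable: {trainable / 1e6:.1f}M, frozen: {frozen / 1e6:.1f}M, '
          f'ViT trainable: {vit_trainable}')
```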

To use a custom dataset, simply specify it as follows:
@@ -186,7 +200,7 @@ GPU Memory Usage:
![GPU Memory Usage](../../resources/qwen2-vl/1.png)


Training loss (only 200 steps were trained due to time constraints):
Training loss (only 200 steps were trained):

![train loss](../../resources/qwen2-vl/loss.png)

Expand Down Expand Up @@ -253,5 +267,5 @@ NFRAMES=24 MAX_PIXELS=100352 CUDA_VISIBLE_DEVICES=0 swift infer \
--load_dataset_config true --merge_lora true
```

Example of the model performing inference on the validation set after fine-tuning: (only 50 steps were trained due to time constraints)
Example of the model performing inference on the validation set after fine-tuning (only 50 steps were trained):
![inference result](../../resources/qwen2-vl/4.png)
@@ -14,6 +14,6 @@ swift sft \
--use_flash_attn true \
--save_only_model true \
--dataset codefuse-evol-instruction-zh \
--freeze_parameters 0.25 \
--freeze_parameters_ratio 0.25 \
--additional_trainable_parameters transformer.wte \
--preprocess_num_proc 4 \
3 changes: 1 addition & 2 deletions swift/llm/tuner.py
@@ -256,8 +256,7 @@ def prepare_model(model, args: SftArguments):
model.train()
model.requires_grad_(True)

if args.freeze_parameters > 0:
freeze_model_parameters(model, args.freeze_parameters)
freeze_model_parameters(model, args.freeze_parameters_ratio, args.freeze_parameters)
if len(args.additional_trainable_parameters) > 0:
activate_model_parameters(model, args.additional_trainable_parameters)
if use_torchacc() and args.resume_from_checkpoint is not None:
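`freeze_model_parameters` now receives both the ratio and the prefix list. Its implementation is not shown in this diff; a plausible sketch of the documented semantics (freeze the bottommost fraction of parameters, then freeze any parameter whose name starts with one of the given prefixes) would be:

```python
from typing import List

from torch import nn


def freeze_model_parameters(model: nn.Module, freeze_parameters_ratio: float,
                            freeze_parameters: List[str]) -> None:
    """Hypothetical sketch; the real helper lives in swift.utils."""
    # Freeze the bottommost `freeze_parameters_ratio` of the parameters.
    parameters = list(model.parameters())
    n_frozen = int(len(parameters) * freeze_parameters_ratio)
    for p in parameters[:n_frozen]:
        p.requires_grad_(False)
    # Freeze every parameter whose name starts with one of the prefixes,
    # e.g. 'visual' for the Qwen2-VL vision tower.
    for name, p in model.named_parameters():
        if any(name.startswith(prefix) for prefix in freeze_parameters):
            p.requires_grad_(False)
```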
34 changes: 29 additions & 5 deletions swift/llm/utils/argument.py
@@ -305,6 +305,20 @@ def handle_compatibility(self: Union['SftArguments', 'InferArguments']) -> None:
if self.server_port is not None:
self.port = self.server_port
if isinstance(self, SftArguments):
log_freeze_warning = False
try:
if isinstance(self.freeze_parameters, (int, float)):
log_freeze_warning = True
elif isinstance(self.freeze_parameters, list) and len(self.freeze_parameters) == 1:
self.freeze_parameters = float(self.freeze_parameters[0])
log_freeze_warning = True
except Exception:
pass
if log_freeze_warning:
logger.warning(f'please use `--freeze_parameters_ratio {self.freeze_parameters}`')
self.freeze_parameters_ratio = self.freeze_parameters
self.freeze_parameters = []
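# Note: the shim above is inferred to behave as follows (illustrative, not part of this diff):
#   --freeze_parameters 0.5    ->  freeze_parameters_ratio=0.5, freeze_parameters=[], warning logged
#   --freeze_parameters visual ->  float('visual') raises, the except swallows it, ['visual'] is kept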

if isinstance(self.train_dataset_mix_ds, str):
self.train_dataset_mix_ds = [self.train_dataset_mix_ds]
if self.only_save_model is not None:
@@ -585,7 +599,9 @@ class SftArguments(ArgumentsBase):

sft_type: Literal['lora', 'full', 'longlora', 'adalora', 'ia3', 'llamapro', 'adapter', 'vera', 'boft', 'fourierft',
'reft'] = 'lora'
freeze_parameters: float = 0. # 0 ~ 1
freeze_parameters: List[str] = field(default_factory=list)
freeze_vit: bool = False
freeze_parameters_ratio: float = 0. # 0 ~ 1
additional_trainable_parameters: List[str] = field(default_factory=list)
tuner_backend: Literal['swift', 'peft', 'unsloth'] = 'peft'
template_type: str = field(
@@ -1001,9 +1017,10 @@ def __post_init__(self) -> None:
logger.warning('Currently, only full parameter is supported. Setting args.sft_type: "full"')
self.sft_type = 'full'

model_info = MODEL_MAPPING[self.model_type]
if is_adapter(self.sft_type):
assert self.freeze_parameters == 0., (
'lora does not support `freeze_parameters`, please set `--sft_type full`')
assert self.freeze_parameters_ratio == 0., (
'lora does not support `freeze_parameters_ratio`, please set `--sft_type full`')
assert len(self.additional_trainable_parameters) == 0, (
'lora does not support `additional_trainable_parameters`, please set `--sft_type full`')
if is_quant_model(self.model_type):
@@ -1014,7 +1031,15 @@
if self.eval_steps is None:
self.eval_steps = 50
elif self.sft_type == 'full':
assert 0 <= self.freeze_parameters <= 1
if self.freeze_vit:
from swift.utils.module_mapping import MODEL_KEYS_MAPPING
lora_target_modules = model_info.get('lora_target_modules')
vision_tower = None
if isinstance(lora_target_modules, str):
vision_tower = MODEL_KEYS_MAPPING[lora_target_modules].vision_tower
if vision_tower is not None:
self.freeze_parameters.append(vision_tower)
assert 0 <= self.freeze_parameters_ratio <= 1
assert self.quantization_bit == 0, 'Full parameter fine-tuning does not support quantization.'
assert self.dtype != 'fp16', ("Fine-tuning with dtype=='fp16' can lead to NaN issues. "
'Please use fp32+AMP or bf16 to perform full parameter fine-tuning.')
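For a concrete example of the `freeze_vit` lookup above (a sketch; the key and prefix come from the LoRATM and module_mapping diffs below):

```python
from swift.utils.module_mapping import MODEL_KEYS_MAPPING

# For Qwen2-VL, MODEL_MAPPING stores lora_target_modules as the plain key
# 'qwen2_vl'; that key indexes MODEL_KEYS_MAPPING, and its vision_tower
# field is what gets appended to freeze_parameters.
keys = MODEL_KEYS_MAPPING['qwen2_vl']
print(keys.vision_tower)  # 'visual'
```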
@@ -1098,7 +1123,6 @@
logger.info(f'Setting args.dataloader_pin_memory: {self.dataloader_pin_memory}')
if 'qwen-audio' in self.model_type:
assert self.preprocess_num_proc == 1 or self.lazy_tokenize, 'not support'
model_info = MODEL_MAPPING[self.model_type]
support_gradient_checkpointing = model_info.get('support_gradient_checkpointing', True)
if self.gradient_checkpointing is None:
self.gradient_checkpointing = support_gradient_checkpointing
35 changes: 19 additions & 16 deletions swift/llm/utils/model.py
@@ -543,22 +543,22 @@ def get_model_name_list(cls) -> List[str]:

class LoRATM(NamedTuple):
# default lora target modules for multi-modals
qwen_audio = f'{get_regex_for_mm_default_lora("qwen_audio")}'
qwen_vl = f'{get_regex_for_mm_default_lora("qwen_vl")}'
qwen2_audio = f'{get_regex_for_mm_default_lora("qwen2_audio")}'
qwen2_vl = f'{get_regex_for_mm_default_lora("qwen2_vl")}'
glm4v = f'{get_regex_for_mm_default_lora("glm4v")}'
llava_next_video = f'{get_regex_for_mm_default_lora("llava_next_video")}'
llava_llama = f'{get_regex_for_mm_default_lora("llava_llama")}'
llava = f'{get_regex_for_mm_default_lora("llava")}'
qwen_audio = 'qwen_audio'
qwen_vl = 'qwen_vl'
qwen2_audio = 'qwen2_audio'
qwen2_vl = 'qwen2_vl'
glm4v = 'glm4v'
llava_next_video = 'llava_next_video'
llava_llama = 'llava_llama'
llava = 'llava'
internlm_xcomposer = ['attention.wqkv', 'attention.wo', 'feed_forward.w1', 'feed_forward.w2', 'feed_forward.w3']
internvl = f'{get_regex_for_mm_default_lora("internvl")}'
deepseek_vl = f'{get_regex_for_mm_default_lora("deepseek_vl")}'
minicpm_v = f'{get_regex_for_mm_default_lora("minicpm_v")}'
phi3v = f'{get_regex_for_mm_default_lora("phi3v")}'
cogvlm = f'{get_regex_for_mm_default_lora("cogvlm")}'
florence = f'{get_regex_for_mm_default_lora("florence")}'
idefics3 = f'{get_regex_for_mm_default_lora("idefics3")}'
internvl = 'internvl'
deepseek_vl = 'deepseek_vl'
minicpm_v = 'minicpm_v'
phi3v = 'phi3v'
cogvlm = 'cogvlm'
florence = 'florence'
idefics3 = 'idefics3'
# default lora target modules for nlp llms.
baichuan = ['W_pack']
chatglm = ['query_key_value']
@@ -6530,4 +6530,7 @@ def get_default_template_type(model_type: str) -> Optional[str]:


def get_default_lora_target_modules(model_type: str) -> Optional[List[str]]:
return MODEL_MAPPING[model_type].get('lora_target_modules')
res = MODEL_MAPPING[model_type].get('lora_target_modules')
if isinstance(res, str):
res = get_regex_for_mm_default_lora(res)
return res
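This is the other half of the LoRATM change above: multimodal entries now store a plain key instead of a prebuilt regex, and the regex is constructed lazily here. Presumably this lets the same key do double duty, since the `freeze_vit` handling in argument.py indexes `MODEL_KEYS_MAPPING` with it. A usage sketch:

```python
# Hypothetical usage: for a multimodal model the stored value is a plain key
# such as 'qwen2_vl', so the regex is only built when it is requested.
modules = get_default_lora_target_modules('qwen2-vl-7b-instruct')
# -> the regex produced by get_regex_for_mm_default_lora('qwen2_vl')
```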
2 changes: 1 addition & 1 deletion swift/utils/module_mapping.py
@@ -263,7 +263,7 @@ class MultiModelKeys(ModelKeys):
QWEN2_VL_KEYS = MultiModelKeys(
language_model='model',
projector=None,
vision_tower='vision',
vision_tower='visual',
)

GLM4V_KEYS = MultiModelKeys(
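This one-character fix matters because `freeze_vit` matches parameters by name prefix: the Qwen2-VL implementation in transformers names its vision tower `visual`, so parameters appear as `visual.*` and the old value `vision` would never have matched. A quick check (a sketch; assumes a transformers version with Qwen2-VL support, >= 4.45):

```python
from transformers import Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained('Qwen/Qwen2-VL-7B-Instruct')
# Every vision-tower parameter should share the 'visual' prefix.
print(any(name.startswith('visual') for name, _ in model.named_parameters()))  # True
```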