Peft 0.12 (modelscope#1586)

eric-doug · Aug 2, 2024 · 8d9c062 · 8d9c062
1 parent fe7b847
commit 8d9c062
Show file tree

Hide file tree

Showing 10 changed files with 161 additions and 196 deletions.
diff --git a/README.md b/README.md
@@ -55,6 +55,7 @@ You can contact us and communicate with us by adding our group:
 <img src="asset/discord_qr.jpg" width="200" height="200">  |  <img src="asset/wechat.png" width="200" height="200">
 
 ## 🎉 News
+- 🔥2024.08.02: Support Fourier Ft. Use `--sft_type fourierft` to begin, Check parameter documentation [here](https://swift.readthedocs.io/en/latest/LLM/Command-line-parameters.html).
 - 🔥2024.07.29: Support the use of lmdeploy for inference acceleration of LLM and VLM models. Documentation can be found [here](docs/source_en/Multi-Modal/LmDeploy-inference-acceleration.md).
 - 🔥2024.07.24: Support DPO/ORPO/SimPO/CPO alignment algorithm for vision MLLM, training scripts can be find in [Document](docs/source_en/Multi-Modal/human-preference-alignment-training-documentation.md). support RLAIF-V dataset.
 - 🔥2024.07.24: Support using Megatron for CPT and SFT on the Qwen2 series. You can refer to the [Megatron training documentation](docs/source_en/LLM/Megatron-training.md).

diff --git a/README_CN.md b/README_CN.md
@@ -56,6 +56,7 @@ SWIFT具有丰富全面的文档，请查看我们的文档网站:
 
 
 ## 🎉 新闻
+- 🔥2024.08.02: 支持Fourier Ft训练. 使用方式为`--sft_type fourierft`, 参数可以参考[这里](https://swift.readthedocs.io/zh-cn/latest/LLM/%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0.html).
 - 🔥2024.07.29: 支持使用lmdeploy对LLM和VLM模型进行推理加速. 文档可以查看[这里](docs/source/Multi-Modal/LmDeploy推理加速文档.md).
 - 🔥2024.07.24: 人类偏好对齐算法支持视觉多模态大模型, 包括DPO/ORPO/SimPO/CPO, 训练参考[文档](docs/source/Multi-Modal/人类偏好对齐训练文档.md). 支持数据集RLAIF-V.
 - 🔥2024.07.24: 支持使用megatron对qwen2系列进行CPT和SFT. 可以查看[megatron训练文档](docs/source/LLM/Megatron训练文档.md).

diff --git a/docs/source/LLM/命令行参数.md b/docs/source/LLM/命令行参数.md
@@ -13,15 +13,15 @@
 
 ## sft 参数
 
-- `--model_type`: 表示你选择的模型类型, 默认是`None`. `model_type`指定了对应模型默认的`lora_target_modules`, `template_type`等信息. 你可以通过只指定`model_type`进行微调. 对应的`model_id_or_path`会使用默认的设置, 从ModelScope进行下载, 并使用默认的缓存路径. model_type和model_id_or_path必须指定其中的一个. 可以选择的`model_type`可以查看[支持的模型](支持的模型和数据集.md#模型). 你可以设置`USE_HF`环境变量来控制从HF Hub下载模型和数据集, 参考[HuggingFace生态兼容文档](HuggingFace生态兼容.md).
+- `--model_type`: 表示你选择的模型类型, 默认是`None`. `model_type`指定了对应模型默认的`target_modules`, `template_type`等信息. 你可以通过只指定`model_type`进行微调. 对应的`model_id_or_path`会使用默认的设置, 从ModelScope进行下载, 并使用默认的缓存路径. model_type和model_id_or_path必须指定其中的一个. 可以选择的`model_type`可以查看[支持的模型](支持的模型和数据集.md#模型). 你可以设置`USE_HF`环境变量来控制从HF Hub下载模型和数据集, 参考[HuggingFace生态兼容文档](HuggingFace生态兼容.md).
 - `--model_id_or_path`: 表示模型在ModelScope/HuggingFace Hub中的`model_id`或者本地路径, 默认为`None`. 如果传入的`model_id_or_path`已经被注册, 则会根据`model_id_or_path`推断出`model_type`. 如果未被注册, 则需要同时指定`model_type`, e.g. `--model_type <model_type> --model_id_or_path <model_id_or_path>`.
 - `--model_revision`: 表示模型在ModelScope Hub中对应`model_id`的版本号, 默认为`None`. `model_revision`指定为`None`, 则使用注册在`MODEL_MAPPING`中的revision. 否则强制使用命令行传入的`model_revision`.
 - `--local_repo_path`: 部分模型在加载时依赖于github repo. 为了避免`git clone`时遇到网络问题, 可以直接使用本地repo. 该参数需要传入本地repo的路径, 默认为`None`. 这部分模型包括:
   - mPLUG-Owl模型: `https://github.com/X-PLUG/mPLUG-Owl`
   - DeepSeek-VL模型: `https://github.com/deepseek-ai/DeepSeek-VL`
   - YI-VL模型: `https://github.com/01-ai/Yi`
   - LLAVA模型: `https://github.com/haotian-liu/LLaVA.git`
-- `--sft_type`: 表示微调的方式, 默认是`'lora'`. 你可以选择的值包括: 'lora', 'full', 'longlora', 'adalora', 'ia3', 'llamapro', 'adapter', 'vera', 'boft'. 如果你要使用qlora, 你需设置`--sft_type lora --quantization_bit 4`.
+- `--sft_type`: 表示微调的方式, 默认是`'lora'`. 你可以选择的值包括: 'lora', 'full', 'longlora', 'adalora', 'ia3', 'llamapro', 'adapter', 'vera', 'boft', 'fourierft'. 如果你要使用qlora, 你需设置`--sft_type lora --quantization_bit 4`.
 - `--packing`: pack数据集到`max-length`, 默认值`False`.
 - `--freeze_parameters`: 当sft_type指定为'full'时, 将模型最底部的参数进行freeze. 指定范围为0. ~ 1., 默认为`0.`. 该参数提供了lora与全参数微调的折中方案.
 - `--additional_trainable_parameters`: 作为freeze_parameters的补充, 只有在sft_type指定为'full'才允许被使用, 默认为`[]`. 例如你如果想训练50%的参数的情况下想额外训练embedding层, 你可以设置`--freeze_parameters 0.5 --additional_trainable_parameters transformer.wte`, 所有以`transformer.wte`开头的parameters都会被激活. 你也可以设置`--freeze_parameters 1 --additional_trainable_parameters xxx`来自定义可以训练的层.
@@ -62,14 +62,14 @@
 - `--bnb_4bit_quant_type`: 4bit量化时的量化方式, 默认是`'nf4'`. 可选择的值包括: 'nf4', 'fp4'. 当quantization_bit为0时, 该参数无效.
 - `--bnb_4bit_use_double_quant`: 是否在4bit量化时开启double量化, 默认为`True`. 当quantization_bit为0时, 该参数无效.
 - `--bnb_4bit_quant_storage`: 默认值为`None`. 量化参数的存储类型. 若`quantization_bit`设置为0, 则该参数失效.
-- `--lora_target_modules`: 指定lora模块, 默认为`['DEFAULT']`. 如果lora_target_modules传入`'DEFAULT'` or `'AUTO'`, 则根据`model_type`查找`MODEL_MAPPING`中的`lora_target_modules`(默认指定为qkv). 如果传入`'ALL'`, 则将所有的Linear层(不含head)指定为lora模块. 如果传入`'EMBEDDING'`, 则Embedding层指定为lora模块. 如果内存允许, 建议设置成'ALL'. 当然, 你也可以设置`['ALL', 'EMBEDDING']`, 将所有的Linear和embedding层指定为lora模块. 该参数只有当`sft_type`指定为'lora'时才生效.
-- `--lora_target_regex`: 指定lora模块的regex表达式, `Optional[str]`类型. 默认为`None`, 如果该值传入, 则lora_target_modules不生效.
+- `--target_modules`: 指定lora模块, 默认为`['DEFAULT']`. 如果target_modules传入`'DEFAULT'` or `'AUTO'`, 则根据`model_type`查找`MODEL_MAPPING`中的`target_modules`(默认指定为qkv). 如果传入`'ALL'`, 则将所有的Linear层(不含head)指定为lora模块. 如果传入`'EMBEDDING'`, 则Embedding层指定为lora模块. 如果内存允许, 建议设置成'ALL'. 当然, 你也可以设置`['ALL', 'EMBEDDING']`, 将所有的Linear和embedding层指定为lora模块. 该参数在使用lora/vera/boft/ia3/adalora/fourierft时生效.
+- `--target_regex`: 指定lora模块的regex表达式, `Optional[str]`类型. 默认为`None`, 如果该值传入, 则target_modules不生效.该参数在使用lora/vera/boft/ia3/adalora/fourierft时生效.
 - `--lora_rank`: 默认为`8`. 只有当`sft_type`指定为'lora'时才生效.
 - `--lora_alpha`: 默认为`32`. 只有当`sft_type`指定为'lora'时才生效.
 - `--lora_dropout`: 默认为`0.05`, 只有当`sft_type`指定为'lora'时才生效.
 - `--init_lora_weights`: 初始化LoRA weights的方法, 可以指定为`true`, `false`, `guassian`, `pissa`, `pissa_niter_[number of iters]`, 默认值`true`.
 - `--lora_bias_trainable`: 默认为`'none'`, 可以选择的值: 'none', 'all'. 如果你要将bias全都设置为可训练, 你可以设置为`'all'`.
-- `--lora_modules_to_save`: 默认为`[]`. 如果你想要训练embedding, lm_head, 或者layer_norm, 你可以设置此参数, 例如: `--lora_modules_to_save EMBEDDING LN lm_head`. 如果传入`'EMBEDDING'`, 则将Embedding层添加到`lora_modules_to_save`. 如果传入`'LN'`, 则将`RMSNorm`和`LayerNorm`添加到`lora_modules_to_save`.
+- `--modules_to_save`: 默认为`[]`. 如果你想要训练embedding, lm_head, 或者layer_norm, 你可以设置此参数, 例如: `--modules_to_save EMBEDDING LN lm_head`. 如果传入`'EMBEDDING'`, 则将Embedding层添加到`modules_to_save`. 如果传入`'LN'`, 则将`RMSNorm`和`LayerNorm`添加到`modules_to_save`.该参数在使用lora/vera/boft/ia3/adalora/fourierft时生效.
 - `--lora_dtype`: 默认为`'AUTO'`, 指定lora模块的dtype类型. 如果是`AUTO`则跟随原始模块的dtype类型. 你可以选择的值: 'fp16', 'bf16', 'fp32', 'AUTO'.
 - `--use_dora`: 默认为`False`, 是否使用`DoRA`.
 - `--use_rslora`: 默认为`False`, 是否使用`RS-LoRA`.
@@ -154,22 +154,29 @@
 
 - `--sequence_parallel_size`: 默认值`1`, 大于1时可以拆分一个sequence到多张显卡上以节省显存, 值需要设置为能被DDP数量整除
 
-### BOFT 参数
+### FourierFt 参数
+
+FourierFt使用`target_modules`, `target_regex`, `modules_to_save`三个参数.
+
+- `--fourier_n_frequency`: 傅里叶变换的频率数量, `int`类型, 类似于LoRA中的`r`. 默认值`2000`.
+- `--fourier_scaling`: W矩阵的缩放值, `float`类型, 类似LoRA中的`lora_alpha`. 默认值`300.0`.
+
+### BOFT参数
+
+BOFT使用`target_modules`, `target_regex`, `modules_to_save`三个参数.
 
 - `--boft_block_size`: BOFT块尺寸, 默认值4.
 - `--boft_block_num`: BOFT块数量, 不能和`boft_block_size`同时使用.
-- `--boft_target_modules`: BOFT目标模块. 默认为`['DEFAULT']`. 如果boft_target_modules传入`'DEFAULT'` or `'AUTO'`, 则根据`model_type`查找`MODEL_MAPPING`中的boft_target_modules`(默认指定为qkv). 如果传入`'ALL'`, 则将所有的Linear层(不含head)指定为boft模块.
 - `--boft_dropout`: boft的dropout值, 默认0.0.
-- `--boft_modules_to_save`: 需要额外训练和存储的模块, 默认为`None`.
 
 ### Vera参数
 
+Vera使用`target_modules`, `target_regex`, `modules_to_save`三个参数.
+
 - `--vera_rank`: Vera Attention的尺寸, 默认值256.
 - `--vera_projection_prng_key`: 是否存储Vera映射矩阵, 默认为True.
-- `--vera_target_modules`: Vera目标模块. 默认为`['DEFAULT']`. 如果vera_target_modules传入`'DEFAULT'` or `'AUTO'`, 则根据`model_type`查找`MODEL_MAPPING`中的vera_target_modules`(默认指定为qkv). 如果传入`'ALL'`, 则将所有的Linear层(不含head)指定为vera模块.
 - `--vera_dropout`: Vera的dropout值, 默认`0.0`.
 - `--vera_d_initial`: Vera的d矩阵的初始值, 默认`0.1`.
-- `--vera_modules_to_save`: 需要额外训练和存储的模块, 默认为`None`.
 
 ### LoRA+微调参数
 
@@ -230,17 +237,17 @@ unsloth无新增参数，对已有参数进行调节即可支持：
 
 ### IA3微调参数
 
+IA3使用`target_modules`, `target_regex`, `modules_to_save`三个参数.
+
 以下参数`sft_type`设置为`ia3`时生效.
 
-- `--ia3_target_modules`: 指定IA3目标模块, 默认为`['DEFAULT']`. 具体含义可以参考`lora_target_modules`.
 - `--ia3_feedforward_modules`: 指定IA3的MLP的Linear名称, 该名称必须在`ia3_target_modules`中.
-- `--ia3_modules_to_save`: IA3参与训练的额外模块. 具体含义可以参考`lora_modules_to_save`的含义.
 
 ## PT 参数
 
 PT参数继承了sft参数，并修改了部分默认值.
 - `--sft_type`: 默认值为`'full'`.
-- `--lora_target_modules`: 默认值为`'ALL'`.
+- `--target_modules`: 默认值为`'ALL'`.
 - `--lazy_tokenize`: 默认值为`True`.
 - `--eval_steps`: 默认值为`500`.