support ORPO algorithm (modelscope#854)
hjh0119 committed May 7, 2024
1 parent e5bb385 commit eb940b1
Showing 32 changed files with 732 additions and 25 deletions.
8 changes: 5 additions & 3 deletions README.md
@@ -39,6 +39,7 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

## 🎉 News
- 2024.05.07: Supports **ORPO** training! See the [documentation](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/ORPO.md) to start training!
- 2024.04.29: Supports inference and fine-tuning of InternVL-Chat-V1.5 model. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source_en/Multi-Modal/internvl-best-practice.md).
- 🔥2024.04.26: Support **LISA** and **unsloth** training! Specify `--lisa_activated_layers=2` to use LISA (reducing the memory cost to 30 percent!), or specify `--tuner_backend unsloth` to use unsloth to train a huge model (full or LoRA) with less memory (30 percent or less) and faster speed (5x)!
- 🔥2024.04.26: Support the fine-tuning and inference of Qwen1.5-110B and Qwen1.5-110B-Chat model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_110b_chat/lora_ddp_ds/sft.sh) to start training!
@@ -103,7 +104,7 @@ Additionally, we are expanding capabilities for other modalities. Currently, we

- 2024.01.04: Update [Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) for convenient viewing of training speed and memory usage of different models.
- 🔥2023.12.29: Support web-ui for sft training and inference, use `swift web-ui` after installing ms-swift to start.
- 🔥2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and three datasets for this task: AI-ModelScope/stack-exchange-paired, AI-ModelScope/hh-rlhf and AI-ModelScope/hh_rlhf_cn. See [documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) to start training!
- 🔥2023.12.29: Support DPO RLHF (Reinforcement Learning from Human Feedback) and three datasets for this task: AI-ModelScope/stack-exchange-paired, AI-ModelScope/hh-rlhf and AI-ModelScope/hh_rlhf_cn. See [documentation](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/DPO.md) to start training!
- 🔥2023.12.28: Support SCEdit! This tuner can significantly reduce memory usage in U-Net and support low-memory controllable image generation (replacing ControlNet), read the section below to learn more.
- 2023.12.23: Support [codegeex2-6b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/codegeex2_6b).
- 2023.12.19: Support [phi2-3b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/phi2_3b).
@@ -209,7 +210,7 @@ You can refer to the following scripts to customize your own training script.
|------------------|-------------------------------------------------------------------------------|
| Pretraining | Text Generation |
| Fine-tuning | Single-turn/Multi-turn<br>Agent Training/Self-cognition<br>Multi-modal Vision/Multi-modal Speech|
| Human Alignment | DPO |
| Human Alignment | DPO<br>ORPO |
| Text-to-Image | DreamBooth, etc. |
| Text-to-Video | - |

@@ -570,7 +571,8 @@ make docs
| [LLM Evaluation](docs/source_en/LLM/LLM-eval.md) |
| [LLM Quantization](docs/source_en/LLM/LLM-quantization.md) |
| [LLM Deployment](docs/source_en/LLM/VLLM-inference-acceleration-and-deployment.md) |
| [DPO Human Alignment Training](docs/source_en/LLM/RLHF.md) |
| [DPO Human Alignment Training](docs/source_en/LLM/DPO.md) |
| [ORPO Human Alignment Training](docs/source_en/LLM/ORPO.md) |
| [AnimateDiff Training](docs/source_en/AIGC/AnimateDiff-train-infer.md) |

### Reference Documentation
7 changes: 4 additions & 3 deletions README_CN.md
@@ -40,6 +40,7 @@ SWIFT supports training, inference, and more for nearly **200 LLMs and MLLMs** (multimodal large models)
Additionally, we are expanding capabilities for other modalities; currently we support full-parameter training and LoRA training for AnimateDiff.

## 🎉 News
- 2024.05.07: Supports **ORPO** training! Use `swift orpo` to get started; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/ORPO算法最佳实践.md).
- 2024.04.29: Supports inference and fine-tuning of InternVL-Chat-V1.5; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/internvl最佳实践.md).
- 🔥2024.04.26: Supports **LISA** and **unsloth** training! Specify `--lisa_activated_layers=2` to enable LISA (reducing memory usage to 30% of full-parameter training), or specify `--tuner_backend unsloth` to use unsloth to train a huge model with less memory (30% or less) and faster speed (5x)!
- 🔥2024.04.26: Supports inference and fine-tuning of the Qwen1.5-110B and Qwen1.5-110B-Chat models; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/qwen1half_110b_chat/lora_ddp_ds/sft.sh) to start training!
@@ -104,7 +105,7 @@ SWIFT supports training, inference, and more for nearly **200 LLMs and MLLMs** (multimodal large models)
- 🔥2024.01.04: Supports **VLLM deployment**, compatible with the **OpenAI API** style; see [VLLM Inference Acceleration and Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM推理加速与部署.md#部署) for details.
- 2024.01.04: Updated the [Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) for convenient viewing of the training speed and memory usage of different models.
- 🔥 2023.12.29: Supports web-ui for SFT training and inference; after installing ms-swift, run `swift web-ui` to start.
- 🔥 2023.12.29: Supports DPO RLHF (Reinforcement Learning from Human Feedback) and three datasets for this task: AI-ModelScope/stack-exchange-paired, AI-ModelScope/hh-rlhf, and AI-ModelScope/hh_rlhf_cn. See the [documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) to start training!
- 🔥 2023.12.29: Supports DPO RLHF (Reinforcement Learning from Human Feedback) and three datasets for this task: AI-ModelScope/stack-exchange-paired, AI-ModelScope/hh-rlhf, and AI-ModelScope/hh_rlhf_cn. See the [documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/DPO%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) to start training!
- 🔥 2023.12.28: Supports SCEdit! This tuner can significantly reduce memory usage in U-Net and supports low-memory controllable image generation (replacing ControlNet); read the section below to learn more.
- 2023.12.23: Supports [codegeex2-6b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/codegeex2_6b).
- 2023.12.19: Supports [phi2-3b](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/phi2_3b).
@@ -210,7 +211,7 @@ swift web-ui
| -------- |------------------------------------|
| Pretraining | Text Generation |
| Fine-tuning | Single-turn/Multi-turn<br>Agent Training/Self-cognition<br>Multi-modal Vision/Multi-modal Speech |
| Human Alignment | DPO |
| Human Alignment | DPO<br>ORPO |
| Text-to-Image | DreamBooth, etc. |
| Text-to-Video | - |

@@ -570,7 +571,7 @@ make docs
| [LLM Evaluation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E8%AF%84%E6%B5%8B%E6%96%87%E6%A1%A3.md) |
| [LLM Quantization](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%96%87%E6%A1%A3.md) |
| [LLM Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.md) |
| [DPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) |
| [DPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/DPO%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) |
| [AnimateDiff Training](https://github.com/modelscope/swift/blob/main/docs/source/AIGC/AnimateDiff%E5%BE%AE%E8%B0%83%E6%8E%A8%E7%90%86%E6%96%87%E6%A1%A3.md) |


Binary file added docs/resources/orpo1.png
Binary file added docs/resources/orpo2.png
Binary file added docs/resources/orpo3.png
Binary file added docs/resources/orpo4.png
Binary file added docs/resources/orpo5.png
Binary file added docs/resources/orpo6.png
Binary file added docs/resources/orpo7.png
Binary file added docs/resources/orpo8.png
@@ -1,4 +1,4 @@
# LLM Human Alignment Training Documentation
# DPO Training Documentation
## Table of Contents
- [Environment Setup](#环境准备)
- [Human Alignment Training](#人类对齐训练)
@@ -76,6 +76,7 @@ cd examples/pytorch/llm

**Tips**:

- If you train a base model with data that contains history, you need to specify a template that supports multi-turn dialogue (base models often do not). For this case we set the `chatml` template by default; you can also use `--model_type` to select the template of the model being trained.
- We set `--gradient_checkpointing true` by default during training to **save GPU memory**; this slightly slows down training.
- If you are using an older GPU such as the **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, since these GPUs do not support bf16.
- If your machine has high-performance GPUs such as the A100 and you are using a Qwen-series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces memory usage (training with flash-attn is not supported on GPUs such as the A10, 3090, and V100). The models that support flash-attn are listed in [Supported Models](支持的模型和数据集.md#模型).
6 changes: 5 additions & 1 deletion docs/source/LLM/LLM微调文档.md
@@ -3,6 +3,7 @@
- [Environment Setup](#环境准备)
- [Fine-tuning](#微调)
- [DPO](#dpo)
- [ORPO](#orpo)
- [Merge LoRA](#merge-lora)
- [Quantization](#量化)
- [Inference](#推理)
@@ -167,7 +168,10 @@ cd examples/pytorch/llm


## DPO
If you want to use DPO for human alignment, see the [Human Alignment Fine-tuning Documentation](LLM人类对齐训练文档.md).
If you want to use DPO for human alignment, see the [DPO Training Documentation](DPO训练文档.md).

## ORPO
If you want to use ORPO for human alignment, see the [ORPO Best Practice](ORPO算法最佳实践.md).

## Merge LoRA
Tip: merging LoRA into bnb- and auto_gptq-quantized models is **temporarily** unsupported, as it would cause a significant loss of precision.
129 changes: 129 additions & 0 deletions docs/source/LLM/ORPO算法最佳实践.md
@@ -0,0 +1,129 @@
# ORPO Algorithm Best Practice
[ORPO](https://arxiv.org/abs/2403.07691) training uses the same data format as DPO: on top of the SFT data [query, response], it additionally requires a `rejected_response` field containing an answer the model should not generate.

The ORPO algorithm adds an odds-ratio (OR) negative log-likelihood loss term to the SFT training loss to reduce the probability of generating the rejected response.
The hyperparameter `beta` is the coefficient of the OR loss term: the larger `beta`, the heavier the penalty on the `rejected_response`. The default is 0.1.
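
As a reference, the objective from the [ORPO paper](https://arxiv.org/abs/2403.07691) can be written as follows (a reconstruction for orientation, with swift's `beta` playing the role of the paper's λ, where $y_w$ is the `response` and $y_l$ the `rejected_response`):

```math
\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}} + \beta \cdot \mathcal{L}_{\mathrm{OR}}, \qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left( \log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)} \right), \qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```

Here σ is the sigmoid function. Since no reference model appears in the loss, ORPO (unlike DPO) does not need to keep a frozen reference copy of the model in memory during training.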

In this best practice we use the ORPO algorithm to train the [llama3-8b-instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary) model so that it answers in Chinese.

## Table of Contents
- [Environment Setup](#environment-setup)
- [Dataset](#dataset)
- [Training](#training)
- [Inference](#inference)

## Environment Setup
```bash
# Set the global pip mirror (to speed up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

# Align the environment (usually not needed; run the following if you hit errors, as the repo is tested with the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```


## Dataset
We use the [Llama3-DPO-zh-en-emoji](https://modelscope.cn/datasets/shareAI/shareAI-Llama3-DPO-zh-en-emoji/summary) dataset, which contains Chinese queries together with Chinese and English answers.

Data sample:
```json
{"question": "一个完整的工程项目管理流程包括什么?",
"answer_zh": "🤔 Ahahah, 项目管理流程, 这个问题太常见了! 😂\n\n一个完整的工程项目管理流程包括以下几个阶段:\n\n1️⃣ **项目启动** 🚀:项目的开始,定义项目的目标、范围、时间表和预算。\n\n2️⃣ **项目计划** 📅:制定项目计划,包括任务分解、资源分配、进度安排和风险管理。\n\n3️⃣ **项目执行** 💪:按照计划执行项目,包括任务执行、进度控制和质量管理。\n\n4️⃣ **项目监控** 👀:对项目的进度、成本和质量进行监控和控制,确保项目不偏离轨道。\n\n5️⃣ **项目收尾** 🎉:项目完成后,进行项目总结、经验分享和文档编制。\n\n6️⃣ **项目评价** 🤔:对项目的成果和过程进行评价,总结经验和教训。\n\n这些阶段是一个完整的工程项目管理流程,但是,具体的项目可能会根据需要添加或删除某些阶段。 🤝\n\n还有一点,项目管理流程中需要注意的几个关键点是:\n\n* **沟通** 💬:项目团队之间、项目团队和客户之间的沟通是非常重要的。\n* **风险管理** ⚠️:项目中总会出现一些风险和问题,需要提前预测和准备。\n* **变更管理** 🔄:项目中可能会出现一些变更,需要及时处理和更新。\n\n总之,项目管理流程是一个复杂的过程,需要项目经理和团队成员的共同努力和合作。 💪",
"answer_en": "My fellow wizarding friend! 🧙‍♂️\n\nA complete project management process, you ask? Well, I shall conjure up the essential steps for thee! 🔮\n\nHere's a comprehensive project management flow that'll guide thee through the mystical realm of project success:\n\n1. **Initiation**: Define project scope, goals, and deliverables. Identify stakeholders, and determine the project's feasibility. ⚡️\n2. **Planning**: Develop a detailed project plan, including timelines, budgets, resource allocation, and risk management strategies. 📅\n3. **Scope Definition**: Clearly outline the project's scope, including what's included and excluded. This will help prevent scope creep, my friend! 🔒\n4. **Work Breakdown Structure (WBS)**: Decompose the project into smaller, manageable tasks. This will aid in task assignment, scheduling, and tracking progress. 🗂️\n5. **Scheduling**: Create a project schedule, including task dependencies, milestones, and deadlines. This will help thee stay on track and avoid procrastination! ⏰\n6. **Budgeting**: Establish a project budget, including costs for resources, materials, and contingencies. This will ensure thou dost not overspend, my wise friend! 💸\n7. **Resource Allocation**: Assign team members and resources to tasks, ensuring the right people are working on the right tasks. This will optimize productivity and efficiency! 👥\n8. **Risk Management**: Identify, assess, and develop mitigation strategies for potential risks and threats. This will safeguard thy project from unforeseen dangers! ⚠️\n9. **Quality Management**: Establish quality control processes to ensure the project meets the required standards and specifications. This will guarantee a top-notch outcome, my friend! ✨\n10. **Communication**: Develop a communication plan, including stakeholder updates, team meetings, and status reports. This will keep everyone informed and aligned! 📢\n11. **Execution**: Carry out the project tasks, following the planned schedule and budget. This is where the magic happens, my friend! ✨\n12. **Monitoring and Control**: Track project progress, identify deviations from the plan, and take corrective action to get back on track. This will ensure thy project stays on course! 📊\n13. **Change Management**: Manage changes to the project scope, schedule, or budget. This will minimize disruptions and ensure the project remains viable! 🔀\n14. **Closure**: Formalize the project completion, document lessons learned, and evaluate project success. This will bring a sense of accomplishment and closure, my friend! 🎉\n\nThere thou hast it, my fellow wizard! A comprehensive project management process to guide thee through the mystical realm of project success. May thy projects be prosperous and thy stakeholders be delighted! 😊"}

```

swift ships a built-in preprocessor for this dataset that uses `answer_zh` as the `response` and `answer_en` as the `rejected_response`; simply pass `--dataset shareai-llama3-dpo-zh-en-emoji` as a training argument.
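
If you bring your own data instead, a minimal sample in the [query, response, rejected_response] format described at the top of this document might look like the following (the field values are purely illustrative):

```json
{"query": "What does a complete project management process include?", "response": "The preferred answer, which the model should learn to generate.", "rejected_response": "The dispreferred answer, whose generation probability ORPO pushes down."}
```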

## Training
```shell
# Experimental environment: A100
# DDP + MP
# Memory usage: 4*24G
# nproc_per_node is referenced below to keep the global batch size at 16
nproc_per_node=2

CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=$nproc_per_node \
swift orpo \
--model_type llama3-8b-instruct \
--beta 0.5 \
--sft_type lora \
--dataset shareai-llama3-dpo-zh-en-emoji \
--num_train_epochs 2 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--learning_rate 5e-5 \
--gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
--warmup_ratio 0.03 \
--save_total_limit 2
# MP (device map)
# Memory usage: 2*24G
CUDA_VISIBLE_DEVICES=0,1 \
swift orpo \
--model_type llama3-8b-instruct \
--beta 0.5 \
--sft_type lora \
--dataset shareai-llama3-dpo-zh-en-emoji \
--num_train_epochs 2 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--learning_rate 5e-5 \
--gradient_accumulation_steps 16 \
--warmup_ratio 0.03 \
--save_total_limit 2

# Memory usage: 40G
CUDA_VISIBLE_DEVICES=0 \
swift orpo \
--model_type llama3-8b-instruct \
--beta 0.5 \
--sft_type lora \
--dataset shareai-llama3-dpo-zh-en-emoji \
--num_train_epochs 2 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--learning_rate 5e-5 \
--gradient_accumulation_steps 16 \
--warmup_ratio 0.03 \
--save_total_limit 2
```
**Tips**:

- If you train a base model with data that contains history, you need to specify a template that supports multi-turn dialogue (base models often do not). For this case we set the `chatml` template by default; you can also use `--model_type` to select the template of the model being trained.
- We set `--gradient_checkpointing true` by default during training to **save GPU memory**; this slightly slows down training.
- If you are using an older GPU such as the **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, since these GPUs do not support bf16.
- If your machine has high-performance GPUs such as the A100 and you are using a Qwen-series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces memory usage (training with flash-attn is not supported on GPUs such as the A10, 3090, and V100). The models that support flash-attn are listed in [Supported Models](支持的模型和数据集.md#模型).
- If you need to train without network access, use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`, as sketched after this list. See [Command-line Arguments](命令行参数.md) for the exact meaning of these parameters.
- If you want to push your weights to the ModelScope Hub during training, set `--push_to_hub true`.
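
Combining the offline flags from the tip above, a launch might look like this sketch (`/path/to/llama3-8b-instruct` is a placeholder for your local model directory; the remaining flags mirror the training commands above):

```bash
# Offline training: read the model from a local directory and skip the hub version check
CUDA_VISIBLE_DEVICES=0 \
swift orpo \
    --model_type llama3-8b-instruct \
    --model_id_or_path /path/to/llama3-8b-instruct \
    --check_model_is_latest false \
    --sft_type lora \
    --dataset shareai-llama3-dpo-zh-en-emoji
```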

## Inference
The inference examples below use the `swift web-ui` command.
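
If you prefer the command line, a trained LoRA checkpoint can also be loaded with `swift infer`; a sketch, where the checkpoint path is a placeholder to be replaced with the directory written by your training run:

```bash
# Hypothetical checkpoint path; use the output directory from your own run
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --ckpt_dir output/llama3-8b-instruct/vx-xxx/checkpoint-xxx
```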

### Inference before training
> 你是谁 (Who are you?)

![orpo1](../../resources/orpo1.png)

> 西湖醋鱼怎么做 (How do you cook West Lake vinegar fish?)

![orpo2](../../resources/orpo2.png)
![orpo3](../../resources/orpo3.png)
![orpo4](../../resources/orpo4.png)
![orpo5](../../resources/orpo5.png)


### Inference after training
> 你是谁 (Who are you?)

![orpo6](../../resources/orpo6.png)

> 西湖醋鱼怎么做 (How do you cook West Lake vinegar fish?)

![orpo7](../../resources/orpo7.png)
![orpo8](../../resources/orpo8.png)
4 changes: 2 additions & 2 deletions docs/source/LLM/index.md
@@ -17,13 +17,13 @@

1. [LLM Inference Documentation](LLM推理文档.md)
2. [LLM Fine-tuning Documentation](LLM微调文档.md)
3. [DPO Training Documentation](LLM人类对齐训练文档.md)
3. [DPO Training Documentation](DPO训练文档.md)
4. [Web-UI Training and Inference](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md)
5. [LLM Evaluation Documentation](LLM评测文档.md)
6. [LLM Quantization Documentation](LLM量化文档.md)
7. [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md)
8. [LLM Experiment Documentation](LLM实验文档.md)

9. [ORPO Best Practice](ORPO算法最佳实践.md)

### 🐔 Reference Documentation
1. [Custom Models and Datasets](自定义与拓展.md)
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -25,7 +25,8 @@ Swift DOCUMENTATION
LLM/Agent微调最佳实践.md
LLM/LLM推理文档.md
LLM/LLM微调文档.md
LLM/LLM人类对齐训练文档.md
LLM/DPO训练文档.md
LLM/ORPO算法最佳实践.md
LLM/VLLM推理加速与部署.md
LLM/支持的模型和数据集.md
LLM/自定义与拓展.md
File renamed without changes.
