support npu & deepspeed (modelscope#743)
Jintao-Huang committed Apr 19, 2024
1 parent 736571d commit 376fc90
Showing 12 changed files with 283 additions and 25 deletions.
20 changes: 12 additions & 8 deletions README.md
@@ -39,6 +39,7 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

## 🎉 News
- 2024.04.19: Support for single-card, DDP, ZeRO2, and ZeRO3 training and inference with NPU. Please refer to [NPU Inference and Fine-tuning Best Practices](docs/source_en/LLM/NPU-best-practice.md).
- 2024.04.19: Support for inference, fine-tuning, and deployment of **Llama3** series models. This includes: Llama-3-8B, Llama-3-8B-Instruct, Llama-3-70B, and Llama-3-70B-Instruct. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/llama3_8b_instruct/lora/sft.sh) to train.
- 2024.04.18: Supported models: wizardlm2-7b-awq, wizardlm2-8x22b, yi-6b-chat-awq, yi-6b-chat-int8, yi-34b-chat-awq, yi-34b-chat-int8. Supported `--deepspeed zero3-offload` and provided default zero3-offload configuration file for zero3+cpu offload usage.
- 2024.04.18: Supported compatibility with HuggingFace ecosystem using the environment variable `USE_HF`, switching to use models and datasets from HF. Please refer to the [HuggingFace ecosystem compatibility documentation](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Compat-HF.md).
@@ -60,6 +61,8 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
- 🔥2024.03.29: Support the fine-tuning and inference of **Grok-1** 300B MoE, please view details [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Grok-1-best-practice.md).
- 🔥2024.03.25: Support inference and fine-tuning of the TeleChat-7b and TeleChat-12b models. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
- 🔥2024.03.20: Supports inference and fine-tuning for the **llava** series. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source_en/Multi-Modal/llava-best-practice.md).
<details><summary>More</summary>

- 🔥2024.03.12: Support inference and fine-tuning for **deepseek-vl** series. Best practices can be found [here](docs/source_en/Multi-Modal/deepseek-vl-best-practice.md).
- 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507) for effectively reducing memory usage to 1/2 of the original in full-parameter training.
- 🔥2024.03.10: [End-to-end best practices](docs/source_en/LLM/Qwen1.5-best-practice.md) from fine-tuning to deployment for Qwen1.5-7B-Chat and Qwen1.5-72B-Chat.
@@ -69,8 +72,6 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
- 🔥2024.02.29: Support [LLaMA PRO](https://arxiv.org/pdf/2401.02415.pdf), simply use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/yi_6b_chat/llamapro/sft.sh) to start training.
- 🔥2024.02.29: Support [LoRA+](https://arxiv.org/pdf/2402.12354.pdf), simply use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/yi_6b_chat/lorap/sft.sh) to start training.
- 2024.02.25: Support `swift export` to quantize models using **AWQ/GPTQ** and push to ModelScope Hub. See documentation: [LLM Quantization](docs/source_en/LLM/LLM-quantization.md).
<details><summary>More</summary>

- 2024.02.22: Support gemma series: gemma-2b, [gemma-2b-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/gemma_2b_instruct), gemma-7b, gemma-7b-instruct.
- 2024.02.16: Support deepseek-math series: deepseek-math-7b, deepseek-math-7b-instruct, deepseek-math-7b-chat.
- 🔥2024.02.05: Support **Qwen1.5** series models, see [model list](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B) for all supported Qwen1.5 models. Provide fine-tuning scripts for [qwen1half-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat), [qwen1half-7b-chat-int8](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat_int8).
@@ -519,8 +520,9 @@ make docs
| ------------------------------------------------------------ |
| [Using Web-UI](docs/source_en/GetStarted/Web-ui.md) |
| [Using Tuners](docs/source_en/GetStarted/Tuners.md) |
| [LLM Fine-tuning](docs/source_en/LLM/LLM-fine-tuning.md) |
| [LLM Inference](docs/source_en/LLM/LLM-inference.md) |
| [LLM Fine-tuning](docs/source_en/LLM/LLM-fine-tuning.md) |
| [LLM Evaluation](docs/source_en/LLM/LLM-eval.md) |
| [LLM Quantization](docs/source_en/LLM/LLM-quantization.md) |
| [LLM Deployment](docs/source_en/LLM/VLLM-inference-acceleration-and-deployment.md) |
| [DPO Human Alignment Training](docs/source_en/LLM/RLHF.md) |
@@ -532,17 +534,19 @@ make docs
| [Command Line Arguments](docs/source_en/LLM/Command-line-parameters.md) |
| [Customizing New Models and Datasets](docs/source_en/LLM/Customization.md) |
| [Supported Models and Datasets List](docs/source_en/LLM/Supported-models-datasets.md) |
| [Runtime Speed and Memory Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) |
| [Runtime Speed and Memory Benchmark](docs/source_en/LLM/Benchmark.md) |


### Best Practices

| Best Practices Name |
| ------------------------------------------------------------ |
| [Agent Fine-Tuning Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Agent%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
| [Self-Cognition Fine-Tuning Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E6%88%91%E8%AE%A4%E7%9F%A5%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
| [Qwen1.5 Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
| [Multi-Modal Model Training Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/index.md) |
| [Agent Fine-Tuning Best Practice](docs/source_en/LLM/Agent-best-practice.md) |
| [Self-Cognition Fine-Tuning Best Practice](docs/source_en/LLM/Self-cognition-best-practice.md) |
| [Qwen1.5 Best Practice](docs/source_en/LLM/Qwen1.5-best-practice.md) |
| [Multi-Modal Model Training Best Practice](docs/source_en/Multi-Modal/index.md) |
| [NPU Best Practice](docs/source_en/LLM/NPU-best-practice.md) |


### Deep Learning Tutorials

11 changes: 8 additions & 3 deletions README_CN.md
@@ -40,6 +40,7 @@ SWIFT supports nearly **200 LLMs and MLLMs** (multi-modal large models) for training, inference,
Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

## 🎉 News
- 2024.04.19: Support for single-card, DDP, ZeRO2, and ZeRO3 training and inference with NPU. See [NPU Inference and Fine-tuning Best Practices](docs/source/LLM/NPU推理与微调最佳实践.md).
- 2024.04.19: Support for inference, fine-tuning, and deployment of **Llama3** series models. This includes: Llama-3-8B, Llama-3-8B-Instruct, Llama-3-70B, and Llama-3-70B-Instruct. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/llama3_8b_instruct/lora/sft.sh) to start training!
- 2024.04.18: Supported models: wizardlm2-7b-awq, wizardlm2-8x22b, yi-6b-chat-awq, yi-6b-chat-int8, yi-34b-chat-awq, yi-34b-chat-int8. Supported `--deepspeed zero3-offload` and provided a default zero3-offload configuration file for ZeRO3 with CPU offload.
- 2024.04.18: Supported compatibility with the HuggingFace ecosystem via the environment variable `USE_HF`, which switches to models and datasets from HF. See the [HuggingFace ecosystem compatibility documentation](https://github.com/modelscope/swift/tree/main/docs/source/LLM/HuggingFace生态兼容.md).
@@ -61,6 +62,8 @@ SWIFT supports nearly **200 LLMs and MLLMs** (multi-modal large models) for training, inference,
- 🔥2024.03.29: Support the fine-tuning and inference of **Grok-1** 300B MoE. For best practices, see [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/Grok训练和推理.md).
- 🔥2024.03.25: Support training and inference of the TeleChat-7b and TeleChat-12b models. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
- 🔥2024.03.20: Support inference and fine-tuning for the **llava** series. For best practices, see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
<details><summary>More</summary>

- 🔥2024.03.12: Support inference and fine-tuning for the **deepseek-vl** series. For best practices, see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/deepseek-vl最佳实践.md).
- 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507), which effectively reduces memory usage to 1/2 of the original in full-parameter training.
- 🔥2024.03.10: [End-to-end best practices](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) from fine-tuning to deployment for Qwen1.5-7B-Chat and Qwen1.5-72B-Chat.
@@ -70,8 +73,6 @@ SWIFT supports nearly **200 LLMs and MLLMs** (multi-modal large models) for training, inference,
- 🔥2024.02.29: Support [LLaMA PRO](https://arxiv.org/pdf/2401.02415.pdf). Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/yi_6b_chat/llamapro/sft.sh) to start training.
- 🔥2024.02.29: Support [LoRA+](https://arxiv.org/pdf/2402.12354.pdf). Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/yi_6b_chat/lorap/sft.sh) to start training.
- 2024.02.25: Support `swift export` to quantize models using **AWQ/GPTQ** and push them to ModelScope Hub. See the documentation: [LLM Quantization](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%96%87%E6%A1%A3.md).
<details><summary>More</summary>

- 2024.02.22: Support the gemma series: gemma-2b, [gemma-2b-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/gemma_2b_instruct), gemma-7b, gemma-7b-instruct.
- 2024.02.16: Support the deepseek-math series: deepseek-math-7b, deepseek-math-7b-instruct, deepseek-math-7b-chat.
- 🔥2024.02.05: Support **Qwen1.5** series models. See the [model list](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B) for all supported Qwen1.5 models. Fine-tuning scripts are provided for [qwen1half-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat) and [qwen1half-7b-chat-int8](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat_int8).
@@ -518,8 +519,9 @@ make docs
| ------------------------------------------------------------ |
| [Using Web-UI](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md) |
| [Using Tuners](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E4%BD%BF%E7%94%A8tuners.md) |
| [LLM Fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.md) |
| [LLM Inference](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E6%8E%A8%E7%90%86%E6%96%87%E6%A1%A3.md) |
| [LLM Fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.md) |
| [LLM Evaluation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E8%AF%84%E6%B5%8B%E6%96%87%E6%A1%A3.md) |
| [LLM Quantization](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%96%87%E6%A1%A3.md) |
| [LLM Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.md) |
| [DPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) |
@@ -533,6 +535,7 @@ make docs
| [Customizing New Models and Datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md) |
| [Supported Models and Datasets List](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md) |
| [Runtime Speed and Memory Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) |
| [HuggingFace Ecosystem Compatibility](https://github.com/modelscope/swift/blob/main/docs/source/LLM/HuggingFace%E7%94%9F%E6%80%81%E5%85%BC%E5%AE%B9.md) |


### Best Practices
@@ -542,6 +545,8 @@ make docs
| [Self-Cognition Fine-Tuning Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E6%88%91%E8%AE%A4%E7%9F%A5%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
| [Qwen1.5 Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
| [Multi-Modal Model Training Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/index.md) |
| [NPU Inference and Fine-tuning Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/NPU%E6%8E%A8%E7%90%86%E4%B8%8E%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |


### Deep Learning Tutorials

111 changes: 111 additions & 0 deletions docs/source/LLM/NPU推理与微调最佳实践.md
@@ -0,0 +1,111 @@
# NPU Training Best Practice

## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Fine-tuning](#fine-tuning)
- [Inference](#inference)


## Environment Preparation

Experimental environment: 8 * Ascend 910B3

```shell
pip install ms-swift -U
pip install torch-npu
```

Verify that the environment is installed correctly:
```python
from transformers.utils import is_torch_npu_available
import torch

print(is_torch_npu_available()) # True
print(torch.npu.device_count()) # 8
```

## Fine-tuning
The following examples use LoRA fine-tuning. For full-parameter fine-tuning, set `--sft_type full`.


### Single-Card Training

Start single-card fine-tuning with the following command:

```shell
# Experimental environment: Ascend 910B3
# NPU memory requirement: 25GB
# Runtime: 8 hours
ASCEND_RT_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output
```


### Data-Parallel Training

```shell
# Experimental environment: 4 * Ascend 910B3
# NPU memory requirement: 4 * 30GB
# Runtime: 2 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output
```


### DeepSpeed Training

ZeRO2:
```shell
# Experimental environment: 4 * Ascend 910B3
# NPU memory requirement: 4 * 28GB
# Runtime: 3 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero2
```

ZeRO3:
```shell
# Experimental environment: 4 * Ascend 910B3
# NPU memory requirement: 4 * 25GB
# Runtime: 8 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero3
```


## Inference

Original model:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
```

After LoRA fine-tuning:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true
```
9 changes: 7 additions & 2 deletions docs/source/LLM/index.md
@@ -5,6 +5,8 @@
1. [Self-Cognition Fine-Tuning Best Practice](自我认知微调最佳实践.md)
2. [Agent Training and General Data Mixing Best Practice](Agent微调最佳实践.md)
3. [Qwen1.5 End-to-End Best Practice](Qwen1.5全流程最佳实践.md)
4. [NPU Inference and Fine-tuning Best Practice](NPU推理与微调最佳实践.md)
5. [Grok-1 Training and Inference Best Practice](Grok训练和推理.md)


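### 🍀Multi-Modal Best Practices Series
@@ -17,8 +19,11 @@
2. [LLM Fine-tuning Documentation](LLM微调文档.md)
3. [DPO Training Documentation](LLM人类对齐训练文档.md)
4. [Web-UI Training and Inference](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md)
5. [LLM Quantization Documentation](LLM量化文档.md)
6. [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md)
5. [LLM Evaluation Documentation](LLM评测文档.md)
6. [LLM Quantization Documentation](LLM量化文档.md)
7. [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md)
8. [LLM Experiment Documentation](LLM实验文档.md)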


### 🐔Reference Documentation
1. [Customizing Models and Datasets](自定义与拓展.md)
110 changes: 110 additions & 0 deletions docs/source_en/LLM/NPU-best-practice.md
@@ -0,0 +1,110 @@
# NPU Best Practice

## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Fine-tuning](#fine-tuning)
- [Inference](#inference)

## Environment Preparation

Experimental environment: 8 * Ascend 910B3

```shell
pip install ms-swift -U
pip install torch-npu
```

Verify that the environment is installed correctly:
```python
from transformers.utils import is_torch_npu_available
import torch

print(is_torch_npu_available()) # True
print(torch.npu.device_count()) # 8
```
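
If the check above fails with an import error, it can help to find out which package is missing before importing anything heavy. The following is a minimal sketch, not part of SWIFT: the helper names `check_npu_environment` and `npu_device_count` are hypothetical, and the explicit `import torch_npu` is an assumption about how the `torch.npu` namespace gets registered.

```python
import importlib.util


def check_npu_environment() -> dict:
    """Report which NPU-related packages are installed, without importing them."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "torch_npu": importlib.util.find_spec("torch_npu") is not None,
        "transformers": importlib.util.find_spec("transformers") is not None,
    }
    # The environment is usable only when all three packages are present.
    report["ready"] = all(report.values())
    return report


def npu_device_count() -> int:
    """Return the number of visible NPUs, or 0 when the packages are missing."""
    if not check_npu_environment()["ready"]:
        return 0
    import torch
    import torch_npu  # noqa: F401  (assumption: importing this registers torch.npu)

    return torch.npu.device_count()


if __name__ == "__main__":
    print(check_npu_environment())
    print("visible NPUs:", npu_device_count())
```

On the 8-card machine described above, the second line would be expected to report 8 devices, matching the original check.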

## Fine-tuning
The following examples use LoRA fine-tuning. For full-parameter fine-tuning, set `--sft_type full`.


### Single Card Training

Start single card fine-tuning with the following command:

```shell
# Experimental Environment: Ascend 910B3
# NPU Memory Requirement: 25GB
# Runtime: 8 hours
ASCEND_RT_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output
```


### Training with DDP

```shell
# Experimental Environment: 4 * Ascend 910B3
# NPU Memory Requirement: 4 * 30GB
# Runtime: 2 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output
```


### Training with DeepSpeed

ZeRO2:
```shell
# Experimental Environment: 4 * Ascend 910B3
# NPU Memory Requirement: 4 * 28GB
# Runtime: 3 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero2
```

ZeRO3:
```shell
# Experimental Environment: 4 * Ascend 910B3
# NPU Memory Requirement: 4 * 25GB
# Runtime: 8 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero3
```
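
The four training commands above differ only in the device list, the `NPROC_PER_NODE` variable, and the optional `--deepspeed` flag. The sketch below makes that pattern explicit by assembling the environment and argument list in Python. The helpers `build_sft_launch` and `render` are hypothetical and not part of SWIFT; the flags and values are copied from the commands in this document.

```python
import shlex
from typing import Dict, List, Optional, Tuple


def build_sft_launch(
    device_ids: List[int],
    deepspeed: Optional[str] = None,
    model_type: str = "qwen1half-7b-chat",
    dataset: str = "blossom-math-zh",
) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment variables, argv) for one `swift sft` run."""
    if not device_ids:
        raise ValueError("at least one NPU device id is required")
    env = {"ASCEND_RT_VISIBLE_DEVICES": ",".join(str(i) for i in device_ids)}
    # Multi-card runs set NPROC_PER_NODE, as in the DDP and DeepSpeed examples.
    if len(device_ids) > 1:
        env["NPROC_PER_NODE"] = str(len(device_ids))
    argv = [
        "swift", "sft",
        "--model_type", model_type,
        "--dataset", dataset,
        "--num_train_epochs", "5",
        "--sft_type", "lora",
        "--output_dir", "output",
    ]
    # `default-zero2` and `default-zero3` are the values used in this document.
    if deepspeed is not None:
        argv += ["--deepspeed", deepspeed]
    return env, argv


def render(env: Dict[str, str], argv: List[str]) -> str:
    """Format the launch as a single shell line for inspection."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    print(render(*build_sft_launch([0])))
    print(render(*build_sft_launch([0, 1, 2, 3], deepspeed="default-zero2")))
```

To actually launch a run, the pair could be passed to `subprocess.run(argv, env={**os.environ, **env})`; the sketch only prints the command so it can be reviewed first.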


## Inference

Original Model:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
```

After LoRA Fine-tuning:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true
```
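
The `--ckpt_dir` placeholder above has to be replaced with a real checkpoint directory. A minimal sketch for locating one follows, assuming checkpoints are saved as directories named `checkpoint-<step>` somewhere under the `--output_dir` used for training. The helper `latest_checkpoint` is hypothetical and not part of SWIFT, and it compares step numbers across all runs under the directory, so point it at a single run if several exist.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names such as "checkpoint-100" and captures the step number.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the `checkpoint-<step>` directory with the highest step, if any."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for path in root.rglob("checkpoint-*"):
        match = _CHECKPOINT_RE.match(path.name)
        if match and path.is_dir():
            step = int(match.group(1))
            # Compare numerically so that step 100 ranks above step 93.
            if step > best_step:
                best_step, best_path = step, path
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint("output"))
```

The returned path can then be passed as the value of `--ckpt_dir` in the inference command.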
9 changes: 7 additions & 2 deletions docs/source_en/LLM/index.md
@@ -5,6 +5,8 @@
1. [Self Cognition Best Practice](Self-cognition-best-practice.md)
2. [Agent Training and Inference Best Practice](Agent-best-practice.md)
3. [Qwen1.5 Best Practice](Qwen1.5-best-practice.md)
4. [NPU Best Practice](NPU-best-practice.md)
5. [Grok-1 Training and Inference Best Practice](Grok-1-best-practice.md)


### 🍀Multi-Modal Best Practices!
@@ -18,8 +20,11 @@ Please check: [Multi-Modal Best Practices](../Multi-Modal/index.md)
2. [LLM Finetuning](LLM-fine-tuning.md)
3. [DPO Training](RLHF.md)
4. [Web-ui Training and Inference](../GetStarted/Web-ui.md)
5. [LLM quantization](LLM-quantization.md)
6. [VLLM Inference and Deployment](VLLM-inference-acceleration-and-deployment.md)
5. [LLM Evaluation](LLM-eval.md)
6. [LLM Quantization](LLM-quantization.md)
7. [VLLM Inference and Deployment](VLLM-inference-acceleration-and-deployment.md)
8. [LLM Experimental](LLM-exp.md)


### 🐔References!
1. [Customization for models and datasets](Customization.md)