Add OLLaMA doc (modelscope#1660)
tastelikefeet committed Aug 9, 2024
1 parent 9f39915 commit aca5a7c
Showing 7 changed files with 330 additions and 16 deletions.
154 changes: 154 additions & 0 deletions docs/source/LLM/OLLAMA导出文档.md
@@ -0,0 +1,154 @@
# OLLaMA Export Documentation

SWIFT supports exporting OLLaMA Modelfiles; this capability is integrated into the `swift export` command.

## Contents

- [Environment Setup](#environment-setup)
- [Export](#export)
- [Points to Note](#points-to-note)

## Environment Setup

```shell
# Set pip global mirror (to speed up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```

OLLaMA export requires no additional modules, since SWIFT only exports the Modelfile; subsequent steps are left to the user.

## Export

The OLLaMA export command is as follows:

```shell
# model_type
swift export --model_type llama3-8b-instruct --to_ollama true --ollama_output_dir llama3-8b-instruct-ollama
# ckpt_dir; note that for LoRA-trained checkpoints you need to add --merge_lora true
swift export --ckpt_dir /mnt/workspace/yzhao/tastelikefeet/swift/output/qwen-7b-chat/v141-20240331-110833/checkpoint-10942 --to_ollama true --ollama_output_dir qwen-7b-chat-ollama --merge_lora true
```

After execution, a log like the following is printed:
```shell
[INFO:swift] Exporting to ollama:
[INFO:swift] If you have a gguf file, try to pass the file by :--gguf_file /xxx/xxx.gguf, else SWIFT will use the original(merged) model dir
[INFO:swift] Downloading the model from ModelScope Hub, model_id: LLM-Research/Meta-Llama-3-8B-Instruct
[WARNING:modelscope] Authentication has expired, please re-login with modelscope login --token "YOUR_SDK_TOKEN" if you need to access private models or datasets.
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /mnt/workspace/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct
[INFO:swift] Save Modelfile done, you can start ollama by:
[INFO:swift] > ollama serve
[INFO:swift] In another terminal:
[INFO:swift] > ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/llama3-8b-instruct-ollama/Modelfile
[INFO:swift] > ollama run my-custom-model
[INFO:swift] End time of running main: 2024-08-09 17:17:48.768722
```
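As the log hints, if a gguf file is already available you can pass it via `--gguf_file`, in which case SWIFT uses it instead of the original (merged) model directory. A minimal sketch; the gguf path below is a placeholder:
```shell
# Reuse an existing gguf file instead of the original (merged) model directory
swift export --model_type llama3-8b-instruct --to_ollama true \
    --gguf_file /path/to/llama3-8b-instruct.gguf \
    --ollama_output_dir llama3-8b-instruct-ollama
```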
As the log indicates, everything is ready to run; open the generated Modelfile to inspect it:
```text
FROM /mnt/workspace/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct
TEMPLATE """{{ if .System }}<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ else }}<|begin_of_text|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{{ end }}{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|eot_id|>"
PARAMETER temperature 0.3
PARAMETER top_k 20
PARAMETER top_p 0.7
PARAMETER repeat_penalty 1.0
```
You can modify the generated file before using it for inference.
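For example, a lightly customized Modelfile might add a system prompt and adjust the sampling parameters; `SYSTEM` and `PARAMETER num_ctx` are standard ollama Modelfile directives, and the concrete values below are illustrative only:
```text
FROM /mnt/workspace/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct
# Keep the generated TEMPLATE block unchanged (omitted here for brevity)
SYSTEM """You are a concise assistant. Answer briefly and accurately."""
PARAMETER stop "<|eot_id|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
```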
### Using OLLaMA
To use the generated Modelfile, install OLLaMA first:
```shell
# https://github.com/ollama/ollama
curl -fsSL https://ollama.com/install.sh | sh
```
Start OLLaMA:
```shell
ollama serve
```
In another terminal, run:
```shell
ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/llama3-8b-instruct-ollama/Modelfile
```
After execution, a log like the following is printed:
```text
transferring model data
unpacking model metadata
processing tensors
converting model
creating new layer sha256:37b0404fb276acb2e5b75f848673566ce7048c60280470d96009772594040706
creating new layer sha256:2ecd014a372da71016e575822146f05d89dc8864522fdc88461c1e7f1532ba06
creating new layer sha256:ddc2a243c4ec10db8aed5fbbc5ac82a4f8425cdc4bd3f0c355373a45bc9b6cb0
creating new layer sha256:fc776bf39fa270fa5e2ef7c6782068acd858826e544fce2df19a7a8f74f3f9df
writing manifest
success
```
You can then run inference using the model name you just created:
```shell
ollama run my-custom-model
```
```shell
>>> who are you?
I'm LLaMA, a large language model trained by a team of researchers at Meta AI. My primary function is to understand and respond to human
input in a helpful and informative way. I'm a type of AI designed to simulate conversation, answer questions, and even generate text based
on a given prompt or topic.

I'm not a human, but rather a computer program designed to mimic human-like conversation. I don't have personal experiences, emotions, or
physical presence, but I'm here to provide information, answer your questions, and engage in conversation to the best of my abilities.

I'm constantly learning and improving my responses based on the interactions I have with users like you, so please bear with me if I make
any mistakes or don't quite understand what you're asking. I'm here to help and provide assistance, so feel free to ask me anything!
```
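Besides the interactive prompt, `ollama serve` also exposes an HTTP API (listening on port 11434 by default), which is convenient for scripted checks. A minimal sketch:
```shell
# Single, non-streaming completion via the ollama HTTP API
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "my-custom-model",
  "prompt": "who are you?",
  "stream": false
}'
```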
## Points to Note
1. Some models report an error when running:
```shell
ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/qwen-7b-chat-ollama/Modelfile
```
The error message is:
```shell
Error: Models based on 'QWenLMHeadModel' are not yet supported
```
This is because ollama's conversion does not support every model type. In this case, export a gguf file yourself and update the FROM field in the Modelfile:
```shell
# For detailed conversion steps, see: https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/README.md
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# The model directory can be found in the `swift export` command log, e.g.:
# Using model_dir: /mnt/workspace/yzhao/tastelikefeet/swift/output/qwen-7b-chat/v141-20240331-110833/checkpoint-10942-merged
python convert_hf_to_gguf.py /mnt/workspace/yzhao/tastelikefeet/swift/output/qwen-7b-chat/v141-20240331-110833/checkpoint-10942-merged
```
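The conversion script produces an unquantized gguf by default. If a smaller file is desired, llama.cpp also ships a quantization tool (named `llama-quantize` in recent versions, `quantize` in older ones; see the README linked above). A sketch with placeholder paths:
```shell
# Quantize the converted gguf to 4-bit (Q4_K_M); the convert script prints the actual gguf path
./llama-quantize /path/to/converted-f16.gguf /path/to/model-q4_k_m.gguf Q4_K_M
```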
After pointing the Modelfile's FROM field at the generated gguf file, re-execute:
```shell
ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/qwen-7b-chat-ollama/Modelfile
```
17 changes: 9 additions & 8 deletions docs/source/LLM/index.md
@@ -8,14 +8,15 @@
 4. [Web-UI Training and Inference](../GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md)
 5. [LLM Evaluation](LLM评测文档.md)
 6. [LLM Quantization](LLM量化文档.md)
-7. [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md)
-8. [LmDeploy Inference Acceleration and Deployment](LmDeploy推理加速与部署.md)
-9. [LLM Experiments](LLM实验文档.md)
-10. [DPO Training](DPO训练文档.md)
-11. [ORPO Best Practices](ORPO算法最佳实践.md)
-12. [SimPO Best Practices](SimPO算法最佳实践.md)
-13. [Human Preference Alignment Training](人类偏好对齐训练文档.md)
-14. [Megatron Training](Megatron训练文档.md)
+7. [OLLaMA Export](OLLAMA导出文档.md)
+8. [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md)
+9. [LmDeploy Inference Acceleration and Deployment](LmDeploy推理加速与部署.md)
+10. [LLM Experiments](LLM实验文档.md)
+11. [DPO Training](DPO训练文档.md)
+12. [ORPO Best Practices](ORPO算法最佳实践.md)
+13. [SimPO Best Practices](SimPO算法最佳实践.md)
+14. [Human Preference Alignment Training](人类偏好对齐训练文档.md)
+15. [Megatron Training](Megatron训练文档.md)

### ⭐️Best Practices Series

1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -26,6 +26,7 @@ Swift DOCUMENTATION
 LLM/人类偏好对齐训练文档.md
 LLM/LLM评测文档.md
 LLM/LLM量化文档.md
+LLM/OLLAMA导出文档.md
 LLM/VLLM推理加速与部署.md
 LLM/LmDeploy推理加速与部署.md
 LLM/Megatron训练文档.md
155 changes: 155 additions & 0 deletions docs/source_en/LLM/OLLaMA-Export.md
@@ -0,0 +1,155 @@
# OLLaMA Export Documentation

SWIFT supports exporting OLLaMA Modelfiles; this capability is integrated into the `swift export` command.

## Contents

- [Environment Setup](#environment-setup)
- [Export](#export)
- [Points to Note](#points-to-note)

## Environment Setup

```shell
# Set pip global mirror (to speed up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```

No additional modules are needed for OLLaMA export, since SWIFT only exports the Modelfile; subsequent steps are left to the user.

## Export

The OLLaMA export command is as follows:

```shell
# model_type
swift export --model_type llama3-8b-instruct --to_ollama true --ollama_output_dir llama3-8b-instruct-ollama
# ckpt_dir; note that for LoRA-trained checkpoints you need to add --merge_lora true
swift export --ckpt_dir /mnt/workspace/yzhao/tastelikefeet/swift/output/qwen-7b-chat/v141-20240331-110833/checkpoint-10942 --to_ollama true --ollama_output_dir qwen-7b-chat-ollama --merge_lora true
```

After execution, the following log will be printed:
```shell
[INFO:swift] Exporting to ollama:
[INFO:swift] If you have a gguf file, try to pass the file by :--gguf_file /xxx/xxx.gguf, else SWIFT will use the original(merged) model dir
[INFO:swift] Downloading the model from ModelScope Hub, model_id: LLM-Research/Meta-Llama-3-8B-Instruct
[WARNING:modelscope] Authentication has expired, please re-login with modelscope login --token "YOUR_SDK_TOKEN" if you need to access private models or datasets.
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /mnt/workspace/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct
[INFO:swift] Save Modelfile done, you can start ollama by:
[INFO:swift] > ollama serve
[INFO:swift] In another terminal:
[INFO:swift] > ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/llama3-8b-instruct-ollama/Modelfile
[INFO:swift] > ollama run my-custom-model
[INFO:swift] End time of running main: 2024-08-09 17:17:48.768722
```
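As the log suggests, an existing gguf file can be passed via `--gguf_file`, in which case SWIFT uses it instead of the original (merged) model directory. A minimal sketch; the gguf path is a placeholder:
```shell
# Reuse an existing gguf file instead of the original (merged) model directory
swift export --model_type llama3-8b-instruct --to_ollama true \
    --gguf_file /path/to/llama3-8b-instruct.gguf \
    --ollama_output_dir llama3-8b-instruct-ollama
```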
Open the generated Modelfile to inspect it:
```text
FROM /mnt/workspace/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct
TEMPLATE """{{ if .System }}<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ else }}<|begin_of_text|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{{ end }}{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|eot_id|>"
PARAMETER temperature 0.3
PARAMETER top_k 20
PARAMETER top_p 0.7
PARAMETER repeat_penalty 1.0
```
You can modify the generated file before using it for inference.
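For instance, a lightly customized Modelfile might add a system prompt and tweak the sampling parameters; `SYSTEM` and `PARAMETER num_ctx` are standard ollama Modelfile directives, and the values below are illustrative only:
```text
FROM /mnt/workspace/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct
# Keep the generated TEMPLATE block unchanged (omitted here for brevity)
SYSTEM """You are a concise assistant. Answer briefly and accurately."""
PARAMETER stop "<|eot_id|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
```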
### Using OLLaMA
To use the above file, install OLLaMA:
```shell
# https://github.com/ollama/ollama
curl -fsSL https://ollama.com/install.sh | sh
```
Start OLLaMA:
```shell
ollama serve
```
In another terminal, run:
```shell
ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/llama3-8b-instruct-ollama/Modelfile
```
The following log will be printed after execution:
```text
transferring model data
unpacking model metadata
processing tensors
converting model
creating new layer sha256:37b0404fb276acb2e5b75f848673566ce7048c60280470d96009772594040706
creating new layer sha256:2ecd014a372da71016e575822146f05d89dc8864522fdc88461c1e7f1532ba06
creating new layer sha256:ddc2a243c4ec10db8aed5fbbc5ac82a4f8425cdc4bd3f0c355373a45bc9b6cb0
creating new layer sha256:fc776bf39fa270fa5e2ef7c6782068acd858826e544fce2df19a7a8f74f3f9df
writing manifest
success
```
You can then run inference using the model name you just created:
```shell
ollama run my-custom-model
```
```shell
>>> who are you?
I'm LLaMA, a large language model trained by a team of researchers at Meta AI. My primary function is to understand and respond to human
input in a helpful and informative way. I'm a type of AI designed to simulate conversation, answer questions, and even generate text based
on a given prompt or topic.

I'm not a human, but rather a computer program designed to mimic human-like conversation. I don't have personal experiences, emotions, or
physical presence, but I'm here to provide information, answer your questions, and engage in conversation to the best of my abilities.

I'm constantly learning and improving my responses based on the interactions I have with users like you, so please bear with me if I make
any mistakes or don't quite understand what you're asking. I'm here to help and provide assistance, so feel free to ask me anything!
```
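In addition to the interactive prompt, `ollama serve` exposes an HTTP API (on port 11434 by default), which is handy for scripted checks. A minimal sketch:
```shell
# Single, non-streaming completion via the ollama HTTP API
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "my-custom-model",
  "prompt": "who are you?",
  "stream": false
}'
```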
## Points to Note
1. Some models may report an error during:
```shell
ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/qwen-7b-chat-ollama/Modelfile
```
The error message is:
```shell
Error: Models based on 'QWenLMHeadModel' are not yet supported
```
This is because ollama's conversion does not support every model type. In this case, export a gguf file yourself and update the FROM field in the Modelfile:
```shell
# Detailed conversion steps can be found at: https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/README.md
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# The model directory can be found in the `swift export` command log, similar to:
# Using model_dir: /mnt/workspace/yzhao/tastelikefeet/swift/output/qwen-7b-chat/v141-20240331-110833/checkpoint-10942-merged
python convert_hf_to_gguf.py /mnt/workspace/yzhao/tastelikefeet/swift/output/qwen-7b-chat/v141-20240331-110833/checkpoint-10942-merged
```
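By default the conversion yields an unquantized gguf. To shrink it, llama.cpp also provides a quantization tool (`llama-quantize` in recent versions, formerly `quantize`; see the README linked above). A sketch with placeholder paths:
```shell
# Quantize the converted gguf to 4-bit (Q4_K_M); the convert script prints the actual gguf path
./llama-quantize /path/to/converted-f16.gguf /path/to/model-q4_k_m.gguf Q4_K_M
```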
After pointing the Modelfile's FROM field at the generated gguf file, re-execute:
```shell
ollama create my-custom-model -f /mnt/workspace/yzhao/tastelikefeet/swift/qwen-7b-chat-ollama/Modelfile
```
17 changes: 9 additions & 8 deletions docs/source_en/LLM/index.md
@@ -8,14 +8,15 @@
 4. [Web-UI Training and Inference](../GetStarted/Web-ui.md)
 5. [LLM Evaluation](LLM-eval.md)
 6. [LLM Quantization](LLM-quantization.md)
-7. [VLLM Inference and Deployment](VLLM-inference-acceleration-and-deployment.md)
-8. [LmDeploy Inference and Deployment](LmDeploy-inference-acceleration-and-deployment.md)
-9. [LLM Experimental](LLM-exp.md)
-10. [DPO Training](DPO.md)
-11. [ORPO Training](ORPO.md)
-12. [SimPO Training](SimPO.md)
-13. [Human Preference Alignment Training Documentation](Human-Preference-Alignment-Training-Documentation.md)
-14. [Megatron-training](Megatron-training.md)
+7. [OLLaMA Export](./OLLaMA-Export.md)
+8. [VLLM Inference and Deployment](VLLM-inference-acceleration-and-deployment.md)
+9. [LmDeploy Inference and Deployment](LmDeploy-inference-acceleration-and-deployment.md)
+10. [LLM Experimental](LLM-exp.md)
+11. [DPO Training](DPO.md)
+12. [ORPO Training](ORPO.md)
+13. [SimPO Training](SimPO.md)
+14. [Human Preference Alignment Training Documentation](Human-Preference-Alignment-Training-Documentation.md)
+15. [Megatron-training](Megatron-training.md)

### ⭐️Best Practices!

1 change: 1 addition & 0 deletions docs/source_en/index.rst
@@ -26,6 +26,7 @@ Swift DOCUMENTATION
 LLM/Human-Preference-Alignment-Training-Documentation.md
 LLM/LLM-eval.md
 LLM/LLM-quantization.md
+LLM/OLLaMA-Export.md
 LLM/VLLM-inference-acceleration-and-deployment.md
 LLM/LmDeploy-inference-acceleration-and-deployment.md
 LLM/Megatron-training.md
1 change: 1 addition & 0 deletions swift/llm/export.py
@@ -200,6 +200,7 @@ def llm_export(args: ExportArguments) -> None:
         model_dir = args.ckpt_dir
     else:
         model_dir = args.model_id_or_path
+    logger.info(f'Using model_dir: {model_dir}')
     _, tokenizer = get_model_tokenizer(
         args.model_type, model_id_or_path=model_dir, revision=args.model_revision, load_model=False)
     model_dir = tokenizer.model_dir