Commit

hanlv15 committed Jul 12, 2024
2 parents 4a43524 + 6c963d8 commit 7dd241a
Showing 112 changed files with 3,503 additions and 1,555 deletions.
136 changes: 70 additions & 66 deletions README.md

Large diffs are not rendered by default.

136 changes: 68 additions & 68 deletions README_CN.md

Large diffs are not rendered by default.

Binary file added asset/discord_qr.jpg
30 changes: 30 additions & 0 deletions docs/source/.readthedocs.yaml
@@ -0,0 +1,30 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/source/conf.py

# Optionally build your docs in additional formats such as PDF and ePub
# formats:
#   - pdf
#   - epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: requirements/docs.txt
    - requirements: requirements/framework.txt
    - requirements: requirements/llm.txt
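
The `sphinx.configuration` key above points to `docs/source/conf.py`. For orientation, a minimal sketch of what such a file typically contains follows. This is a generic Sphinx example, not the actual file from this repository; the project metadata and extension choices are illustrative assumptions:

```python
# docs/source/conf.py: minimal Sphinx configuration sketch (illustrative).
project = 'swift'
author = 'ModelScope'

# myst_parser lets Sphinx build Markdown sources such as the .md docs
# changed in this commit; the extension list is an assumption.
extensions = ['myst_parser']

# Map file extensions to parsers so both .rst and .md sources build.
source_suffix = {
    '.rst': 'restructuredtext',
    '.md': 'markdown',
}

html_theme = 'sphinx_rtd_theme'
```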
2 changes: 1 addition & 1 deletion docs/source/LLM/Agent微调最佳实践.md
@@ -165,7 +165,7 @@ Final Answer: If you want a phone with outstanding camera performance, I recommend
| ms-bench | 60000 (sampled) |
| self-recognition | 3000 (repeated sampling) |

-We also support using your own Agent dataset. The dataset format must meet the requirements of [custom datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86). More specifically, the Agent's response/system should follow the Action/Action Input/Observation format described above.
+We also support using your own Agent dataset. The dataset format must meet the requirements of [custom datasets](%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md#%E8%87%AA%E5%AE%9A%E4%B9%89%E6%95%B0%E6%8D%AE%E9%9B%86). More specifically, the Agent's response/system should follow the Action/Action Input/Observation format described above.
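
For a rough illustration, a single sample in such a dataset might look like the following (a minimal sketch assuming the plain query/response format from the custom-dataset doc; the tool name and all field values are invented for illustration):

```python
# A hypothetical Agent training sample; every value here is illustrative.
sample = {
    'system': ('Answer the following questions as best you can. '
               'You have access to the following tools: ...'),
    'query': 'What is the weather like in Hangzhou today?',
    # The response follows the Action/Action Input/Observation format.
    'response': ('Thought: I should call the weather tool.\n'
                 'Action: get_weather\n'
                 "Action Input: {'location': 'Hangzhou'}\n"
                 'Observation: Sunny, 26°C\n'
                 'Final Answer: It is sunny in Hangzhou today, around 26°C.'),
}
```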

We added the **MLP** and **Embedder** modules to lora_target_modules. You can add LoRA to all linear layers (including qkvo, mlp, and embedder) by specifying `--lora_target_modules ALL`, which is **usually the most effective**.

4 changes: 3 additions & 1 deletion docs/source/LLM/LLM微调文档.md
Expand Up @@ -37,7 +37,7 @@ pip install -r requirements/llm.txt -U
```

## Fine-tuning
-If you want to fine-tune and run inference through a web UI, see the [web UI training and inference documentation](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).
+If you want to fine-tune and run inference through a web UI, see the [web UI training and inference documentation](../GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md).

### Using Python
```python
@@ -100,6 +100,7 @@ swift sft \
--output_dir output \

# Multi-node multi-GPU
+# If multiple machines share a disk, additionally specify `--save_on_each_node false` in each machine's script.
# node0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NNODES=2 \
@@ -246,6 +247,7 @@ print(f'history: {history}')

Evaluate using the **dataset**:
```bash
+# To run inference on all dataset samples, additionally specify `--show_dataset_sample -1`
# Direct inference
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' \
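
The same dataset evaluation can also be driven from Python instead of the CLI. A minimal sketch, assuming swift's `infer_main`/`InferArguments` entry points mirror the CLI flags shown above (the checkpoint path is the same placeholder):

```python
from swift.llm import InferArguments, infer_main

# Mirrors the `swift infer` CLI call above; ckpt_dir is a placeholder.
args = InferArguments(
    ckpt_dir='xxx/vx-xxx/checkpoint-xxx',
    load_dataset_config=True,   # evaluate on the dataset used during training
    show_dataset_sample=-1,     # -1: run inference on all dataset samples
)
infer_main(args)
```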
2 changes: 1 addition & 1 deletion docs/source/LLM/LLM量化文档.md
@@ -305,7 +305,7 @@ CUDA_VISIBLE_DEVICES=0 swift sft \
```

**Note**
--hqq supports more custom parameters, such as specifying different quantization configurations for different network layers; see [Command-line arguments](https://github.com/modelscope/swift/blob/main/docs/source/LLM/命令行参数.md) for details
+-hqq supports more custom parameters, such as specifying different quantization configurations for different network layers; see [Command-line arguments](命令行参数.md) for details
- eetq uses 8-bit quantization, so there is no need to specify quantization_bit. bf16 is currently not supported, so dtype must be set to fp16 (a sketch follows this list)
- eetq is currently rather slow with qlora; hqq is recommended instead. See this [issue](https://github.com/NetEase-FuXi/EETQ/issues/17)
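
A minimal sketch of an eetq fine-tuning launch under these constraints, assuming swift's `sft_main`/`SftArguments` Python entry points and a `quant_method` argument that mirrors the CLI (the model and dataset choices are illustrative):

```python
from swift.llm import SftArguments, sft_main

args = SftArguments(
    model_type='qwen-7b-chat',  # illustrative model
    dataset=['ms-bench'],       # illustrative dataset
    sft_type='lora',
    quant_method='eetq',        # eetq is 8-bit: no quantization_bit needed
    dtype='fp16',               # bf16 is not supported by eetq yet
)
sft_main(args)
```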

146 changes: 146 additions & 0 deletions docs/source/LLM/LmDeploy推理加速与部署.md
@@ -0,0 +1,146 @@
# LmDeploy Inference Acceleration and Deployment

## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference Acceleration](#inference-acceleration)
- [Deployment](#deployment)
- [Multimodal](#multimodal)

## Environment Setup
GPU devices: A10, 3090, V100, and A100 are all supported.
```bash
# Set the global pip index (speeds up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

pip install lmdeploy
```
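
As a quick sanity check that the environment is ready, something like the following can be run (a minimal sketch; it assumes only the two packages installed above plus PyTorch, which ms-swift pulls in):

```python
import torch
import lmdeploy
import swift

# Confirm both packages import and that a CUDA device is visible.
print('lmdeploy:', lmdeploy.__version__)
print('swift:', swift.__version__)
print('CUDA available:', torch.cuda.is_available())
```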

## Inference Acceleration

### Using Python

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    ModelType, get_lmdeploy_engine, get_default_template_type,
    get_template, inference_lmdeploy, inference_stream_lmdeploy
)

model_type = ModelType.qwen_7b_chat
lmdeploy_engine = get_lmdeploy_engine(model_type)
template_type = get_default_template_type(model_type)
template = get_template(template_type, lmdeploy_engine.hf_tokenizer)
# An interface similar to `transformers.GenerationConfig`
lmdeploy_engine.generation_config.max_new_tokens = 256
generation_info = {}

request_list = [{'query': '你好!'}, {'query': '浙江的省会在哪?'}]
resp_list = inference_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
for request, resp in zip(request_list, resp_list):
print(f"query: {request['query']}")
print(f"response: {resp['response']}")
print(generation_info)

# stream
history1 = resp_list[1]['history']
request_list = [{'query': '这有什么好吃的', 'history': history1}]
gen = inference_stream_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
query = request_list[0]['query']
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for resp_list in gen:
    resp = resp_list[0]
    response = resp['response']
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()

history = resp_list[0]['history']
print(f'history: {history}')
print(generation_info)
"""
query: 你好!
response: 你好!有什么我能帮助你的吗?
query: 浙江的省会在哪?
response: 浙江省会是杭州市。
{'num_prompt_tokens': 46, 'num_generated_tokens': 13, 'num_samples': 2, 'runtime': 0.2037766759749502, 'samples/s': 9.81466593480922, 'tokens/s': 63.79532857625993}
query: 这有什么好吃的
response: 杭州有许多美食,比如西湖醋鱼、东坡肉、龙井虾仁、油炸臭豆腐等,都是当地非常有名的传统名菜。此外,当地的点心也非常有特色,比如桂花糕、马蹄酥、绿豆糕等。
history: [['浙江的省会在哪?', '浙江省会是杭州市。'], ['这有什么好吃的', '杭州有许多美食,比如西湖醋鱼、东坡肉、龙井虾仁、油炸臭豆腐等,都是当地非常有名的传统名菜。此外,当地的点心也非常有特色,比如桂花糕、马蹄酥、绿豆糕等。']]
{'num_prompt_tokens': 44, 'num_generated_tokens': 53, 'num_samples': 1, 'runtime': 0.6306625790311955, 'samples/s': 1.5856339558566632, 'tokens/s': 84.03859966040315}
"""
```

**Tensor parallelism (TP):**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

from swift.llm import (
    ModelType, get_lmdeploy_engine, get_default_template_type,
    get_template, inference_lmdeploy, inference_stream_lmdeploy
)

model_type = ModelType.qwen_7b_chat
lmdeploy_engine = get_lmdeploy_engine(model_type, tp=2)
template_type = get_default_template_type(model_type)
template = get_template(template_type, lmdeploy_engine.hf_tokenizer)
# An interface similar to `transformers.GenerationConfig`
lmdeploy_engine.generation_config.max_new_tokens = 256
generation_info = {}

request_list = [{'query': '你好!'}, {'query': '浙江的省会在哪?'}]
resp_list = inference_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
for request, resp in zip(request_list, resp_list):
print(f"query: {request['query']}")
print(f"response: {resp['response']}")
print(generation_info)

# stream
history1 = resp_list[1]['history']
request_list = [{'query': '这有什么好吃的', 'history': history1}]
gen = inference_stream_lmdeploy(lmdeploy_engine, template, request_list, generation_info=generation_info)
query = request_list[0]['query']
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for resp_list in gen:
    resp = resp_list[0]
    response = resp['response']
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()

history = resp_list[0]['history']
print(f'history: {history}')
print(generation_info)
"""
query: 你好!
response: 你好!有什么我能帮助你的吗?
query: 浙江的省会在哪?
response: 浙江省会是杭州市。
{'num_prompt_tokens': 46, 'num_generated_tokens': 13, 'num_samples': 2, 'runtime': 0.2080078640137799, 'samples/s': 9.61502109298861, 'tokens/s': 62.497637104425955}
query: 这有什么好吃的
response: 杭州有许多美食,比如西湖醋鱼、东坡肉、龙井虾仁、油焖笋等等。杭州的特色小吃也很有风味,比如桂花糕、叫花鸡、油爆虾等。此外,杭州还有许多美味的甜品,如月饼、麻薯、绿豆糕等。
history: [['浙江的省会在哪?', '浙江省会是杭州市。'], ['这有什么好吃的', '杭州有许多美食,比如西湖醋鱼、东坡肉、龙井虾仁、油焖笋等等。杭州的特色小吃也很有风味,比如桂花糕、叫花鸡、油爆虾等。此外,杭州还有许多美味的甜品,如月饼、麻薯、绿豆糕等。']]
{'num_prompt_tokens': 44, 'num_generated_tokens': 64, 'num_samples': 1, 'runtime': 0.5715192809584551, 'samples/s': 1.7497222461558426, 'tokens/s': 111.98222375397393}
"""
```


### Using the CLI
Coming soon...

## Deployment
Coming soon...

## Multimodal
Coming soon...
2 changes: 0 additions & 2 deletions docs/source/LLM/Qwen1.5全流程最佳实践.md
@@ -128,7 +128,6 @@ gen = inference_stream_vllm(llm_engine, template, request_list)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for resp_list in gen:
-    request = request_list[0]
    resp = resp_list[0]
    response = resp['response']
    delta = response[print_idx:]
@@ -346,7 +345,6 @@ gen = inference_stream_vllm(llm_engine, template, request_list)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for resp_list in gen:
-    request = request_list[0]
    resp = resp_list[0]
    response = resp['response']
    delta = response[print_idx:]
