support vllm & vlm (modelscope#1630)

hjh0119 · Aug 8, 2024 · daa9b91
1 parent 201e8a1

Showing 20 changed files with 169 additions and 238 deletions.
diff --git a/README.md b/README.md
@@ -55,6 +55,7 @@ You can contact us and communicate with us by adding our group:
 <img src="asset/discord_qr.jpg" width="200" height="200">  |  <img src="asset/wechat.png" width="200" height="200">
 
 ## 🎉 News
+- 🔥2024.07.07: Support for using vLLM for accelerating inference and deployment of multimodal large models such as the llava series and phi3-vision models. You can refer to the [Multimodal & vLLM Inference Acceleration Documentation](docs/source_en/Multi-Modal/vllm-inference-acceleration.md) for more information.
 - 2024.08.06: Support for minicpm-v-v2_6-chat is available. You can use `swift infer --model_type minicpm-v-v2_6-chat` for inference experience. Best practices can be found [here](https://github.com/modelscope/swift/issues/1613).
 - 2024.08.06: Supports internlm2.5 series of 1.8b and 20b. Experience it using `swift infer --model_type internlm2_5-1_8b-chat`.
 - 🔥2024.08.05: Support evaluation for multi-modal models! Same command with [new datasets](https://swift.readthedocs.io/en/latest/LLM/LLM-eval.html#introduction).
@@ -74,7 +75,6 @@ You can contact us and communicate with us by adding our group:
 - 🔥2024.07.06: Support InternVL2 series: internvl2-2b, internvl2-4b, internvl2-8b, internvl2-26b.
 - 2024.07.06: Support codegeex4-9b-chat.
 - 2024.07.04: Support internlm2_5-7b series: internlm2_5-7b, internlm2_5-7b-chat, internlm2_5-7b-chat-1m.
-- 2024.07.02: Support for using vLLM for accelerating inference and deployment of multimodal large models such as the llava series and phi3-vision models. You can refer to the [Multimodal & vLLM Inference Acceleration Documentation](docs/source_en/Multi-Modal/vllm-inference-acceleration.md) for more information.
 - 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct` and other llava-hf models. For best practices, refer to [here](docs/source_en/Multi-Modal/llava-best-practice.md).
 - 🔥2024.06.29: Support [eval-scope](https://github.com/modelscope/eval-scope)&[open-compass](https://github.com/open-compass/opencompass) for evaluation! Now we have supported over 50 eval datasets like `BoolQ, ocnli, humaneval, math, ceval, mmlu, gsk8k, ARC_e`, please check our [Eval Doc](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/LLM-eval.md) to begin! Next sprint we will support Multi-modal and Agent evaluation, remember to follow us : )
 <details><summary>More</summary>
@@ -511,7 +511,6 @@ CUDA_VISIBLE_DEVICES=0 swift infer \
 
 Original model:
 ```shell
-# We recommend using vLLM for acceleration (arc evaluated in half a minute)
 CUDA_VISIBLE_DEVICES=0 swift eval --model_type qwen1half-7b-chat \
     --eval_dataset ARC_e --infer_backend vllm
 ```

diff --git a/README_CN.md b/README_CN.md
@@ -56,6 +56,7 @@ SWIFT has rich and comprehensive documentation; please see our documentation site:
 
 
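 ## 🎉 News
+- 🔥2024.08.07: Support for using vLLM to accelerate inference and deployment of multimodal large models: the llava series, the internvl2 series, phi3-vision, and minicpm-v2.5. See the [Multimodal & vLLM Inference Acceleration Documentation](docs/source/Multi-Modal/vLLM推理加速文档.md) for more information.
 - 2024.08.06: Support for minicpm-v-v2_6-chat. Use `swift infer --model_type minicpm-v-v2_6-chat` to try inference; best practices can be found [here](https://github.com/modelscope/swift/issues/1613).
 - 2024.08.06: Support for the 1.8b and 20b models of the internlm2.5 series. Try them with `swift infer --model_type internlm2_5-1_8b-chat`.
 - 🔥2024.08.05: Support for evaluation on multimodal datasets! The command line is exactly the same, and many [multimodal datasets](https://swift.readthedocs.io/zh-cn/latest/LLM/LLM%E8%AF%84%E6%B5%8B%E6%96%87%E6%A1%A3.html#id2) have been added.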
@@ -75,7 +76,6 @@ SWIFT has rich and comprehensive documentation; please see our documentation site:
 - 🔥2024.07.06: Support for the InternVL-2 series: internvl2-2b, internvl2-4b, internvl2-8b, internvl2-26b.
 - 2024.07.06: Support for codegeex4-9b-chat.
 - 2024.07.04: Support for the internlm2_5-7b series: internlm2_5-7b, internlm2_5-7b-chat, internlm2_5-7b-chat-1m.
-- 2024.07.02: Support for using vLLM to accelerate inference and deployment of multimodal large models: the llava series and phi3-vision. See the [Multimodal & vLLM Inference Acceleration Documentation](docs/source/Multi-Modal/vLLM推理加速文档.md) for more information.
 - 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct`, and other llava-hf models. Best practices can be found [here](docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.06.29: Support for [eval-scope](https://github.com/modelscope/eval-scope)&[open-compass](https://github.com/open-compass/opencompass) evaluation! We support an evaluation pipeline covering 50+ standard datasets, including `BoolQ, ocnli, humaneval, math, ceval, mmlu, gsk8k, ARC_e`. See our [evaluation documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM评测文档.md) to get started. The next iteration will support multimodal and Agent evaluation, so stay tuned : )
 <details><summary>More</summary>
@@ -505,7 +505,6 @@ CUDA_VISIBLE_DEVICES=0 swift infer \
 
 Original model:
 ```shell
-# We recommend using vLLM for acceleration (ARC evaluated in half a minute):
 CUDA_VISIBLE_DEVICES=0 swift eval --model_type qwen1half-7b-chat \
     --eval_dataset ARC_e --infer_backend vllm
 ```

diff --git a/docs/source/LLM/Megatron训练文档.md b/docs/source/LLM/Megatron训练文档.md
@@ -1,5 +1,7 @@
 # Megatron Training Documentation
 
+Models that support training with Megatron are listed [here](支持的模型和数据集.md#模型)
+
 ## Table of Contents
 - [Environment Preparation](#环境准备)
 - [SFT Example](#SFT案例)

diff --git a/docs/source/LLM/命令行参数.md b/docs/source/LLM/命令行参数.md
@@ -327,6 +327,7 @@ RLHF parameters inherit the sft parameters, with the following additional parameters:
 
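 - `--gpu_memory_utilization`: Parameter for initializing the vllm engine `EngineArgs`, default is `0.9`. This parameter only takes effect when using vllm. For VLLM inference acceleration and deployment, see [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md).
 - `--tensor_parallel_size`: Parameter for initializing the vllm engine `EngineArgs`, default is `1`. This parameter only takes effect when using vllm.
+- `--max_num_seqs`: Parameter for initializing the vllm engine `EngineArgs`, default is `256`. This parameter only takes effect when using vllm.
 - `--max_model_len`: Overrides the model's max_model_len, default is `None`. This parameter only takes effect when using vllm.
 - `--disable_custom_all_reduce`: Whether to disable the custom all-reduce kernel and fall back to NCCL. Default is `True`, which differs from vLLM's default.
 - `--enforce_eager`: Whether vllm uses PyTorch eager mode or builds CUDA graphs. Default is `False`. Setting it to True saves GPU memory but reduces efficiency.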
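
Reviewer note: the vLLM-related flags listed in the hunk above (including the newly added `--max_num_seqs`) are described as arguments for the vLLM engine's `EngineArgs`. The sketch below is only an illustration of how such flags and their documented defaults could be collected into keyword arguments; it is not the code from this commit. The helpers `str2bool`, `build_parser`, and `to_engine_kwargs` are hypothetical names, and dropping an unset `max_model_len` is an assumed design choice.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'true'/'false'-style strings into booleans (hypothetical helper)."""
    return value.strip().lower() in {"1", "true", "yes"}


def build_parser() -> argparse.ArgumentParser:
    # Defaults mirror the values stated in the documentation hunk above.
    parser = argparse.ArgumentParser(description="hypothetical vLLM flag parser")
    parser.add_argument("--gpu_memory_utilization", type=float, default=0.9)
    parser.add_argument("--tensor_parallel_size", type=int, default=1)
    parser.add_argument("--max_num_seqs", type=int, default=256)
    parser.add_argument("--max_model_len", type=int, default=None)
    parser.add_argument("--disable_custom_all_reduce", type=str2bool, default=True)
    parser.add_argument("--enforce_eager", type=str2bool, default=False)
    return parser


def to_engine_kwargs(args: argparse.Namespace) -> dict:
    """Collect parsed flags into keyword arguments for an engine constructor."""
    kwargs = dict(vars(args))
    # Assumption: None means "keep the model's own limit", so the key is dropped.
    if kwargs["max_model_len"] is None:
        del kwargs["max_model_len"]
    return kwargs


if __name__ == "__main__":
    parsed = build_parser().parse_args(["--max_num_seqs", "64", "--enforce_eager", "true"])
    print(to_engine_kwargs(parsed))
```

Lowering `--max_num_seqs` below its default of `256` in this way would be the kind of override the new flag makes possible from the command line.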