support qwen2-math (modelscope#1644)
Jintao-Huang committed Aug 8, 2024
1 parent b906957 commit 8066109
Showing 5 changed files with 87 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -55,6 +55,7 @@ You can contact us and communicate with us by adding our group:
<img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200">

## 🎉 News
- 🔥2024.08.08: Supports the qwen2-math series models: 1.5B, 7B, 72B. Use `swift infer --model_type qwen2-math-1_5b-instruct` to try it out.
- 🔥2024.08.07: Supports using vLLM to accelerate inference and deployment of multimodal large models such as the llava series and phi3-vision. See the [Multimodal & vLLM Inference Acceleration Documentation](docs/source_en/Multi-Modal/vllm-inference-acceleration.md) for more information.
- 2024.08.06: Supports minicpm-v-v2_6-chat. Use `swift infer --model_type minicpm-v-v2_6-chat` for inference; best practices can be found [here](https://github.com/modelscope/swift/issues/1613).
- 2024.08.06: Supports the internlm2.5 series: 1.8b and 20b. Try it with `swift infer --model_type internlm2_5-1_8b-chat`.
1 change: 1 addition & 0 deletions README_CN.md
@@ -56,6 +56,7 @@ SWIFT has rich and comprehensive documentation; please visit our documentation website:


## 🎉 News
- 🔥2024.08.08: Supports the qwen2-math series models: 1.5B, 7B, 72B. Use `swift infer --model_type qwen2-math-1_5b-instruct` to try it out.
- 🔥2024.08.07: Supports using vLLM to accelerate inference and deployment of multimodal large models: the llava series, internvl2 series, phi3-vision, and minicpm-v2.5. See the [Multimodal & vLLM Inference Acceleration Documentation](docs/source/Multi-Modal/vLLM推理加速文档.md) for more information.
- 2024.08.06: Supports minicpm-v-v2_6-chat. Use `swift infer --model_type minicpm-v-v2_6-chat` for inference; best practices can be found [here](https://github.com/modelscope/swift/issues/1613).
- 2024.08.06: Supports the internlm2.5 series: 1.8b and 20b. Try it with `swift infer --model_type internlm2_5-1_8b-chat`.
6 changes: 6 additions & 0 deletions docs/source/LLM/支持的模型和数据集.md
@@ -103,6 +103,12 @@
|qwen2-57b-a14b|[qwen/Qwen2-57B-A14B](https://modelscope.cn/models/qwen/Qwen2-57B-A14B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.40|-|[Qwen/Qwen2-57B-A14B](https://huggingface.co/Qwen/Qwen2-57B-A14B)|
|qwen2-57b-a14b-instruct|[qwen/Qwen2-57B-A14B-Instruct](https://modelscope.cn/models/qwen/Qwen2-57B-A14B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.40|-|[Qwen/Qwen2-57B-A14B-Instruct](https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct)|
|qwen2-57b-a14b-instruct-int4|[qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2718;|&#x2718;|auto_gptq>=0.5, transformers>=4.40|-|[Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4)|
|qwen2-math-1_5b|[qwen/Qwen2-Math-1.5B](https://modelscope.cn/models/qwen/Qwen2-Math-1.5B/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-1.5B](https://huggingface.co/Qwen/Qwen2-Math-1.5B)|
|qwen2-math-1_5b-instruct|[qwen/Qwen2-Math-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Math-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-Math-1.5B-Instruct)|
|qwen2-math-7b|[qwen/Qwen2-Math-7B](https://modelscope.cn/models/qwen/Qwen2-Math-7B/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-7B](https://huggingface.co/Qwen/Qwen2-Math-7B)|
|qwen2-math-7b-instruct|[qwen/Qwen2-Math-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Math-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Math-7B-Instruct)|
|qwen2-math-72b|[qwen/Qwen2-Math-72B](https://modelscope.cn/models/qwen/Qwen2-Math-72B/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-72B](https://huggingface.co/Qwen/Qwen2-Math-72B)|
|qwen2-math-72b-instruct|[qwen/Qwen2-Math-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Math-72B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-72B-Instruct](https://huggingface.co/Qwen/Qwen2-Math-72B-Instruct)|
|chatglm2-6b|[ZhipuAI/chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary)|query_key_value|chatglm2|&#x2718;|&#x2714;|&#x2718;|&#x2718;|transformers<4.42|-|[THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)|
|chatglm2-6b-32k|[ZhipuAI/chatglm2-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary)|query_key_value|chatglm2|&#x2718;|&#x2714;|&#x2718;|&#x2718;|transformers<4.42|-|[THUDM/chatglm2-6b-32k](https://huggingface.co/THUDM/chatglm2-6b-32k)|
|chatglm3-6b-base|[ZhipuAI/chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;|&#x2718;|&#x2718;|transformers<4.42|-|[THUDM/chatglm3-6b-base](https://huggingface.co/THUDM/chatglm3-6b-base)|
6 changes: 6 additions & 0 deletions docs/source_en/LLM/Supported-models-datasets.md
@@ -103,6 +103,12 @@ The table below introduces all models supported by SWIFT:
|qwen2-57b-a14b|[qwen/Qwen2-57B-A14B](https://modelscope.cn/models/qwen/Qwen2-57B-A14B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.40|-|[Qwen/Qwen2-57B-A14B](https://huggingface.co/Qwen/Qwen2-57B-A14B)|
|qwen2-57b-a14b-instruct|[qwen/Qwen2-57B-A14B-Instruct](https://modelscope.cn/models/qwen/Qwen2-57B-A14B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.40|-|[Qwen/Qwen2-57B-A14B-Instruct](https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct)|
|qwen2-57b-a14b-instruct-int4|[qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4](https://modelscope.cn/models/qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2718;|&#x2718;|auto_gptq>=0.5, transformers>=4.40|-|[Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4)|
|qwen2-math-1_5b|[qwen/Qwen2-Math-1.5B](https://modelscope.cn/models/qwen/Qwen2-Math-1.5B/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-1.5B](https://huggingface.co/Qwen/Qwen2-Math-1.5B)|
|qwen2-math-1_5b-instruct|[qwen/Qwen2-Math-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Math-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-Math-1.5B-Instruct)|
|qwen2-math-7b|[qwen/Qwen2-Math-7B](https://modelscope.cn/models/qwen/Qwen2-Math-7B/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-7B](https://huggingface.co/Qwen/Qwen2-Math-7B)|
|qwen2-math-7b-instruct|[qwen/Qwen2-Math-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Math-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2-Math-7B-Instruct)|
|qwen2-math-72b|[qwen/Qwen2-Math-72B](https://modelscope.cn/models/qwen/Qwen2-Math-72B/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-72B](https://huggingface.co/Qwen/Qwen2-Math-72B)|
|qwen2-math-72b-instruct|[qwen/Qwen2-Math-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2-Math-72B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2714;|transformers>=4.37|-|[Qwen/Qwen2-Math-72B-Instruct](https://huggingface.co/Qwen/Qwen2-Math-72B-Instruct)|
|chatglm2-6b|[ZhipuAI/chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary)|query_key_value|chatglm2|&#x2718;|&#x2714;|&#x2718;|&#x2718;|transformers<4.42|-|[THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)|
|chatglm2-6b-32k|[ZhipuAI/chatglm2-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm2-6b-32k/summary)|query_key_value|chatglm2|&#x2718;|&#x2714;|&#x2718;|&#x2718;|transformers<4.42|-|[THUDM/chatglm2-6b-32k](https://huggingface.co/THUDM/chatglm2-6b-32k)|
|chatglm3-6b-base|[ZhipuAI/chatglm3-6b-base](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-base/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;|&#x2718;|&#x2718;|transformers<4.42|-|[THUDM/chatglm3-6b-base](https://huggingface.co/THUDM/chatglm3-6b-base)|
73 changes: 73 additions & 0 deletions swift/llm/utils/model.py
@@ -133,6 +133,13 @@ class ModelType:
qwen2_57b_a14b_instruct = 'qwen2-57b-a14b-instruct'
qwen2_57b_a14b_instruct_int4 = 'qwen2-57b-a14b-instruct-int4'

qwen2_math_1_5b = 'qwen2-math-1_5b'
qwen2_math_1_5b_instruct = 'qwen2-math-1_5b-instruct'
qwen2_math_7b = 'qwen2-math-7b'
qwen2_math_7b_instruct = 'qwen2-math-7b-instruct'
qwen2_math_72b = 'qwen2-math-72b'
qwen2_math_72b_instruct = 'qwen2-math-72b-instruct'

# qwen-vl
qwen_vl = 'qwen-vl'
qwen_vl_chat = 'qwen-vl-chat'
@@ -2977,6 +2984,72 @@ def rotary_emb(self, query_states, key_states, **kwargs):
return model, tokenizer


@register_model(
ModelType.qwen2_math_1_5b_instruct,
'qwen/Qwen2-Math-1.5B-Instruct',
LoRATM.llama,
TemplateType.qwen,
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
support_megatron=True,
requires=['transformers>=4.37'],
hf_model_id='Qwen/Qwen2-Math-1.5B-Instruct')
@register_model(
ModelType.qwen2_math_1_5b,
'qwen/Qwen2-Math-1.5B',
LoRATM.llama,
TemplateType.qwen,
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
support_megatron=True,
requires=['transformers>=4.37'],
hf_model_id='Qwen/Qwen2-Math-1.5B')
@register_model(
ModelType.qwen2_math_7b_instruct,
'qwen/Qwen2-Math-7B-Instruct',
LoRATM.llama,
TemplateType.qwen,
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
support_megatron=True,
requires=['transformers>=4.37'],
hf_model_id='Qwen/Qwen2-Math-7B-Instruct')
@register_model(
ModelType.qwen2_math_7b,
'qwen/Qwen2-Math-7B',
LoRATM.llama,
TemplateType.qwen,
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
support_megatron=True,
requires=['transformers>=4.37'],
hf_model_id='Qwen/Qwen2-Math-7B')
@register_model(
ModelType.qwen2_math_72b_instruct,
'qwen/Qwen2-Math-72B-Instruct',
LoRATM.llama,
TemplateType.qwen,
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
support_megatron=True,
requires=['transformers>=4.37'],
hf_model_id='Qwen/Qwen2-Math-72B-Instruct')
@register_model(
ModelType.qwen2_math_72b,
'qwen/Qwen2-Math-72B',
LoRATM.llama,
TemplateType.qwen,
support_flash_attn=True,
support_vllm=True,
support_lmdeploy=True,
support_megatron=True,
requires=['transformers>=4.37'],
hf_model_id='Qwen/Qwen2-Math-72B')
@register_model(
ModelType.qwen2_57b_a14b_instruct_int4,
'qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4',
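The stacked `@register_model` decorators above attach a ModelScope model ID, a template type, capability flags, and dependency requirements to a single loader function, so one function serves every qwen2-math variant. The following is a simplified, self-contained sketch of that registry pattern; the `MODEL_REGISTRY` dict and the trimmed signatures here are illustrative, not SWIFT's actual implementation:

```python
# Simplified sketch of the decorator-based model registry pattern.
# Illustrative only -- SWIFT's real register_model records more fields.
from typing import Any, Callable, Dict

MODEL_REGISTRY: Dict[str, Dict[str, Any]] = {}


def register_model(model_type: str, model_id: str, **kwargs: Any) -> Callable:
    """Record metadata for `model_type` and return the loader unchanged,
    so decorators can be stacked to register many models on one function."""
    def decorator(get_function: Callable) -> Callable:
        MODEL_REGISTRY[model_type] = {
            'model_id': model_id,
            'get_function': get_function,
            **kwargs,
        }
        return get_function
    return decorator


@register_model('qwen2-math-7b-instruct', 'qwen/Qwen2-Math-7B-Instruct',
                support_vllm=True, requires=['transformers>=4.37'])
@register_model('qwen2-math-7b', 'qwen/Qwen2-Math-7B',
                support_vllm=True, requires=['transformers>=4.37'])
def get_model_tokenizer_qwen2(model_type: str) -> str:
    # In SWIFT this would download and load the model and tokenizer;
    # here it just resolves the registered model ID.
    return MODEL_REGISTRY[model_type]['model_id']


print(get_model_tokenizer_qwen2('qwen2-math-7b'))  # qwen/Qwen2-Math-7B
print(sorted(MODEL_REGISTRY))
```

Because each decorator returns the function unmodified, stacking six of them (as in the diff) simply adds six registry entries that all dispatch to the same loader.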