Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Deepseek coder v2 model #1164

Merged
merged 14 commits into from
Jun 18, 2024
Prev Previous commit
Next Next commit
fix doc
  • Loading branch information
tastelikefeet committed Jun 18, 2024
commit 743ba27cd8d6266b56082743322736d9af6aa48d
74 changes: 37 additions & 37 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -498,44 +498,44 @@ The complete list of supported models and datasets can be found at [Supported Mo

#### LLMs

| Model Type | Model Introduction | Language | Model Size | Model Type |
| Model Type | Model Introduction | Language | Model Size | Model Type |
|------------------------------------------------|------------------------------------------------------------------------|--------------------|----------------------------------------|------------------------------------------- |
| Qwen<br>Qwen1.5<br>Qwen2 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4 | [Zhipu ChatGLM series models](https://github.com/THUDM) | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
| Baichuan/Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
| Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
| LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
| LLaMA3 | [LLaMA3 series models](https://github.com/meta-llama/llama3) | English | 8B-70B<br>including quantized versions | base model<br>chat model |
| Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B-22B | base model<br>instruct model<br>MoE model |
| Yi<br>Yi1.5 | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B<br>including quantized | base model<br>chat model<br>long text model |
| InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2 | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-236B | base model<br>chat model<br>MoE model<br>code model<br>math model |
| MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
| Gemma | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-7B | base model<br>instruct model |
| MiniCPM | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model<br>MoE model |
| OpenBuddy | [OpenBuddy series models](https://github.com/OpenBuddy/OpenBuddy) | Chinese<br>English | 7B-67B | base model<br>chat model |
| Orion | [OrionStar AI series models](https://github.com/OrionStarAI) | Chinese<br>English | 14B | base model<br>chat model |
| BlueLM | [VIVO BlueLM large model](https://github.com/vivo-ai-lab/BlueLM) | Chinese<br>English | 7B | base model<br>chat model |
| Ziya2 | [Fengshenbang series models](https://github.com/IDEA-CCNL/Fengshenbang-LM) | Chinese<br>English | 13B | base model<br>chat model |
| Skywork | [Skywork series models](https://github.com/SkyworkAI/Skywork) | Chinese<br>English | 13B | base model<br>chat model |
| Zephyr | Zephyr series models based on Mistral | English | 7B | chat model |
| PolyLM | [Tongyi Lab self-developed PolyLM series models](https://github.com/DAMO-NLP-MT/PolyLM) | Multilingual | 13B | base model |
| SeqGPT | [Tongyi Lab self-developed text understanding model for information extraction and text classification](https://github.com/Alibaba-NLP/SeqGPT) | Chinese | 560M | semantic understanding model |
| SUS | [Southern University of Science and Technology model fine-tuned on YI](https://github.com/SUSTech-IDEA/SUS-Chat) | Chinese<br>English | 34B | chat model |
| Tongyi-Finance | [Tongyi finance series models](https://github.com/QwenLM/Qwen) | Chinese<br>English | 14B | base model<br>chat model<br>financial model |
| CodeFuse-CodeLLaMA<br>CodeFuse-Codegeex2<br>CodeFuse-Qwen | [Ant CodeFuse series models](https://github.com/codefuse-ai) | Chinese<br>English | 6B-34B | chat model<br>code model |
| phi2/phi3 | Microsoft's PHI series models | English | 3B/4B | base model<br>instruct model<br>code model |
| Grok | [X-ai](https://github.com/xai-org/grok-1) | English | 300B | base model |
| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | Chinese<br>English | 7B-12B | chat model |
| dbrx | [databricks](https://github.com/databricks/dbrx) | English | 132B | base model<br>chat model |
| mengzi3 | [Langboat](https://github.com/Langboat/Mengzi3) | Chinese<br>English | 13B | base model |
| c4ai-command-r | [c4ai](https://cohere.com/command) | Multilingual | 35B-104B | chat model |
| WizardLM2 | [WizardLM2 series models](https://github.com/nlpxucan/WizardLM) | English | 7B-8x22B<br>including quantized versions | chat model<br>MoE model |
| Atom | [Atom](https://github.com/LlamaFamily/Llama-Chinese) | Chinese | 7B| base model<br>chat model|
| Chinese-LLaMA-Alpaca-2 | [Chinese-LLaMA-Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | Chinese | 1.3B-13B| base model<br>chat model<br>long text model |
| Chinese-LLaMA-Alpaca-3 | [Chinese-LLaMA-Alpaca-3](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3) | Chinese | 8B| base model<br>chat model|
| ModelScope-Agent | [ModelScope Agent series models](https://github.com/modelscope/modelscope-agent) | Chinese | 7B-14B| agent model |
| Qwen<br>Qwen1.5<br>Qwen2 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4 | [Zhipu ChatGLM series models](https://github.com/THUDM) | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
| Baichuan/Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
| Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
| LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
| LLaMA3 | [LLaMA3 series models](https://github.com/meta-llama/llama3) | English | 8B-70B<br>including quantized versions | base model<br>chat model |
| Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B-22B | base model<br>instruct model<br>MoE model |
| Yi<br>Yi1.5 | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B<br>including quantized | base model<br>chat model<br>long text model |
| InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2<br>DeepSeek-Coder-V2 | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-236B | base model<br>chat model<br>MoE model<br>code model<br>math model |
| MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
| Gemma | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-7B | base model<br>instruct model |
| MiniCPM | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model<br>MoE model |
| OpenBuddy | [OpenBuddy series models](https://github.com/OpenBuddy/OpenBuddy) | Chinese<br>English | 7B-67B | base model<br>chat model |
| Orion | [OrionStar AI series models](https://github.com/OrionStarAI) | Chinese<br>English | 14B | base model<br>chat model |
| BlueLM | [VIVO BlueLM large model](https://github.com/vivo-ai-lab/BlueLM) | Chinese<br>English | 7B | base model<br>chat model |
| Ziya2 | [Fengshenbang series models](https://github.com/IDEA-CCNL/Fengshenbang-LM) | Chinese<br>English | 13B | base model<br>chat model |
| Skywork | [Skywork series models](https://github.com/SkyworkAI/Skywork) | Chinese<br>English | 13B | base model<br>chat model |
| Zephyr | Zephyr series models based on Mistral | English | 7B | chat model |
| PolyLM | [Tongyi Lab self-developed PolyLM series models](https://github.com/DAMO-NLP-MT/PolyLM) | Multilingual | 13B | base model |
| SeqGPT | [Tongyi Lab self-developed text understanding model for information extraction and text classification](https://github.com/Alibaba-NLP/SeqGPT) | Chinese | 560M | semantic understanding model |
| SUS | [Southern University of Science and Technology model fine-tuned on YI](https://github.com/SUSTech-IDEA/SUS-Chat) | Chinese<br>English | 34B | chat model |
| Tongyi-Finance | [Tongyi finance series models](https://github.com/QwenLM/Qwen) | Chinese<br>English | 14B | base model<br>chat model<br>financial model |
| CodeFuse-CodeLLaMA<br>CodeFuse-Codegeex2<br>CodeFuse-Qwen | [Ant CodeFuse series models](https://github.com/codefuse-ai) | Chinese<br>English | 6B-34B | chat model<br>code model |
| phi2/phi3 | Microsoft's PHI series models | English | 3B/4B | base model<br>instruct model<br>code model |
| Grok | [X-ai](https://github.com/xai-org/grok-1) | English | 300B | base model |
| TeleChat | [Tele-AI](https://github.com/Tele-AI/Telechat) | Chinese<br>English | 7B-12B | chat model |
| dbrx | [databricks](https://github.com/databricks/dbrx) | English | 132B | base model<br>chat model |
| mengzi3 | [Langboat](https://github.com/Langboat/Mengzi3) | Chinese<br>English | 13B | base model |
| c4ai-command-r | [c4ai](https://cohere.com/command) | Multilingual | 35B-104B | chat model |
| WizardLM2 | [WizardLM2 series models](https://github.com/nlpxucan/WizardLM) | English | 7B-8x22B<br>including quantized versions | chat model<br>MoE model |
| Atom | [Atom](https://github.com/LlamaFamily/Llama-Chinese) | Chinese | 7B| base model<br>chat model|
| Chinese-LLaMA-Alpaca-2 | [Chinese-LLaMA-Alpaca-2](https://github.com/ymcui/Chinese-LLaMA-Alpaca-2) | Chinese | 1.3B-13B| base model<br>chat model<br>long text model |
| Chinese-LLaMA-Alpaca-3 | [Chinese-LLaMA-Alpaca-3](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3) | Chinese | 8B| base model<br>chat model|
| ModelScope-Agent | [ModelScope Agent series models](https://github.com/modelscope/modelscope-agent) | Chinese | 7B-14B| agent model |

#### MLLMs

Expand Down
2 changes: 1 addition & 1 deletion README_CN.md
Original file line number Diff line number Diff line change
Expand Up @@ -506,7 +506,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| Mistral<br>Mixtral | [Mistral系列模型](https://github.com/mistralai/mistral-src) | 英文 | 7B-8x22B | base模型<br>instruct模型<br>MoE模型 |
| Yi<br>Yi1.5 | [01AI的YI系列模型](https://github.com/01-ai) | 中文<br>英文 | 6B-34B<br>包含量化版本 | base模型<br>chat模型<br>长文本模型 |
| InternLM<br>InternLM2<br>InternLM2-Math | [浦江实验室书生浦语系列模型](https://github.com/InternLM/InternLM) | 中文<br>英文 | 1.8B-20B | base模型<br>chat模型<br>数学模型 |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2 | [幻方系列模型](https://github.com/deepseek-ai) | 中文<br>英文 | 1.3B-236B | base模型<br>chat模型<br>MoE模型<br>代码模型<br>数学模型 |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2<br>DeepSeek-Coder-V2 | [幻方系列模型](https://github.com/deepseek-ai) | 中文<br>英文 | 1.3B-236B | base模型<br>chat模型<br>MoE模型<br>代码模型<br>数学模型 |
| MAMBA | [MAMBA时序卷积模型](https://github.com/state-spaces/mamba) | 英文 | 130M-2.8B | base模型 |
| Gemma | [Google Gemma系列模型](https://github.com/google/gemma_pytorch) | 英文 | 2B-7B | base模型<br>instruct模型 |
| MiniCPM | [OpenBmB MiniCPM系列模型](https://github.com/OpenBMB/MiniCPM) | 中文<br>英文 | 2B-3B | chat模型<br>MoE模型 |
Expand Down