Supported models and datasets

Table of Contents

  • Models
  • Datasets

Models

The table below introduces all models supported by SWIFT:

  • Model Type: The model_type identifier registered in SWIFT.
  • Default Lora Target Modules: The default lora_target_modules used by the model.
  • Default Template: The default template used by the model.
  • Support Flash Attn: Whether the model supports flash attention to accelerate SFT and inference.
  • Support VLLM: Whether the model supports vLLM to accelerate inference and deployment.
  • Requires: Extra dependencies required by the model.
  • Tags: Capability tags for the model (e.g. multi-modal, coding).
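Conceptually, each row of the table is a registry entry keyed by model_type. The sketch below illustrates that idea with two rows from the table; the names (ModelInfo, MODEL_REGISTRY, lookup) are illustrative, not SWIFT's actual internals.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical registry keyed by model_type, mirroring the table's columns.
@dataclass
class ModelInfo:
    model_id: str                   # ModelScope model ID
    lora_target_modules: List[str]  # default lora_target_modules
    template: str                   # default template
    requires: List[str] = field(default_factory=list)  # extra requirements
    tags: List[str] = field(default_factory=list)

MODEL_REGISTRY = {
    "qwen-7b-chat": ModelInfo(
        model_id="qwen/Qwen-7B-Chat",
        lora_target_modules=["c_attn"],
        template="qwen",
    ),
    "qwen1half-7b-chat": ModelInfo(
        model_id="qwen/Qwen1.5-7B-Chat",
        lora_target_modules=["q_proj", "k_proj", "v_proj"],
        template="qwen",
        requires=["transformers>=4.37"],
    ),
}

def lookup(model_type: str) -> ModelInfo:
    try:
        return MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"unknown model_type: {model_type}") from None

print(lookup("qwen-7b-chat").model_id)  # qwen/Qwen-7B-Chat
```

In SWIFT itself, passing the model_type on the command line (e.g. to `swift sft`) resolves these defaults the same way, so the remaining columns only need to be specified when overriding them.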
Model Type Model ID Default Lora Target Modules Default Template Support Flash Attn Support VLLM Requires Tags
qwen-1_8b qwen/Qwen-1_8B c_attn default-generation -
qwen-1_8b-chat qwen/Qwen-1_8B-Chat c_attn qwen -
qwen-1_8b-chat-int4 qwen/Qwen-1_8B-Chat-Int4 c_attn qwen auto_gptq>=0.5 -
qwen-1_8b-chat-int8 qwen/Qwen-1_8B-Chat-Int8 c_attn qwen auto_gptq>=0.5 -
qwen-7b qwen/Qwen-7B c_attn default-generation -
qwen-7b-chat qwen/Qwen-7B-Chat c_attn qwen -
qwen-7b-chat-int4 qwen/Qwen-7B-Chat-Int4 c_attn qwen auto_gptq>=0.5 -
qwen-7b-chat-int8 qwen/Qwen-7B-Chat-Int8 c_attn qwen auto_gptq>=0.5 -
qwen-14b qwen/Qwen-14B c_attn default-generation -
qwen-14b-chat qwen/Qwen-14B-Chat c_attn qwen -
qwen-14b-chat-int4 qwen/Qwen-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 -
qwen-14b-chat-int8 qwen/Qwen-14B-Chat-Int8 c_attn qwen auto_gptq>=0.5 -
qwen-72b qwen/Qwen-72B c_attn default-generation -
qwen-72b-chat qwen/Qwen-72B-Chat c_attn qwen -
qwen-72b-chat-int4 qwen/Qwen-72B-Chat-Int4 c_attn qwen auto_gptq>=0.5 -
qwen-72b-chat-int8 qwen/Qwen-72B-Chat-Int8 c_attn qwen auto_gptq>=0.5 -
qwen1half-0_5b qwen/Qwen1.5-0.5B q_proj, k_proj, v_proj default-generation transformers>=4.37 -
qwen1half-1_8b qwen/Qwen1.5-1.8B q_proj, k_proj, v_proj default-generation transformers>=4.37 -
qwen1half-4b qwen/Qwen1.5-4B q_proj, k_proj, v_proj default-generation transformers>=4.37 -
qwen1half-7b qwen/Qwen1.5-7B q_proj, k_proj, v_proj default-generation transformers>=4.37 -
qwen1half-14b qwen/Qwen1.5-14B q_proj, k_proj, v_proj default-generation transformers>=4.37 -
qwen1half-32b qwen/Qwen1.5-32B q_proj, k_proj, v_proj default-generation transformers>=4.37 -
qwen1half-72b qwen/Qwen1.5-72B q_proj, k_proj, v_proj default-generation transformers>=4.37 -
qwen1half-moe-a2_7b qwen/Qwen1.5-MoE-A2.7B q_proj, k_proj, v_proj default-generation transformers>=4.37 -
qwen1half-0_5b-chat qwen/Qwen1.5-0.5B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 -
qwen1half-1_8b-chat qwen/Qwen1.5-1.8B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 -
qwen1half-4b-chat qwen/Qwen1.5-4B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 -
qwen1half-7b-chat qwen/Qwen1.5-7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 -
qwen1half-14b-chat qwen/Qwen1.5-14B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 -
qwen1half-32b-chat qwen/Qwen1.5-32B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 -
qwen1half-72b-chat qwen/Qwen1.5-72B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 -
qwen1half-moe-a2_7b-chat qwen/Qwen1.5-MoE-A2.7B-Chat q_proj, k_proj, v_proj qwen transformers>=4.37 -
qwen1half-0_5b-chat-int4 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-1_8b-chat-int4 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-4b-chat-int4 qwen/Qwen1.5-4B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-7b-chat-int4 qwen/Qwen1.5-7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-14b-chat-int4 qwen/Qwen1.5-14B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-32b-chat-int4 qwen/Qwen1.5-32B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-72b-chat-int4 qwen/Qwen1.5-72B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-0_5b-chat-int8 qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-1_8b-chat-int8 qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-4b-chat-int8 qwen/Qwen1.5-4B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-7b-chat-int8 qwen/Qwen1.5-7B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-14b-chat-int8 qwen/Qwen1.5-14B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-72b-chat-int8 qwen/Qwen1.5-72B-Chat-GPTQ-Int8 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-moe-a2_7b-chat-int4 qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 q_proj, k_proj, v_proj qwen auto_gptq>=0.5, transformers>=4.37 -
qwen1half-0_5b-chat-awq qwen/Qwen1.5-0.5B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq -
qwen1half-1_8b-chat-awq qwen/Qwen1.5-1.8B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq -
qwen1half-4b-chat-awq qwen/Qwen1.5-4B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq -
qwen1half-7b-chat-awq qwen/Qwen1.5-7B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq -
qwen1half-14b-chat-awq qwen/Qwen1.5-14B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq -
qwen1half-72b-chat-awq qwen/Qwen1.5-72B-Chat-AWQ q_proj, k_proj, v_proj qwen transformers>=4.37, autoawq -
qwen-vl qwen/Qwen-VL c_attn default-generation multi-modal, vision
qwen-vl-chat qwen/Qwen-VL-Chat c_attn qwen multi-modal, vision
qwen-vl-chat-int4 qwen/Qwen-VL-Chat-Int4 c_attn qwen auto_gptq>=0.5 multi-modal, vision
qwen-audio qwen/Qwen-Audio c_attn qwen-audio-generation multi-modal, audio
qwen-audio-chat qwen/Qwen-Audio-Chat c_attn qwen-audio multi-modal, audio
chatglm2-6b ZhipuAI/chatglm2-6b query_key_value chatglm2 -
chatglm2-6b-32k ZhipuAI/chatglm2-6b-32k query_key_value chatglm2 -
chatglm3-6b-base ZhipuAI/chatglm3-6b-base query_key_value chatglm-generation -
chatglm3-6b ZhipuAI/chatglm3-6b query_key_value chatglm3 -
chatglm3-6b-32k ZhipuAI/chatglm3-6b-32k query_key_value chatglm3 -
codegeex2-6b ZhipuAI/codegeex2-6b query_key_value chatglm-generation transformers<4.34 coding
llama2-7b modelscope/Llama-2-7b-ms q_proj, k_proj, v_proj default-generation-bos -
llama2-7b-chat modelscope/Llama-2-7b-chat-ms q_proj, k_proj, v_proj llama -
llama2-13b modelscope/Llama-2-13b-ms q_proj, k_proj, v_proj default-generation-bos -
llama2-13b-chat modelscope/Llama-2-13b-chat-ms q_proj, k_proj, v_proj llama -
llama2-70b modelscope/Llama-2-70b-ms q_proj, k_proj, v_proj default-generation-bos -
llama2-70b-chat modelscope/Llama-2-70b-chat-ms q_proj, k_proj, v_proj llama -
llama2-7b-aqlm-2bit-1x16 AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation-bos transformers>=4.38, aqlm, torch>=2.2.0 -
llava1d6-mistral-7b-instruct AI-ModelScope/llava-v1.6-mistral-7b q_proj, k_proj, v_proj llava-mistral-instruct transformers>=4.34 multi-modal, vision
llava1d6-yi-34b-instruct AI-ModelScope/llava-v1.6-34b q_proj, k_proj, v_proj llava-yi-instruct multi-modal, vision
yi-6b 01ai/Yi-6B q_proj, k_proj, v_proj default-generation -
yi-6b-200k 01ai/Yi-6B-200K q_proj, k_proj, v_proj default-generation -
yi-6b-chat 01ai/Yi-6B-Chat q_proj, k_proj, v_proj yi -
yi-9b 01ai/Yi-9B q_proj, k_proj, v_proj default-generation -
yi-34b 01ai/Yi-34B q_proj, k_proj, v_proj default-generation -
yi-34b-200k 01ai/Yi-34B-200K q_proj, k_proj, v_proj default-generation -
yi-34b-chat 01ai/Yi-34B-Chat q_proj, k_proj, v_proj yi -
yi-vl-6b-chat 01ai/Yi-VL-6B q_proj, k_proj, v_proj yi-vl transformers>=4.34 multi-modal, vision
yi-vl-34b-chat 01ai/Yi-VL-34B q_proj, k_proj, v_proj yi-vl transformers>=4.34 multi-modal, vision
internlm-7b Shanghai_AI_Laboratory/internlm-7b q_proj, k_proj, v_proj default-generation-bos -
internlm-7b-chat Shanghai_AI_Laboratory/internlm-chat-7b-v1_1 q_proj, k_proj, v_proj internlm -
internlm-7b-chat-8k Shanghai_AI_Laboratory/internlm-chat-7b-8k q_proj, k_proj, v_proj internlm -
internlm-20b Shanghai_AI_Laboratory/internlm-20b q_proj, k_proj, v_proj default-generation-bos -
internlm-20b-chat Shanghai_AI_Laboratory/internlm-chat-20b q_proj, k_proj, v_proj internlm -
internlm2-1_8b Shanghai_AI_Laboratory/internlm2-1_8b wqkv default-generation-bos -
internlm2-1_8b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft wqkv internlm2 -
internlm2-1_8b-chat Shanghai_AI_Laboratory/internlm2-chat-1_8b wqkv internlm2 -
internlm2-7b-base Shanghai_AI_Laboratory/internlm2-base-7b wqkv default-generation-bos -
internlm2-7b Shanghai_AI_Laboratory/internlm2-7b wqkv default-generation-bos -
internlm2-7b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-7b-sft wqkv internlm2 -
internlm2-7b-chat Shanghai_AI_Laboratory/internlm2-chat-7b wqkv internlm2 -
internlm2-20b-base Shanghai_AI_Laboratory/internlm2-base-20b wqkv default-generation-bos -
internlm2-20b Shanghai_AI_Laboratory/internlm2-20b wqkv default-generation-bos -
internlm2-20b-sft-chat Shanghai_AI_Laboratory/internlm2-chat-20b-sft wqkv internlm2 -
internlm2-20b-chat Shanghai_AI_Laboratory/internlm2-chat-20b wqkv internlm2 -
internlm2-math-7b Shanghai_AI_Laboratory/internlm2-math-base-7b wqkv default-generation-bos math
internlm2-math-7b-chat Shanghai_AI_Laboratory/internlm2-math-7b wqkv internlm2 math
internlm2-math-20b Shanghai_AI_Laboratory/internlm2-math-base-20b wqkv default-generation-bos math
internlm2-math-20b-chat Shanghai_AI_Laboratory/internlm2-math-20b wqkv internlm2 math
internlm-xcomposer2-7b-chat Shanghai_AI_Laboratory/internlm-xcomposer2-7b wqkv internlm-xcomposer2 multi-modal, vision
deepseek-7b deepseek-ai/deepseek-llm-7b-base q_proj, k_proj, v_proj default-generation-bos -
deepseek-7b-chat deepseek-ai/deepseek-llm-7b-chat q_proj, k_proj, v_proj deepseek -
deepseek-moe-16b deepseek-ai/deepseek-moe-16b-base q_proj, k_proj, v_proj default-generation-bos -
deepseek-moe-16b-chat deepseek-ai/deepseek-moe-16b-chat q_proj, k_proj, v_proj deepseek -
deepseek-67b deepseek-ai/deepseek-llm-67b-base q_proj, k_proj, v_proj default-generation-bos -
deepseek-67b-chat deepseek-ai/deepseek-llm-67b-chat q_proj, k_proj, v_proj deepseek -
deepseek-coder-1_3b deepseek-ai/deepseek-coder-1.3b-base q_proj, k_proj, v_proj default-generation-bos coding
deepseek-coder-1_3b-instruct deepseek-ai/deepseek-coder-1.3b-instruct q_proj, k_proj, v_proj deepseek-coder coding
deepseek-coder-6_7b deepseek-ai/deepseek-coder-6.7b-base q_proj, k_proj, v_proj default-generation-bos coding
deepseek-coder-6_7b-instruct deepseek-ai/deepseek-coder-6.7b-instruct q_proj, k_proj, v_proj deepseek-coder coding
deepseek-coder-33b deepseek-ai/deepseek-coder-33b-base q_proj, k_proj, v_proj default-generation-bos coding
deepseek-coder-33b-instruct deepseek-ai/deepseek-coder-33b-instruct q_proj, k_proj, v_proj deepseek-coder coding
deepseek-math-7b deepseek-ai/deepseek-math-7b-base q_proj, k_proj, v_proj default-generation-bos math
deepseek-math-7b-instruct deepseek-ai/deepseek-math-7b-instruct q_proj, k_proj, v_proj deepseek math
deepseek-math-7b-chat deepseek-ai/deepseek-math-7b-rl q_proj, k_proj, v_proj deepseek math
deepseek-vl-1_3b-chat deepseek-ai/deepseek-vl-1.3b-chat q_proj, k_proj, v_proj deepseek-vl multi-modal, vision
deepseek-vl-7b-chat deepseek-ai/deepseek-vl-7b-chat q_proj, k_proj, v_proj deepseek-vl multi-modal, vision
gemma-2b AI-ModelScope/gemma-2b q_proj, k_proj, v_proj default-generation-bos transformers>=4.38 -
gemma-7b AI-ModelScope/gemma-7b q_proj, k_proj, v_proj default-generation-bos transformers>=4.38 -
gemma-2b-instruct AI-ModelScope/gemma-2b-it q_proj, k_proj, v_proj gemma transformers>=4.38 -
gemma-7b-instruct AI-ModelScope/gemma-7b-it q_proj, k_proj, v_proj gemma transformers>=4.38 -
minicpm-1b-sft-chat OpenBMB/MiniCPM-1B-sft-bf16 q_proj, k_proj, v_proj minicpm transformers>=4.36.0 -
minicpm-2b-sft-chat OpenBMB/MiniCPM-2B-sft-fp32 q_proj, k_proj, v_proj minicpm -
minicpm-2b-chat OpenBMB/MiniCPM-2B-dpo-fp32 q_proj, k_proj, v_proj minicpm -
minicpm-2b-128k OpenBMB/MiniCPM-2B-128k q_proj, k_proj, v_proj chatml transformers>=4.36.0 -
minicpm-moe-8x2b OpenBMB/MiniCPM-MoE-8x2B q_proj, k_proj, v_proj minicpm transformers>=4.36.0 -
minicpm-v-3b-chat OpenBMB/MiniCPM-V q_proj, k_proj, v_proj minicpm-v -
minicpm-v-v2 OpenBMB/MiniCPM-V-2 q_proj, k_proj, v_proj minicpm-v -
openbuddy-llama2-13b-chat OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 q_proj, k_proj, v_proj openbuddy -
openbuddy-llama-65b-chat OpenBuddy/openbuddy-llama-65b-v8-bf16 q_proj, k_proj, v_proj openbuddy -
openbuddy-llama2-70b-chat OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 q_proj, k_proj, v_proj openbuddy -
openbuddy-mistral-7b-chat OpenBuddy/openbuddy-mistral-7b-v17.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.34 -
openbuddy-zephyr-7b-chat OpenBuddy/openbuddy-zephyr-7b-v14.1 q_proj, k_proj, v_proj openbuddy transformers>=4.34 -
openbuddy-deepseek-67b-chat OpenBuddy/openbuddy-deepseek-67b-v15.2 q_proj, k_proj, v_proj openbuddy -
openbuddy-mixtral-moe-7b-chat OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k q_proj, k_proj, v_proj openbuddy transformers>=4.36 -
mistral-7b AI-ModelScope/Mistral-7B-v0.1 q_proj, k_proj, v_proj default-generation-bos transformers>=4.34 -
mistral-7b-v2 AI-ModelScope/Mistral-7B-v0.2-hf q_proj, k_proj, v_proj default-generation-bos transformers>=4.34 -
mistral-7b-instruct AI-ModelScope/Mistral-7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.34 -
mistral-7b-instruct-v2 AI-ModelScope/Mistral-7B-Instruct-v0.2 q_proj, k_proj, v_proj llama transformers>=4.34 -
mixtral-moe-7b AI-ModelScope/Mixtral-8x7B-v0.1 q_proj, k_proj, v_proj default-generation-bos transformers>=4.36 -
mixtral-moe-7b-instruct AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 q_proj, k_proj, v_proj llama transformers>=4.36 -
mixtral-moe-7b-aqlm-2bit-1x16 AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf q_proj, k_proj, v_proj default-generation-bos transformers>=4.38, aqlm, torch>=2.2.0 -
mixtral-moe-8x22b-v1 AI-ModelScope/Mixtral-8x22B-v0.1 q_proj, k_proj, v_proj default-generation-bos transformers>=4.36 -
baichuan-7b baichuan-inc/baichuan-7B W_pack default-generation transformers<4.34 -
baichuan-13b baichuan-inc/Baichuan-13B-Base W_pack default-generation transformers<4.34 -
baichuan-13b-chat baichuan-inc/Baichuan-13B-Chat W_pack baichuan transformers<4.34 -
baichuan2-7b baichuan-inc/Baichuan2-7B-Base W_pack default-generation -
baichuan2-7b-chat baichuan-inc/Baichuan2-7B-Chat W_pack baichuan -
baichuan2-7b-chat-int4 baichuan-inc/Baichuan2-7B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 -
baichuan2-13b baichuan-inc/Baichuan2-13B-Base W_pack default-generation -
baichuan2-13b-chat baichuan-inc/Baichuan2-13B-Chat W_pack baichuan -
baichuan2-13b-chat-int4 baichuan-inc/Baichuan2-13B-Chat-4bits W_pack baichuan bitsandbytes<0.41.2, accelerate<0.26 -
mplug-owl2-chat iic/mPLUG-Owl2 q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 mplug-owl2 transformers<4.35, icecream -
mplug-owl2d1-chat iic/mPLUG-Owl2.1 c_attn.multiway.0, c_attn.multiway.1 mplug-owl2 transformers<4.35, icecream -
yuan2-2b-instruct YuanLLM/Yuan2.0-2B-hf q_proj, k_proj, v_proj yuan -
yuan2-2b-janus-instruct YuanLLM/Yuan2-2B-Janus-hf q_proj, k_proj, v_proj yuan -
yuan2-51b-instruct YuanLLM/Yuan2.0-51B-hf q_proj, k_proj, v_proj yuan -
yuan2-102b-instruct YuanLLM/Yuan2.0-102B-hf q_proj, k_proj, v_proj yuan -
xverse-7b xverse/XVERSE-7B q_proj, k_proj, v_proj default-generation -
xverse-7b-chat xverse/XVERSE-7B-Chat q_proj, k_proj, v_proj xverse -
xverse-13b xverse/XVERSE-13B q_proj, k_proj, v_proj default-generation -
xverse-13b-chat xverse/XVERSE-13B-Chat q_proj, k_proj, v_proj xverse -
xverse-65b xverse/XVERSE-65B q_proj, k_proj, v_proj default-generation -
xverse-65b-v2 xverse/XVERSE-65B-2 q_proj, k_proj, v_proj default-generation -
xverse-65b-chat xverse/XVERSE-65B-Chat q_proj, k_proj, v_proj xverse -
xverse-13b-256k xverse/XVERSE-13B-256K q_proj, k_proj, v_proj default-generation -
xverse-moe-a4_2b xverse/XVERSE-MoE-A4.2B q_proj, k_proj, v_proj default-generation -
orion-14b OrionStarAI/Orion-14B-Base q_proj, k_proj, v_proj default-generation -
orion-14b-chat OrionStarAI/Orion-14B-Chat q_proj, k_proj, v_proj orion -
bluelm-7b vivo-ai/BlueLM-7B-Base q_proj, k_proj, v_proj default-generation-bos -
bluelm-7b-32k vivo-ai/BlueLM-7B-Base-32K q_proj, k_proj, v_proj default-generation-bos -
bluelm-7b-chat vivo-ai/BlueLM-7B-Chat q_proj, k_proj, v_proj bluelm -
bluelm-7b-chat-32k vivo-ai/BlueLM-7B-Chat-32K q_proj, k_proj, v_proj bluelm -
ziya2-13b Fengshenbang/Ziya2-13B-Base q_proj, k_proj, v_proj default-generation-bos -
ziya2-13b-chat Fengshenbang/Ziya2-13B-Chat q_proj, k_proj, v_proj ziya -
skywork-13b skywork/Skywork-13B-base q_proj, k_proj, v_proj default-generation-bos -
skywork-13b-chat skywork/Skywork-13B-chat q_proj, k_proj, v_proj skywork -
zephyr-7b-beta-chat modelscope/zephyr-7b-beta q_proj, k_proj, v_proj zephyr transformers>=4.34 -
polylm-13b damo/nlp_polylm_13b_text_generation c_attn default-generation -
seqgpt-560m damo/nlp_seqgpt-560m query_key_value default-generation -
sus-34b-chat SUSTC/SUS-Chat-34B q_proj, k_proj, v_proj sus -
tongyi-finance-14b TongyiFinance/Tongyi-Finance-14B c_attn default-generation financial
tongyi-finance-14b-chat TongyiFinance/Tongyi-Finance-14B-Chat c_attn qwen financial
tongyi-finance-14b-chat-int4 TongyiFinance/Tongyi-Finance-14B-Chat-Int4 c_attn qwen auto_gptq>=0.5 financial
codefuse-codellama-34b-chat codefuse-ai/CodeFuse-CodeLlama-34B q_proj, k_proj, v_proj codefuse-codellama coding
codefuse-codegeex2-6b-chat codefuse-ai/CodeFuse-CodeGeeX2-6B query_key_value codefuse transformers<4.34 coding
codefuse-qwen-14b-chat codefuse-ai/CodeFuse-QWen-14B c_attn codefuse coding
phi2-3b AI-ModelScope/phi-2 Wqkv default-generation coding
cogvlm-17b-instruct ZhipuAI/cogvlm-chat vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense cogvlm-instruct multi-modal, vision
cogagent-18b-chat ZhipuAI/cogagent-chat vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense cogagent-chat multi-modal, vision
cogagent-18b-instruct ZhipuAI/cogagent-vqa vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense cogagent-instruct multi-modal, vision
mamba-130m AI-ModelScope/mamba-130m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 -
mamba-370m AI-ModelScope/mamba-370m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 -
mamba-390m AI-ModelScope/mamba-390m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 -
mamba-790m AI-ModelScope/mamba-790m-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 -
mamba-1.4b AI-ModelScope/mamba-1.4b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 -
mamba-2.8b AI-ModelScope/mamba-2.8b-hf in_proj, x_proj, embeddings, out_proj default-generation transformers>=4.39.0 -
telechat-7b TeleAI/TeleChat-7B key_value, query telechat -
telechat-12b TeleAI/TeleChat-12B key_value, query telechat -
grok-1 colossalai/grok-1-pytorch q_proj, k_proj, v_proj default-generation -
dbrx-instruct AI-ModelScope/dbrx-instruct attn.Wqkv dbrx transformers>=4.36 -
dbrx-base AI-ModelScope/dbrx-base attn.Wqkv dbrx transformers>=4.36 -
mengzi3-13b-base langboat/Mengzi3-13B-Base q_proj, k_proj, v_proj mengzi -
c4ai-command-r-v01 AI-ModelScope/c4ai-command-r-v01 q_proj, k_proj, v_proj c4ai transformers>=4.39.1 -
c4ai-command-r-plus AI-ModelScope/c4ai-command-r-plus q_proj, k_proj, v_proj c4ai transformers>4.39 -

Datasets

The table below introduces the datasets supported by SWIFT:

  • Dataset Name: The dataset name registered in SWIFT.
  • Dataset ID: The dataset ID in ModelScope.
  • Train Size / Val Size: The number of rows in the training and validation splits.
  • Statistic: Dataset statistics, computed over token counts, which help in choosing the max_length hyperparameter. The training and validation sets are concatenated and tokenized with Qwen's tokenizer; different tokenizers produce different statistics. To obtain token statistics for another model's tokenizer, you can run the script yourself.
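The Statistic column (mean±std, min, max of per-sample token counts) can be reproduced along these lines. A whitespace split stands in for a real tokenizer in this sketch; substituting Qwen's tokenizer would yield the numbers in the table.

```python
import statistics

def tokenize(text: str) -> list:
    # Placeholder tokenizer (whitespace split); swap in a real one,
    # e.g. Qwen's, to match the table's statistics.
    return text.split()

def token_statistics(samples) -> str:
    # Token counts over the concatenated train + validation samples.
    lengths = [len(tokenize(s)) for s in samples]
    mean = statistics.mean(lengths)
    std = statistics.pstdev(lengths)
    return f"{mean:.1f}\u00b1{std:.1f}, min={min(lengths)}, max={max(lengths)}"

samples = ["a b c", "a b c d e", "a"]
print(token_statistics(samples))  # 3.0±1.6, min=1, max=5
```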
Dataset Name Dataset ID Train Size Val Size Statistic (token) Tags
🔥ms-bench iic/ms_bench 316228 0 345.0±441.3, min=22, max=30960 chat, general, multi-round
🔥ms-bench-mini iic/ms_bench 19492 0 353.9±439.4, min=29, max=12078 chat, general, multi-round
🔥alpaca-en AI-ModelScope/alpaca-gpt4-data-en 52002 0 176.2±125.8, min=26, max=740 chat, general
🔥alpaca-zh AI-ModelScope/alpaca-gpt4-data-zh 48818 0 162.1±93.9, min=26, max=856 chat, general
multi-alpaca-all damo/nlp_polylm_multialpaca_sft 131867 0 112.9±50.6, min=26, max=1226 chat, general, multilingual
instinwild-en wyj123456/instinwild 52191 0 160.2±69.7, min=33, max=763 chat, general
instinwild-zh wyj123456/instinwild 51504 0 130.3±45.1, min=28, max=1434 chat, general
cot-en YorickHe/CoT 74771 0 122.7±64.8, min=51, max=8320 chat, general
cot-zh YorickHe/CoT_zh 74771 0 117.5±70.8, min=43, max=9636 chat, general
firefly-all-zh wyj123456/firefly 1649399 0 178.1±260.4, min=26, max=12516 chat, general
instruct-en wyj123456/instruct 888970 0 268.9±331.2, min=26, max=7252 chat, general
gpt4all-en wyj123456/GPT4all 806199 0 302.5±384.1, min=27, max=7391 chat, general
sharegpt-en huangjintao/sharegpt 99799 0 1045.7±431.9, min=22, max=7907 chat, general, multi-round
sharegpt-zh huangjintao/sharegpt 135399 0 806.3±771.7, min=21, max=65318 chat, general, multi-round
tulu-v2-sft-mixture AI-ModelScope/tulu-v2-sft-mixture 326154 0 867.8±996.4, min=22, max=12111 chat, multilingual, general, multi-round
wikipedia-zh AI-ModelScope/wikipedia-cn-20230720-filtered 254547 0 568.4±713.2, min=37, max=78678 text-generation, general, pretrained
open-orca AI-ModelScope/OpenOrca 3239027 0 360.4±402.9, min=27, max=8672 chat, multilingual, general
open-orca-gpt4 AI-ModelScope/OpenOrca 994896 0 382.3±417.4, min=31, max=8740 chat, multilingual, general
sharegpt-gpt4 AI-ModelScope/sharegpt_gpt4 103063 0 1286.2±2089.4, min=22, max=221080 chat, multilingual, general, multi-round
🔥sharegpt-gpt4-mini AI-ModelScope/sharegpt_gpt4 6205 0 3511.6±6068.5, min=33, max=116018 chat, multilingual, general, multi-round, gpt4
🔥ms-agent iic/ms_agent 30000 0 647.7±217.1, min=199, max=2722 chat, agent, multi-round
ms-agent-for-agentfabric-default AI-ModelScope/ms_agent_for_agentfabric 30000 0 617.8±199.1, min=251, max=2657 chat, agent, multi-round
ms-agent-for-agentfabric-addition AI-ModelScope/ms_agent_for_agentfabric 488 0 2084.9±1514.8, min=489, max=7354 chat, agent, multi-round
damo-agent-zh damo/MSAgent-Bench 422115 161 965.7±440.9, min=321, max=31535 chat, agent, multi-round
damo-agent-mini-zh damo/MSAgent-Bench 39964 152 1230.9±350.1, min=558, max=4982 chat, agent, multi-round
agent-instruct-all-en huangjintao/AgentInstruct_copy 1866 0 1144.3±635.5, min=206, max=6412 chat, agent, multi-round
code-alpaca-en wyj123456/code_alpaca_en 20016 0 100.1±60.1, min=29, max=1776 chat, coding
🔥leetcode-python-en AI-ModelScope/leetcode-solutions-python 2359 0 723.8±233.5, min=259, max=2117 chat, coding
🔥codefuse-python-en codefuse-ai/CodeExercise-Python-27k 27224 0 483.6±193.9, min=45, max=3082 chat, coding
🔥codefuse-evol-instruction-zh codefuse-ai/Evol-instruction-66k 66862 0 439.6±206.3, min=37, max=2983 chat, coding
medical-en huangjintao/medical_zh 117117 500 257.4±89.1, min=36, max=2564 chat, medical
medical-zh huangjintao/medical_zh 1950472 500 167.2±219.7, min=26, max=27351 chat, medical
medical-mini-zh huangjintao/medical_zh 50000 500 168.1±220.8, min=26, max=12320 chat, medical
🔥disc-med-sft-zh AI-ModelScope/DISC-Med-SFT 441767 0 354.1±193.1, min=25, max=2231 chat, medical
lawyer-llama-zh AI-ModelScope/lawyer_llama_data 21476 0 194.4±91.7, min=27, max=924 chat, law
tigerbot-law-zh AI-ModelScope/tigerbot-law-plugin 55895 0 109.9±126.4, min=37, max=18878 text-generation, law, pretrained
🔥disc-law-sft-zh AI-ModelScope/DISC-Law-SFT 166758 0 533.7±495.4, min=30, max=15169 chat, law
🔥blossom-math-zh AI-ModelScope/blossom-math-v2 10000 0 169.3±58.7, min=35, max=563 chat, math
school-math-zh AI-ModelScope/school_math_0.25M 248480 0 157.6±72.1, min=33, max=3450 chat, math
open-platypus-en AI-ModelScope/Open-Platypus 24926 0 367.9±254.8, min=30, max=3951 chat, math
text2sql-en AI-ModelScope/texttosqlv2_25000_v2 25000 0 274.6±326.4, min=38, max=1975 chat, sql
🔥sql-create-context-en AI-ModelScope/sql-create-context 78577 0 80.2±17.8, min=36, max=456 chat, sql
🔥advertise-gen-zh lvjianjin/AdvertiseGen 97484 915 131.6±21.7, min=52, max=242 text-generation
🔥dureader-robust-zh modelscope/DuReader_robust-QG 15937 1962 242.1±137.4, min=61, max=1417 text-generation
cmnli-zh clue 391783 12241 83.6±16.6, min=52, max=200 text-generation, classification
🔥cmnli-mini-zh clue 20000 200 82.9±16.3, min=52, max=188 text-generation, classification
🔥jd-sentiment-zh DAMO_NLP/jd 45012 4988 67.0±83.2, min=40, max=4040 text-generation, classification
🔥hc3-zh simpleai/HC3-Chinese 39781 0 177.8±81.5, min=58, max=3052 text-generation, classification
🔥hc3-en simpleai/HC3 11021 0 299.3±138.7, min=66, max=2268 text-generation, classification
finance-en wyj123456/finance_en 68911 0 135.6±134.3, min=26, max=3525 chat, financial
poetry-zh modelscope/chinese-poetry-collection 388599 1710 55.2±9.4, min=23, max=83 text-generation, poetry
webnovel-zh AI-ModelScope/webnovel_cn 50000 0 1478.9±11526.1, min=100, max=490484 chat, novel
generated-chat-zh AI-ModelScope/generated_chat_0.4M 396004 0 273.3±52.0, min=32, max=873 chat, character-dialogue
cls-fudan-news-zh damo/zh_cls_fudan-news 4959 0 3234.4±2547.5, min=91, max=19548 chat, classification
ner-jave-zh damo/zh_ner-JAVE 1266 0 118.3±45.5, min=44, max=223 chat, ner
long-alpaca-12k AI-ModelScope/LongAlpaca-12k 11998 0 9619.0±8295.8, min=36, max=78925 longlora, QA
coco-en modelscope/coco_2014_caption 414113 40504 298.8±2.8, min=294, max=351 chat, multi-modal, vision
🔥coco-mini-en modelscope/coco_2014_caption 20000 200 298.8±2.8, min=294, max=339 chat, multi-modal, vision
🔥coco-mini-en-2 modelscope/coco_2014_caption 20000 200 36.8±2.8, min=32, max=77 chat, multi-modal, vision
capcha-images AI-ModelScope/captcha-images 6000 2000 29.0±0.0, min=29, max=29 chat, multi-modal, vision
aishell1-zh speech_asr/speech_asr_aishell1_trainsets 134424 7176 152.2±36.8, min=63, max=419 chat, multi-modal, audio
🔥aishell1-mini-zh speech_asr/speech_asr_aishell1_trainsets 14326 200 152.0±35.5, min=74, max=359 chat, multi-modal, audio
hh-rlhf-harmless-base AI-ModelScope/hh-rlhf 42462 2308 167.2±123.1, min=22, max=986 rlhf, dpo, pairwise
hh-rlhf-helpful-base AI-ModelScope/hh-rlhf 43777 2348 201.9±135.2, min=25, max=1070 rlhf, dpo, pairwise
hh-rlhf-helpful-online AI-ModelScope/hh-rlhf 10150 1137 401.5±278.7, min=32, max=1987 rlhf, dpo, pairwise
hh-rlhf-helpful-rejection-sampled AI-ModelScope/hh-rlhf 52413 2749 247.0±152.6, min=26, max=1300 rlhf, dpo, pairwise
hh-rlhf-red-team-attempts AI-ModelScope/hh-rlhf 52413 2749 247.0±152.6, min=26, max=1300 rlhf, dpo, pairwise
🔥hh-rlhf-cn AI-ModelScope/hh_rlhf_cn 172085 9292 172.8±124.0, min=22, max=1638 rlhf, dpo, pairwise
hh-rlhf-cn-harmless-base-cn AI-ModelScope/hh_rlhf_cn 42394 2304 143.9±109.4, min=24, max=3078 rlhf, dpo, pairwise
hh-rlhf-cn-helpful-base-cn AI-ModelScope/hh_rlhf_cn 43722 2346 176.8±120.0, min=26, max=1420 rlhf, dpo, pairwise
hh-rlhf-cn-harmless-base-en AI-ModelScope/hh_rlhf_cn 42394 2304 167.5±123.2, min=22, max=986 rlhf, dpo, pairwise
hh-rlhf-cn-helpful-base-en AI-ModelScope/hh_rlhf_cn 43722 2346 202.2±135.3, min=25, max=1070 rlhf, dpo, pairwise
stack-exchange-paired AI-ModelScope/stack-exchange-paired 4483004 0 534.5±594.6, min=31, max=56588 hfrl, dpo, pairwise
pileval huangjintao/pile-val-backup 214670 0 1612.3±8856.2, min=11, max=1208955 text-generation, awq
🔥coig-cqia-chinese-traditional AI-ModelScope/COIG-CQIA 1111 0 172.6±59.9, min=55, max=856 general
🔥coig-cqia-coig-pc AI-ModelScope/COIG-CQIA 3000 0 353.5±859.6, min=34, max=19288 general
🔥coig-cqia-exam AI-ModelScope/COIG-CQIA 4856 0 275.0±240.0, min=45, max=4932 general
🔥coig-cqia-finance AI-ModelScope/COIG-CQIA 11288 0 1266.4±561.1, min=60, max=10582 general
🔥coig-cqia-douban AI-ModelScope/COIG-CQIA 3086 0 402.9±544.7, min=88, max=10870 general
🔥coig-cqia-human-value AI-ModelScope/COIG-CQIA 1007 0 151.2±77.3, min=39, max=656 general
🔥coig-cqia-logi-qa AI-ModelScope/COIG-CQIA 421 0 309.8±188.8, min=43, max=1306 general
🔥coig-cqia-ruozhiba AI-ModelScope/COIG-CQIA 240 0 189.8±62.2, min=33, max=505 general
🔥coig-cqia-segmentfault AI-ModelScope/COIG-CQIA 458 0 449.0±495.8, min=87, max=6342 general
🔥coig-cqia-wiki AI-ModelScope/COIG-CQIA 10603 0 619.2±515.8, min=73, max=10140 general
🔥coig-cqia-wikihow AI-ModelScope/COIG-CQIA 1485 0 1700.0±790.9, min=260, max=6371 general
🔥coig-cqia-xhs AI-ModelScope/COIG-CQIA 1508 0 438.0±179.6, min=129, max=2191 general
🔥coig-cqia-zhihu AI-ModelScope/COIG-CQIA 5631 0 540.7±306.7, min=161, max=3036 general
🔥ruozhiba-post-annual AI-ModelScope/ruozhiba 1361 0 36.6±15.3, min=24, max=559 pretrain
🔥ruozhiba-title-good AI-ModelScope/ruozhiba 2597 0 41.9±19.3, min=22, max=246 pretrain
🔥ruozhiba-title-norm AI-ModelScope/ruozhiba 81700 0 39.9±12.8, min=21, max=386 pretrain