Commit 8de6765

hanlv15 committed Jul 17, 2024
2 parents 4635e2a + df7020b
Showing 29 changed files with 564 additions and 298 deletions.
203 changes: 103 additions & 100 deletions README.md

Large diffs are not rendered by default.

162 changes: 82 additions & 80 deletions README_CN.md

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion docs/source/GetStarted/界面训练推理.md
@@ -8,7 +8,7 @@ swift web-ui

Starts the interface for training and inference.

The web-ui command takes no arguments; everything is controlled from within the interface. However, a few environment variables are available:
The web-ui behaviour can be controlled through environment variables or through arguments. The environment variables are as follows:

> WEBUI_SHARE=1/0: Default is 0. Controls whether gradio is in share mode.
>
@@ -19,3 +19,5 @@
> WEBUI_PORT: The port number for web-ui.
>
> USE_INFERENCE=1/0: Default is 0. Controls whether the gradio inference page loads the model directly for inference or uses deployment (USE_INFERENCE=0).
To use arguments instead, see [Command-line arguments](../LLM/命令行参数.md#web-ui-参数).
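For example, a minimal launch sketch using the variables above (the port value is illustrative):

```bash
# Publish a gradio share link and pin the web-ui to a fixed port.
WEBUI_SHARE=1 WEBUI_PORT=7860 swift web-ui
```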
11 changes: 10 additions & 1 deletion docs/source/LLM/命令行参数.md
@@ -312,7 +312,7 @@ export parameters inherit from infer parameters, with the following added parameters:
- `--to_peft_format`: Default is `False`. Convert LoRA weights from swift format (`--tuner_backend swift`) to peft format.
- `--merge_lora`: Default is `False`. This argument is already defined in InferArguments and is not newly added. Whether to merge the LoRA weights into the base model and save the full weights. The weights are saved in a directory at the same level as `ckpt_dir`, e.g. the `'/path/to/your/vx-xxx/checkpoint-xxx-merged'` directory.
- `--quant_bits`: Number of bits for quantization. Default is `0`, i.e. no quantization. If you set `--quant_method awq`, you can set this to `4` for 4-bit quantization. If you set `--quant_method gptq`, you can set this to `2`, `3`, `4` or `8` for the corresponding bit width. When quantizing the original model, the weights are saved in the `f'{args.model_type}-{args.quant_method}-int{args.quant_bits}'` directory. When quantizing a fine-tuned model, the weights are saved in a directory at the same level as `ckpt_dir`, e.g. the `f'/path/to/your/vx-xxx/checkpoint-xxx-{args.quant_method}-int{args.quant_bits}'` directory.
- `--quant_method`: Quantization method, default is `'awq'`. Options: 'awq', 'gptq'.
- `--quant_method`: Quantization method, default is `'awq'`. Options: 'awq', 'gptq', 'bnb'.
- `--dataset`: This argument is already defined in InferArguments; for export it means the quantization dataset. Default is `[]`. For more details, including how to customize the quantization dataset, see the [LLM quantization documentation](LLM量化文档.md).
- `--quant_n_samples`: Quantization argument, default is `256`. With `--quant_method awq`, if OOM occurs during quantization, you can moderately reduce `--quant_n_samples` and `--quant_seqlen`. `--quant_method gptq` usually does not run into quantization OOM.
- `--quant_seqlen`: Quantization argument, default is `2048`.
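As an illustration, a sketch of a 4-bit AWQ export of an original model; the model type comes from the supported-models list, and the dataset name is an assumption, not a prescription:

```bash
# Quantize glm4-9b-chat to 4-bit AWQ; output goes to
# glm4-9b-chat-awq-int4. Lower --quant_n_samples if quantization OOMs.
swift export \
    --model_type glm4-9b-chat \
    --quant_method awq \
    --quant_bits 4 \
    --quant_n_samples 128 \
    --quant_seqlen 2048 \
    --dataset alpaca-zh
```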
@@ -323,6 +323,8 @@
- `--hub_token`: Default is `None`. See `sft.sh command-line arguments` for details.
- `--hub_private_repo`: Default is `False`. See `sft.sh command-line arguments` for details.
- `--commit_message`: Default is `'update files'`.
- `--to_ollama`: Export the model in ollama format.
- `--ollama_output_dir`: Output directory for the ollama export. Defaults to the `<model_type>-ollama` folder in the current directory.
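A sketch of the ollama export, assuming a fine-tuned checkpoint path (the path is a placeholder):

```bash
# Convert a fine-tuned checkpoint into ollama format; without
# --ollama_output_dir the result lands in <model_type>-ollama.
swift export \
    --ckpt_dir /path/to/your/vx-xxx/checkpoint-xxx \
    --to_ollama true \
    --ollama_output_dir ./my-model-ollama
```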

## eval parameters

@@ -364,3 +366,10 @@ deploy parameters inherit from infer parameters, with the following added parameters:
- `--port`: Default is `8000`.
- `--ssl_keyfile`: Default is `None`.
- `--ssl_certfile`: Default is `None`.
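For instance, a sketch of serving over TLS; the key and certificate paths are placeholders:

```bash
# Serve on the default port 8000 with TLS enabled;
# both files must already exist.
swift deploy \
    --model_type glm4-9b-chat \
    --ssl_keyfile ./server.key \
    --ssl_certfile ./server.crt
```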
## web-ui parameters
- `--host`: Default is `'127.0.0.1'`.
- `--port`: Default is `None`.
- `--lang`: Default is `'zh'`.
- `--share`: Default is `False`.
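For example, a sketch of pinning the port and publishing a share link via arguments rather than environment variables:

```bash
# Equivalent in spirit to WEBUI_SHARE=1 WEBUI_PORT=7860 swift web-ui.
swift web-ui --port 7860 --share true
```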
11 changes: 6 additions & 5 deletions docs/source/LLM/支持的模型和数据集.md
@@ -110,9 +110,9 @@
|chatglm3-6b-32k|[ZhipuAI/chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/chatglm3-6b-32k](https://huggingface.co/THUDM/chatglm3-6b-32k)|
|chatglm3-6b-128k|[ZhipuAI/chatglm3-6b-128k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-128k/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/chatglm3-6b-128k](https://huggingface.co/THUDM/chatglm3-6b-128k)|
|codegeex2-6b|[ZhipuAI/codegeex2-6b](https://modelscope.cn/models/ZhipuAI/codegeex2-6b/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;|transformers<4.34|coding|[THUDM/codegeex2-6b](https://huggingface.co/THUDM/codegeex2-6b)|
|glm4-9b|[ZhipuAI/glm-4-9b](https://modelscope.cn/models/ZhipuAI/glm-4-9b/summary)|query_key_value|chatglm-generation|&#x2714;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b](https://huggingface.co/THUDM/glm-4-9b)|
|glm4-9b-chat|[ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/summary)|query_key_value|chatglm3|&#x2714;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat)|
|glm4-9b-chat-1m|[ZhipuAI/glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m/summary)|query_key_value|chatglm3|&#x2714;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)|
|glm4-9b|[ZhipuAI/glm-4-9b](https://modelscope.cn/models/ZhipuAI/glm-4-9b/summary)|query_key_value|chatglm-generation|&#x2714;|&#x2714;||-|[THUDM/glm-4-9b](https://huggingface.co/THUDM/glm-4-9b)|
|glm4-9b-chat|[ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/summary)|query_key_value|chatglm3|&#x2714;|&#x2714;||-|[THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat)|
|glm4-9b-chat-1m|[ZhipuAI/glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m/summary)|query_key_value|chatglm3|&#x2714;|&#x2714;||-|[THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)|
|codegeex4-9b-chat|[ZhipuAI/codegeex4-all-9b](https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b/summary)|query_key_value|codegeex4|&#x2714;|&#x2714;|transformers<4.42|coding|[THUDM/codegeex4-all-9b](https://huggingface.co/THUDM/codegeex4-all-9b)|
|llama2-7b|[modelscope/Llama-2-7b-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)|
|llama2-7b-chat|[modelscope/Llama-2-7b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|
@@ -213,6 +213,7 @@
|deepseek-math-7b|[deepseek-ai/deepseek-math-7b-base](https://modelscope.cn/models/deepseek-ai/deepseek-math-7b-base/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||math|[deepseek-ai/deepseek-math-7b-base](https://huggingface.co/deepseek-ai/deepseek-math-7b-base)|
|deepseek-math-7b-instruct|[deepseek-ai/deepseek-math-7b-instruct](https://modelscope.cn/models/deepseek-ai/deepseek-math-7b-instruct/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||math|[deepseek-ai/deepseek-math-7b-instruct](https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct)|
|deepseek-math-7b-chat|[deepseek-ai/deepseek-math-7b-rl](https://modelscope.cn/models/deepseek-ai/deepseek-math-7b-rl/summary)|q_proj, k_proj, v_proj|deepseek|&#x2714;|&#x2714;||math|[deepseek-ai/deepseek-math-7b-rl](https://huggingface.co/deepseek-ai/deepseek-math-7b-rl)|
|numina-math-7b|[AI-ModelScope/NuminaMath-7B-TIR](https://modelscope.cn/models/AI-ModelScope/NuminaMath-7B-TIR/summary)|q_proj, k_proj, v_proj|numina-math|&#x2714;|&#x2714;||math|[AI-MO/NuminaMath-7B-TIR](https://huggingface.co/AI-MO/NuminaMath-7B-TIR)|
|deepseek-v2-chat|[deepseek-ai/DeepSeek-V2-Chat](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Chat/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|&#x2714;|&#x2714;|transformers>=4.39.3|-|[deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)|
|deepseek-v2-lite|[deepseek-ai/DeepSeek-V2-Lite](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.39.3|-|[deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite)|
|deepseek-v2-lite-chat|[deepseek-ai/DeepSeek-V2-Lite-Chat](https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite-Chat/summary)|q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj|deepseek2|&#x2714;|&#x2714;|transformers>=4.39.3|-|[deepseek-ai/DeepSeek-V2-Lite-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat)|
@@ -330,7 +331,7 @@
|qwen-vl-chat-int4|[qwen/Qwen-VL-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-VL-Chat-Int4/summary)|c_attn|qwen|&#x2714;|&#x2718;|auto_gptq>=0.5|vision|[Qwen/Qwen-VL-Chat-Int4](https://huggingface.co/Qwen/Qwen-VL-Chat-Int4)|
|qwen-audio|[qwen/Qwen-Audio](https://modelscope.cn/models/qwen/Qwen-Audio/summary)|c_attn|qwen-audio-generation|&#x2714;|&#x2718;||audio|[Qwen/Qwen-Audio](https://huggingface.co/Qwen/Qwen-Audio)|
|qwen-audio-chat|[qwen/Qwen-Audio-Chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary)|c_attn|qwen-audio|&#x2714;|&#x2718;||audio|[Qwen/Qwen-Audio-Chat](https://huggingface.co/Qwen/Qwen-Audio-Chat)|
|glm4v-9b-chat|[ZhipuAI/glm-4v-9b](https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary)|self_attention.query_key_value|glm4v|&#x2718;|&#x2718;|transformers<4.42|vision|[THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)|
|glm4v-9b-chat|[ZhipuAI/glm-4v-9b](https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary)|self_attention.query_key_value|glm4v|&#x2718;|&#x2718;||vision|[THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)|
|llava1_5-7b-instruct|[swift/llava-1.5-7b-hf](https://modelscope.cn/models/swift/llava-1.5-7b-hf/summary)|q_proj, k_proj, v_proj|llava1_5|&#x2714;|&#x2714;|transformers>=4.36|vision|[llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf)|
|llava1_5-13b-instruct|[swift/llava-1.5-13b-hf](https://modelscope.cn/models/swift/llava-1.5-13b-hf/summary)|q_proj, k_proj, v_proj|llava1_5|&#x2714;|&#x2714;|transformers>=4.36|vision|[llava-hf/llava-1.5-13b-hf](https://huggingface.co/llava-hf/llava-1.5-13b-hf)|
|llava1_6-mistral-7b-instruct|[swift/llava-v1.6-mistral-7b-hf](https://modelscope.cn/models/swift/llava-v1.6-mistral-7b-hf/summary)|q_proj, k_proj, v_proj|llava-mistral|&#x2714;|&#x2714;|transformers>=4.39|vision|[llava-hf/llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)|
@@ -354,7 +355,7 @@
|mini-internvl-chat-2b-v1_5|[OpenGVLab/Mini-InternVL-Chat-2B-V1-5](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5/summary)|wqkv|internvl|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)|
|mini-internvl-chat-4b-v1_5|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-4B-V1-5/summary)|qkv_proj|internvl-phi3|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/Mini-InternVL-Chat-4B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-4B-V1-5)|
|internvl2-2b|[OpenGVLab/InternVL2-2B](https://modelscope.cn/models/OpenGVLab/InternVL2-2B/summary)|wqkv|internvl2|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/InternVL2-2B](https://huggingface.co/OpenGVLab/InternVL2-2B)|
|internvl2-4b|[OpenGVLab/InternVL2-4B](https://modelscope.cn/models/OpenGVLab/InternVL2-4B/summary)|wqkv|internvl2|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B)|
|internvl2-4b|[OpenGVLab/InternVL2-4B](https://modelscope.cn/models/OpenGVLab/InternVL2-4B/summary)|qkv_proj|internvl2-phi3|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B)|
|internvl2-8b|[OpenGVLab/InternVL2-8B](https://modelscope.cn/models/OpenGVLab/InternVL2-8B/summary)|wqkv|internvl2|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B)|
|internvl2-26b|[OpenGVLab/InternVL2-26B](https://modelscope.cn/models/OpenGVLab/InternVL2-26B/summary)|wqkv|internvl2|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/InternVL2-26B](https://huggingface.co/OpenGVLab/InternVL2-26B)|
|internvl2-40b|[OpenGVLab/InternVL2-40B](https://modelscope.cn/models/OpenGVLab/InternVL2-40B/summary)|q_proj, k_proj, v_proj|internvl2|&#x2714;|&#x2718;|transformers>=4.35, timm|vision|[OpenGVLab/InternVL2-40B](https://huggingface.co/OpenGVLab/InternVL2-40B)|
4 changes: 3 additions & 1 deletion docs/source_en/GetStarted/Web-ui.md
@@ -8,7 +8,7 @@ swift web-ui

This command starts the interface for training and inference.

The web-ui command doesn't accept parameters; all controllable parts are handled within the interface. However, there are a few environment variables that can be used:
The web-ui command supports two ways of controlling UI behaviour. The first is through environment variables:

> WEBUI_SHARE=1/0: Default is 0. Controls whether gradio is in share mode.
>
@@ -19,3 +19,5 @@
> WEBUI_PORT: The port number for web-ui.
>
> USE_INFERENCE=1/0: Default is 0. Controls whether the gradio inference page directly loads the model for inference or deployment (USE_INFERENCE=0).
The second is through command-line [parameters](../LLM/Command-line-parameters.md#web-ui-parameters).
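For example, a minimal sketch mixing both mechanisms (the flag value is illustrative):

```bash
# Load the model directly on the inference page and publish a share link.
USE_INFERENCE=1 swift web-ui --share true
```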
11 changes: 10 additions & 1 deletion docs/source_en/LLM/Command-line-parameters.md
@@ -312,7 +312,7 @@ export parameters inherit from infer parameters, with the following added parameters:
- `--to_peft_format`: Default is `False`. Convert the swift format of LoRA (`--tuner_backend swift`) to peft format.
- `--merge_lora`: Default is `False`. This parameter is already defined in InferArguments, not a new parameter. Whether to merge lora weights into base model and save full weights. Weights will be saved in the same level directory as `ckpt_dir`, e.g. `'/path/to/your/vx-xxx/checkpoint-xxx-merged'` directory.
- `--quant_bits`: Number of bits for quantization. Default is `0`, i.e. no quantization. If you set `--quant_method awq`, you can set this to `4` for 4bits quantization. If you set `--quant_method gptq`, you can set this to `2`,`3`,`4`,`8` for corresponding bits quantization. If quantizing original model, weights will be saved in `f'{args.model_type}-{args.quant_method}-int{args.quant_bits}'` directory. If quantizing fine-tuned model, weights will be saved in the same level directory as `ckpt_dir`, e.g. `f'/path/to/your/vx-xxx/checkpoint-xxx-{args.quant_method}-int{args.quant_bits}'` directory.
- `--quant_method`: Quantization method, default is `'awq'`. Options are 'awq', 'gptq'.
- `--quant_method`: Quantization method, default is `'awq'`. Options are 'awq', 'gptq', 'bnb'.
- `--dataset`: This parameter is already defined in InferArguments, for export it means quantization dataset. Default is `[]`. More details: including how to customize quantization dataset, can be found in [LLM Quantization Documentation](LLM-quantization.md).
- `--quant_n_samples`: Quantization parameter, default is `256`. When set to `--quant_method awq`, if OOM occurs during quantization, you can moderately reduce `--quant_n_samples` and `--quant_seqlen`. `--quant_method gptq` generally does not encounter quantization OOM.
- `--quant_seqlen`: Quantization parameter, default is `2048`.
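As a sketch, an 8-bit GPTQ export of a fine-tuned checkpoint; the checkpoint path and dataset name are assumptions:

```bash
# GPTQ-quantize a fine-tuned model; output is written next to ckpt_dir,
# e.g. /path/to/your/vx-xxx/checkpoint-xxx-gptq-int8.
swift export \
    --ckpt_dir /path/to/your/vx-xxx/checkpoint-xxx \
    --quant_method gptq \
    --quant_bits 8 \
    --dataset alpaca-en
```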
@@ -323,6 +323,8 @@
- `--hub_token`: Default is `None`. See `sft.sh command line arguments` for parameter details.
- `--hub_private_repo`: Default is `False`. See `sft.sh command line arguments` for parameter details.
- `--commit_message`: Default is `'update files'`.
- `--to_ollama`: Export the model as an ollama Modelfile.
- `--ollama_output_dir`: Output directory for the ollama export. Defaults to `<model_type>-ollama` in the current directory.
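An end-to-end sketch: export, then register the result with a local ollama instance. The model name and the Modelfile location inside the output directory are assumptions:

```bash
# 1. Export in ollama format; the output directory defaults to
#    llama2-7b-chat-ollama.
swift export --model_type llama2-7b-chat --to_ollama true

# 2. Register the exported Modelfile with ollama (assumed filename).
ollama create my-llama2 -f llama2-7b-chat-ollama/Modelfile
```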

## eval parameters

@@ -363,3 +365,10 @@ deploy parameters inherit from infer parameters, with the following added parameters:
- `--port`: Default is `8000`.
- `--ssl_keyfile`: Default is `None`.
- `--ssl_certfile`: Default is `None`.
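A minimal sketch of moving the service to a non-default port (the model type is illustrative):

```bash
# Clients then reach the service at http://<host>:8001.
swift deploy --model_type llama2-7b-chat --port 8001
```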
## web-ui Parameters
- `--host`: Default is `'127.0.0.1'`.
- `--port`: Default is `None`.
- `--lang`: Default is `'zh'`.
- `--share`: Default is `False`.
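For example, a sketch exposing the English UI to other machines on the local network:

```bash
# 0.0.0.0 makes the UI reachable beyond localhost.
swift web-ui --host 0.0.0.0 --lang en
```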
