diff --git a/README.md b/README.md index 9654f70ff..e97107e01 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,7 @@ SWIFT has rich documentation for users, please check [here](https://github.com/ SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try! ## 🎉 News +- 2024.07.02: Support for `llava1_6-vicuna-7b-chat`, `llava1_6-vicuna-13b-chat` and other llava-hf models. For best practices, refer to [here](docs/source_en/Multi-Modal/llava-best-practice.md). - 🔥2024.06.29: Support [eval-scope](https://github.com/modelscope/eval-scope)&[open-compass](https://github.com/open-compass/opencompass) for evaluation! Now we have supported over 50 eval datasets like `BoolQ, ocnli, humaneval, math, ceval, mmlu, gsk8k, ARC_e`, please check our [Eval Doc](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/LLM-eval.md) to begin! Next sprint we will support Multi-modal and Agent evaluation, remember to follow us : ) - 🔥2024.06.28: Support for **Florence** series model! See [document](docs/source_en/Multi-Modal/florence-best-pratice.md) - 🔥2024.06.28: Support for Gemma2 series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct. diff --git a/README_CN.md b/README_CN.md index f7ba67219..09c481108 100644 --- a/README_CN.md +++ b/README_CN.md @@ -48,6 +48,7 @@ SWIFT具有丰富的文档体系,如有使用问题请查看[这里](https: 可以在[Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) 和 [ModelScope创空间](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary) 中体验SWIFT web-ui功能了。 ## 🎉 新闻 +- 2024.07.02: 支持`llava1_6-vicuna-7b-chat`, `llava1_6-vicuna-13b-chat`等llava-hf模型. 最佳实践可以查看[这里](docs/source/Multi-Modal/llava最佳实践.md). 
- 🔥2024.06.29: 支持[eval-scope](https://github.com/modelscope/eval-scope)&[open-compass](https://github.com/open-compass/opencompass)评测! 我们支持了包含`BoolQ, ocnli, humaneval, math, ceval, mmlu, gsk8k, ARC_e`等50+标准数据集在内的评测流程, 请查看我们的[评测文档](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM评测文档.md)来使用。下个迭代我们会支持多模态评测和Agent评测,记得持续关注我们: ) - 🔥2024.06.28: 支持**Florence**系列模型: 可以查看[Florence最佳实践](docs/source/Multi-Modal/florence最佳实践.md). - 🔥2024.06.28: 支持**Gemma2**系列模型: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct. diff --git "a/docs/source/LLM/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md" "b/docs/source/LLM/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md" index db7bb8260..e8ab9de00 100644 --- "a/docs/source/LLM/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md" +++ "b/docs/source/LLM/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md" @@ -238,6 +238,7 @@ |mistral-7b-v2|[AI-ModelScope/Mistral-7B-v0.2-hf](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.2-hf/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.34|-|[alpindale/Mistral-7B-v0.2-hf](https://huggingface.co/alpindale/Mistral-7B-v0.2-hf)| |mistral-7b-instruct|[AI-ModelScope/Mistral-7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)| |mistral-7b-instruct-v2|[AI-ModelScope/Mistral-7B-Instruct-v0.2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)| 
+|mistral-7b-instruct-v3|[LLM-Research/Mistral-7B-Instruct-v0.3](https://modelscope.cn/models/LLM-Research/Mistral-7B-Instruct-v0.3/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)| |mixtral-moe-7b|[AI-ModelScope/Mixtral-8x7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.36|-|[mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)| |mixtral-moe-7b-instruct|[AI-ModelScope/Mixtral-8x7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.36|-|[mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)| |mixtral-moe-7b-aqlm-2bit-1x16|[AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation|✔|✘|transformers>=4.38, aqlm, torch>=2.2.0|-|[ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://huggingface.co/ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf)| @@ -327,8 +328,11 @@ |qwen-audio-chat|[qwen/Qwen-Audio-Chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary)|c_attn|qwen-audio|✔|✘||audio|[Qwen/Qwen-Audio-Chat](https://huggingface.co/Qwen/Qwen-Audio-Chat)| |glm4v-9b-chat|[ZhipuAI/glm-4v-9b](https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary)|self_attention.query_key_value|glm4v|✘|✘||vision|[THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)| |llava1_5-7b-chat|[huangjintao/llava-1.5-7b-hf](https://modelscope.cn/models/huangjintao/llava-1.5-7b-hf/summary)|q_proj, k_proj, v_proj|llava1_5|✔|✘|transformers>=4.36|vision|[llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf)| 
-|llava1_6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|✔|✘|transformers>=4.34|vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)| -|llava1_6-yi-34b-instruct|[AI-ModelScope/llava-v1.6-34b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary)|q_proj, k_proj, v_proj|llava-yi-instruct|✔|✘||vision|[liuhaotian/llava-v1.6-34b](https://huggingface.co/liuhaotian/llava-v1.6-34b)| +|llava1_5-13b-chat|[huangjintao/llava-1.5-13b-hf](https://modelscope.cn/models/huangjintao/llava-1.5-13b-hf/summary)|q_proj, k_proj, v_proj|llava1_5|✔|✘|transformers>=4.36|vision|[llava-hf/llava-1.5-13b-hf](https://huggingface.co/llava-hf/llava-1.5-13b-hf)| +|llava1_6-mistral-7b-chat|[huangjintao/llava-v1.6-mistral-7b-hf](https://modelscope.cn/models/huangjintao/llava-v1.6-mistral-7b-hf/summary)|q_proj, k_proj, v_proj|llava-mistral|✔|✘|transformers>=4.36|vision|[llava-hf/llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)| +|llava1_6-vicuna-7b-chat|[huangjintao/llava-v1.6-vicuna-7b-hf](https://modelscope.cn/models/huangjintao/llava-v1.6-vicuna-7b-hf/summary)|q_proj, k_proj, v_proj|llava-vicuna|✔|✘|transformers>=4.36|vision|[llava-hf/llava-v1.6-vicuna-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf)| +|llava1_6-vicuna-13b-chat|[huangjintao/llava-v1.6-vicuna-13b-hf](https://modelscope.cn/models/huangjintao/llava-v1.6-vicuna-13b-hf/summary)|q_proj, k_proj, v_proj|llava-vicuna|✔|✘|transformers>=4.36|vision|[llava-hf/llava-v1.6-vicuna-13b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf)| +|llava1_6-yi-34b-chat|[huangjintao/llava-v1.6-34b-hf](https://modelscope.cn/models/huangjintao/llava-v1.6-34b-hf/summary)|q_proj, k_proj, v_proj|llava-yi|✔|✘|transformers>=4.36|vision|[llava-hf/llava-v1.6-34b-hf](https://huggingface.co/llava-hf/llava-v1.6-34b-hf)| 
|llama3-llava-next-8b|[AI-Modelscope/llama3-llava-next-8b](https://modelscope.cn/models/AI-Modelscope/llama3-llava-next-8b/summary)|q_proj, k_proj, v_proj|llama-llava-next|✔|✘||vision|[lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)| |llava-next-72b|[AI-Modelscope/llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|✔|✘||vision|[lmms-lab/llava-next-72b](https://huggingface.co/lmms-lab/llava-next-72b)| |llava-next-110b|[AI-Modelscope/llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|✔|✘||vision|[lmms-lab/llava-next-110b](https://huggingface.co/lmms-lab/llava-next-110b)| diff --git "a/docs/source/Multi-Modal/llava\346\234\200\344\275\263\345\256\236\350\267\265.md" "b/docs/source/Multi-Modal/llava\346\234\200\344\275\263\345\256\236\350\267\265.md" index 9e5e9b7f0..ce400e2a6 100644 --- "a/docs/source/Multi-Modal/llava\346\234\200\344\275\263\345\256\236\350\267\265.md" +++ "b/docs/source/Multi-Modal/llava\346\234\200\344\275\263\345\256\236\350\267\265.md" @@ -1,16 +1,17 @@ - # Llava 最佳实践 -本篇文档对应的模型 +本篇文档涉及的模型如下: + +- [llava1_5-7b-chat](https://modelscope.cn/models/huangjintao/llava-1.5-7b-hf) +- [llava1_5-13b-chat](https://modelscope.cn/models/huangjintao/llava-1.5-13b-hf) +- [llava1_6-mistral-7b-chat](https://modelscope.cn/models/huangjintao/llava-v1.6-mistral-7b-hf) +- [llava1_6-vicuna-7b-chat](https://modelscope.cn/models/huangjintao/llava-v1.6-vicuna-7b-hf) +- [llava1_6-vicuna-13b-chat](https://modelscope.cn/models/huangjintao/llava-v1.6-vicuna-13b-hf) +- [llava1_6-yi-34b-chat](https://modelscope.cn/models/huangjintao/llava-v1.6-34b-hf) +- [llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b) +- [llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b) -| model | model_type | -|-------|------------| -| 
[llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) | llava1_6-mistral-7b-instruct | -| [llava-v1.6-34b](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary) | llava1_6-yi-34b-instruct | -|[llama3-llava-next-8b](https://modelscope.cn/models/AI-ModelScope/llama3-llava-next-8b/summary)|llama3-llava-next-8b| -|[llava-next-72b](https://modelscope.cn/models/AI-ModelScope/llava-next-72b/summary)|llava-next-72b| -|[llava-next-110b](https://modelscope.cn/models/AI-ModelScope/llava-next-110b/summary)|llava-next-110b| -以下实践以`llava-v1.6-mistral-7b`为例,你也可以通过指定`--model_type`切换为其他模型 +以下实践以`llava1_6-mistral-7b-chat`为例,你也可以通过指定`--model_type`切换为其他模型. ## 目录 - [环境准备](#环境准备) @@ -30,13 +31,13 @@ pip install -e '.[llm]' ```shell # Experimental environment: A100 # 20GB GPU memory -CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-mistral-7b-instruct +CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-mistral-7b-chat # 70GB GPU memory -CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-yi-34b-instruct +CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-yi-34b-chat # 4*20GB GPU memory -CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer --model_type llava1_6-yi-34b-instruct +CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer --model_type llava1_6-yi-34b-chat ``` 输出: (支持传入本地路径或URL) @@ -54,9 +55,10 @@ The image shows a close-up of a kitten with a soft, blurred background that sugg Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png There are four sheep in the picture. -------------------------------------------------- +<<< clear <<< What is the calculation result? Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png -The calculation result is 14352 + 45304 = 145304. +The calculation result is 1452 + 453004 = 453006. -------------------------------------------------- <<< Write a poem based on the content of the picture. 
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png @@ -141,7 +143,7 @@ from swift.llm import ( from swift.utils import seed_everything import torch -model_type = 'llava1_6-mistral-7b-instruct' +model_type = 'llava1_6-mistral-7b-chat' template_type = get_default_template_type(model_type) print(f'template_type: {template_type}') @@ -198,13 +200,13 @@ LoRA微调: # Experimental environment: A10, 3090, V100... # 21GB GPU memory CUDA_VISIBLE_DEVICES=0 swift sft \ - --model_type llava1_6-mistral-7b-instruct \ + --model_type llava1_6-mistral-7b-chat \ --dataset coco-en-2-mini \ # Experimental environment: 2*A100... # 2*45GB GPU memory CUDA_VISIBLE_DEVICES=0,1 swift sft \ - --model_type llava1_6-yi-34b-instruct \ + --model_type llava1_6-yi-34b-chat \ --dataset coco-en-2-mini \ ``` @@ -213,14 +215,14 @@ CUDA_VISIBLE_DEVICES=0,1 swift sft \ # Experimental environment: 4 * A100 # 4 * 70 GPU memory NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \ - --model_type llava1_6-mistral-7b-instruct \ + --model_type llava1_6-mistral-7b-chat \ --dataset coco-en-2-mini \ --sft_type full \ --deepspeed default-zero2 # 8 * 50 GPU memory CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \ - --model_type llava1_6-yi-34b-instruct \ + --model_type llava1_6-yi-34b-chat \ --dataset coco-en-2-mini \ --sft_type full \ ``` @@ -239,7 +241,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \ ## 微调后推理 直接推理: ```shell -model_type="llava1_6-mistral-7b-instruct" +model_type="llava1_6-mistral-7b-chat" CUDA_VISIBLE_DEVICES=0 swift infer \ --ckpt_dir output/${model_type}/vx-xxx/checkpoint-xxx \ @@ -248,7 +250,8 @@ CUDA_VISIBLE_DEVICES=0 swift infer \ **merge-lora**并推理: ```shell -model_type="llava1_6-mistral-7b-instruct" +model_type="llava1_6-mistral-7b-chat" + CUDA_VISIBLE_DEVICES=0 swift export \ --ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx" \ --merge_lora true diff --git a/docs/source_en/LLM/Supported-models-datasets.md 
b/docs/source_en/LLM/Supported-models-datasets.md index e8c4f5a9c..d2e6f57f6 100644 --- a/docs/source_en/LLM/Supported-models-datasets.md +++ b/docs/source_en/LLM/Supported-models-datasets.md @@ -238,6 +238,7 @@ The table below introduces all models supported by SWIFT: |mistral-7b-v2|[AI-ModelScope/Mistral-7B-v0.2-hf](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-v0.2-hf/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.34|-|[alpindale/Mistral-7B-v0.2-hf](https://huggingface.co/alpindale/Mistral-7B-v0.2-hf)| |mistral-7b-instruct|[AI-ModelScope/Mistral-7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)| |mistral-7b-instruct-v2|[AI-ModelScope/Mistral-7B-Instruct-v0.2](https://modelscope.cn/models/AI-ModelScope/Mistral-7B-Instruct-v0.2/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)| +|mistral-7b-instruct-v3|[LLM-Research/Mistral-7B-Instruct-v0.3](https://modelscope.cn/models/LLM-Research/Mistral-7B-Instruct-v0.3/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.34|-|[mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)| |mixtral-moe-7b|[AI-ModelScope/Mixtral-8x7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.36|-|[mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)| |mixtral-moe-7b-instruct|[AI-ModelScope/Mixtral-8x7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|✔|✔|transformers>=4.36|-|[mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)| 
|mixtral-moe-7b-aqlm-2bit-1x16|[AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation|✔|✘|transformers>=4.38, aqlm, torch>=2.2.0|-|[ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://huggingface.co/ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf)| @@ -327,8 +328,11 @@ The table below introduces all models supported by SWIFT: |qwen-audio-chat|[qwen/Qwen-Audio-Chat](https://modelscope.cn/models/qwen/Qwen-Audio-Chat/summary)|c_attn|qwen-audio|✔|✘||audio|[Qwen/Qwen-Audio-Chat](https://huggingface.co/Qwen/Qwen-Audio-Chat)| |glm4v-9b-chat|[ZhipuAI/glm-4v-9b](https://modelscope.cn/models/ZhipuAI/glm-4v-9b/summary)|self_attention.query_key_value|glm4v|✘|✘||vision|[THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)| |llava1_5-7b-chat|[huangjintao/llava-1.5-7b-hf](https://modelscope.cn/models/huangjintao/llava-1.5-7b-hf/summary)|q_proj, k_proj, v_proj|llava1_5|✔|✘|transformers>=4.36|vision|[llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf)| -|llava1_6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|✔|✘|transformers>=4.34|vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)| -|llava1_6-yi-34b-instruct|[AI-ModelScope/llava-v1.6-34b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary)|q_proj, k_proj, v_proj|llava-yi-instruct|✔|✘||vision|[liuhaotian/llava-v1.6-34b](https://huggingface.co/liuhaotian/llava-v1.6-34b)| +|llava1_5-13b-chat|[huangjintao/llava-1.5-13b-hf](https://modelscope.cn/models/huangjintao/llava-1.5-13b-hf/summary)|q_proj, k_proj, v_proj|llava1_5|✔|✘|transformers>=4.36|vision|[llava-hf/llava-1.5-13b-hf](https://huggingface.co/llava-hf/llava-1.5-13b-hf)| 
+|llava1_6-mistral-7b-chat|[huangjintao/llava-v1.6-mistral-7b-hf](https://modelscope.cn/models/huangjintao/llava-v1.6-mistral-7b-hf/summary)|q_proj, k_proj, v_proj|llava-mistral|✔|✘|transformers>=4.36|vision|[llava-hf/llava-v1.6-mistral-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)| +|llava1_6-vicuna-7b-chat|[huangjintao/llava-v1.6-vicuna-7b-hf](https://modelscope.cn/models/huangjintao/llava-v1.6-vicuna-7b-hf/summary)|q_proj, k_proj, v_proj|llava-vicuna|✔|✘|transformers>=4.36|vision|[llava-hf/llava-v1.6-vicuna-7b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-7b-hf)| +|llava1_6-vicuna-13b-chat|[huangjintao/llava-v1.6-vicuna-13b-hf](https://modelscope.cn/models/huangjintao/llava-v1.6-vicuna-13b-hf/summary)|q_proj, k_proj, v_proj|llava-vicuna|✔|✘|transformers>=4.36|vision|[llava-hf/llava-v1.6-vicuna-13b-hf](https://huggingface.co/llava-hf/llava-v1.6-vicuna-13b-hf)| +|llava1_6-yi-34b-chat|[huangjintao/llava-v1.6-34b-hf](https://modelscope.cn/models/huangjintao/llava-v1.6-34b-hf/summary)|q_proj, k_proj, v_proj|llava-yi|✔|✘|transformers>=4.36|vision|[llava-hf/llava-v1.6-34b-hf](https://huggingface.co/llava-hf/llava-v1.6-34b-hf)| |llama3-llava-next-8b|[AI-Modelscope/llama3-llava-next-8b](https://modelscope.cn/models/AI-Modelscope/llama3-llava-next-8b/summary)|q_proj, k_proj, v_proj|llama-llava-next|✔|✘||vision|[lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)| |llava-next-72b|[AI-Modelscope/llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|✔|✘||vision|[lmms-lab/llava-next-72b](https://huggingface.co/lmms-lab/llava-next-72b)| |llava-next-110b|[AI-Modelscope/llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|✔|✘||vision|[lmms-lab/llava-next-110b](https://huggingface.co/lmms-lab/llava-next-110b)| diff --git a/docs/source_en/Multi-Modal/llava-best-practice.md 
b/docs/source_en/Multi-Modal/llava-best-practice.md index 8620ed355..b20520845 100644 --- a/docs/source_en/Multi-Modal/llava-best-practice.md +++ b/docs/source_en/Multi-Modal/llava-best-practice.md @@ -1,15 +1,16 @@ # Llava Best Practice -The document corresponds to the following models +The document corresponds to the following models: -| model | model_type | -|-------|------------| -| [llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) | llava1_6-mistral-7b-instruct | -| [llava-v1.6-34b](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary) | llava1_6-yi-34b-instruct | -|[llama3-llava-next-8b](https://modelscope.cn/models/AI-ModelScope/llama3-llava-next-8b/summary)|llama3-llava-next-8b| -|[llava-next-72b](https://modelscope.cn/models/AI-ModelScope/llava-next-72b/summary)|llava-next-72b| -|[llava-next-110b](https://modelscope.cn/models/AI-ModelScope/llava-next-110b/summary)|llava-next-110b| +- [llava1_5-7b-chat](https://modelscope.cn/models/huangjintao/llava-1.5-7b-hf) +- [llava1_5-13b-chat](https://modelscope.cn/models/huangjintao/llava-1.5-13b-hf) +- [llava1_6-mistral-7b-chat](https://modelscope.cn/models/huangjintao/llava-v1.6-mistral-7b-hf) +- [llava1_6-vicuna-7b-chat](https://modelscope.cn/models/huangjintao/llava-v1.6-vicuna-7b-hf) +- [llava1_6-vicuna-13b-chat](https://modelscope.cn/models/huangjintao/llava-v1.6-vicuna-13b-hf) +- [llava1_6-yi-34b-chat](https://modelscope.cn/models/huangjintao/llava-v1.6-34b-hf) +- [llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b) +- [llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b) -The following practices take `llava-v1.6-mistral-7b` as an example. You can also switch to other models by specifying the `--model_type`. +The following practice takes `llava1_6-mistral-7b-chat` as an example, and you can also switch to other models by specifying `--model_type`. 
## Table of Contents @@ -29,13 +30,13 @@ pip install -e '.[llm]' ```shell # Experimental environment: A100 # 20GB GPU memory -CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-mistral-7b-instruct +CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-mistral-7b-chat # 70GB GPU memory -CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-yi-34b-instruct +CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1_6-yi-34b-chat # 4*20GB GPU memory -CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer --model_type llava1_6-yi-34b-instruct +CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer --model_type llava1_6-yi-34b-chat ``` Output: (supports passing in local path or URL) @@ -49,9 +50,10 @@ The image shows a close-up of a kitten with a soft, blurred background that sugg Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png There are four sheep in the picture. -------------------------------------------------- +<<< clear <<< What is the calculation result? Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png -The calculation result is 14352 + 45304 = 145304. +The calculation result is 1452 + 453004 = 453006. -------------------------------------------------- <<< Write a poem based on the content of the picture. Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png @@ -84,6 +86,20 @@ The boat, a symbol of solitude, In the vast expanse of the universe's beauty, A lone journey, a solitary quest, In the quiet of the night, it finds its rest. +-------------------------------------------------- +<<< Perform OCR on the image. +Input a media path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/ocr_en.png +The text in the image is as follows: + +INTRODUCTION + +SWIFT supports training, inference, evaluation and deployment of 250+ LLMs (multimodal large models). 
Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition, SWIFT provides a complete Adapters library to support the latest training techniques such as NLP, Vision, etc. This adapter library can be used directly in your own custom workflow without our training scripts. + +To facilitate use by users unfamiliar with deep learning, we provide a Grado web-ui for controlling training and inference, as well as accompanying deep learning courses and best practices for beginners. + +SWIFT has rich documentation for users, please check here. + +SWIFT is web-ui available both on Huggingface space and ModelScope studio, please feel free to try! """ ``` @@ -118,7 +134,7 @@ from swift.llm import ( from swift.utils import seed_everything import torch -model_type = 'llava1_6-mistral-7b-instruct' +model_type = 'llava1_6-mistral-7b-chat' template_type = get_default_template_type(model_type) print(f'template_type: {template_type}') @@ -175,12 +191,12 @@ LoRA fine-tuning: # Experimental environment: A10, 3090, V100... 
# 21GB GPU memory CUDA_VISIBLE_DEVICES=0 swift sft \ - --model_type llava1_6-mistral-7b-instruct \ + --model_type llava1_6-mistral-7b-chat \ --dataset coco-en-2-mini \ # 2*45GB GPU memory CUDA_VISIBLE_DEVICES=0,1 swift sft \ - --model_type llava1_6-yi-34b-instruct \ + --model_type llava1_6-yi-34b-chat \ --dataset coco-en-2-mini \ ``` @@ -189,14 +205,14 @@ Full parameter fine-tuning: # Experimental environment: 4 * A100 # 4 * 70 GPU memory NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \ - --model_type llava1_6-mistral-7b-instruct \ + --model_type llava1_6-mistral-7b-chat \ --dataset coco-en-2-mini \ --sft_type full \ --deepspeed default-zero2 # 8 * 50 GPU memory CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \ - --model_type llava1_6-yi-34b-instruct \ + --model_type llava1_6-yi-34b-chat \ --dataset coco-en-2-mini \ --sft_type full \ ``` @@ -215,7 +231,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \ ## Inference after Fine-tuning Direct inference: ```shell -model_type="llava1_6-mistral-7b-instruct" +model_type="llava1_6-mistral-7b-chat" CUDA_VISIBLE_DEVICES=0 swift infer \ --ckpt_dir output/${model_type}/vx-xxx/checkpoint-xxx \ --load_dataset_config true @@ -223,7 +239,7 @@ CUDA_VISIBLE_DEVICES=0 swift infer \ **merge-lora** and inference: ```shell -model_type="llava1_6-mistral-7b-instruct" +model_type="llava1_6-mistral-7b-chat" CUDA_VISIBLE_DEVICES=0 swift export \ --ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx" \ --merge_lora true diff --git a/swift/llm/app_ui.py b/swift/llm/app_ui.py index f3627b59e..1b5844b7d 100644 --- a/swift/llm/app_ui.py +++ b/swift/llm/app_ui.py @@ -76,10 +76,11 @@ def model_chat(query: str, history: History) -> Iterator[Tuple[str, History]]: gr.Markdown(f'