support LLava-Next(Stronger) model (modelscope#933)
hjh0119 committed May 16, 2024
1 parent 9cff868 commit c6c1cdf
Showing 8 changed files with 119 additions and 16 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -41,6 +41,7 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
SWIFT has rich documentation for users; please check [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM).

## 🎉 News
- 2024.05.16: Support Llava-Next (Stronger) series models. For best practices, see [here](https://github.com/modelscope/swift/tree/main/docs/source_en/Multi-Modal/llava-best-practice.md).
- 🔥2024.05.13: Support Yi-1.5 series models. Use `--model_type yi-1_5-9b-chat` to begin!
- 2024.05.11: Support QLoRA training and quantized inference using [hqq](https://github.com/mobiusml/hqq) and [eetq](https://github.com/NetEase-FuXi/EETQ). For more information, see the [LLM Quantization Documentation](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/LLM-quantization.md).
- 2024.05.10: Support splitting a sequence across multiple GPUs to reduce memory usage. Enable this feature with `pip install .[seq_parallel]`, then add `--sequence_parallel_size n` to your DDP script to begin!
@@ -514,6 +515,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
| MiniCPM-V | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B | chat model |
| CogVLM<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | English | 17B-18B | chat model |
| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
| Llava-Next | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 8B-110B | chat model |
| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
| InternVL | [InternVL](https://github.com/OpenGVLab/InternVL) | Chinese<br>English | 25.5B<br>including quantized version | chat model |
| Llava-llama3 | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) | English | 8B | chat model |
2 changes: 2 additions & 0 deletions README_CN.md
@@ -42,6 +42,7 @@ SWIFT supports training, inference, ... for nearly **200 LLMs and MLLMs** (multi-modal large models)
SWIFT has rich documentation for users; if you have any usage questions, please check [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM).

## 🎉 News
- 2024.05.16: Support Llava-Next (Stronger) series models. For best practices, see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
- 🔥2024.05.13: Support Yi-1.5 series models. Use `--model_type yi-1_5-9b-chat` and similar to get started!
- 2024.05.11: Support QLoRA training and quantized inference with [hqq](https://github.com/mobiusml/hqq) and [eetq](https://github.com/NetEase-FuXi/EETQ). See the [LLM Quantization Documentation](https://github.com/modelscope/swift/tree/main/docs/source/LLM/LLM量化文档.md).
- 2024.05.10: Support sequence parallelism. Install with `pip install .[seq_parallel]`, then add `--sequence_parallel_size n` to your DDP script to begin!
@@ -514,6 +515,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| MiniCPM-V | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B | chat model |
| CogVLM<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | English | 17B-18B | chat model |
| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
| Llava-Next | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 8B-110B | chat model |
| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
| InternVL | [InternVL](https://github.com/OpenGVLab/InternVL) | Chinese<br>English | 25.5B<br>including quantized version | chat model |
| Llava-llama3 | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) | English | 8B | chat model |
3 changes: 3 additions & 0 deletions docs/source/LLM/支持的模型和数据集.md
@@ -123,6 +123,9 @@
|atom-7b-chat|[FlagAlpha/Atom-7B-Chat](https://modelscope.cn/models/FlagAlpha/Atom-7B-Chat/summary)|q_proj, k_proj, v_proj|atom|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B-Chat](https://huggingface.co/FlagAlpha/Atom-7B-Chat)|
|llava1d6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|&#x2714;|&#x2718;|transformers>=4.34|multi-modal, vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)|
|llava1d6-yi-34b-instruct|[AI-ModelScope/llava-v1.6-34b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary)|q_proj, k_proj, v_proj|llava-yi-instruct|&#x2714;|&#x2718;||multi-modal, vision|[liuhaotian/llava-v1.6-34b](https://huggingface.co/liuhaotian/llava-v1.6-34b)|
|llama3-llava-next-8b|[AI-Modelscope/llama3-llava-next-8b](https://modelscope.cn/models/AI-Modelscope/llama3-llava-next-8b/summary)|q_proj, k_proj, v_proj|llama-llava-next|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)|
|llava-next-72b|[AI-Modelscope/llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llava-next-72b](https://huggingface.co/lmms-lab/llava-next-72b)|
|llava-next-110b|[AI-Modelscope/llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llava-next-110b](https://huggingface.co/lmms-lab/llava-next-110b)|
|yi-6b|[01ai/Yi-6B](https://modelscope.cn/models/01ai/Yi-6B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B)|
|yi-6b-200k|[01ai/Yi-6B-200K](https://modelscope.cn/models/01ai/Yi-6B-200K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K)|
|yi-6b-chat|[01ai/Yi-6B-Chat](https://modelscope.cn/models/01ai/Yi-6B-Chat/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)|
21 changes: 15 additions & 6 deletions docs/source/Multi-Modal/llava最佳实践.md
@@ -1,5 +1,16 @@

# Llava Best Practices
Models covered in this document:

| model | model_type |
|-------|------------|
| [llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) | llava1d6-mistral-7b-instruct |
| [llava-v1.6-34b](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary) | llava1d6-yi-34b-instruct |
| [llama3-llava-next-8b](https://modelscope.cn/models/AI-ModelScope/llama3-llava-next-8b/summary) | llama3-llava-next-8b |
| [llava-next-72b](https://modelscope.cn/models/AI-ModelScope/llava-next-72b/summary) | llava-next-72b |
| [llava-next-110b](https://modelscope.cn/models/AI-ModelScope/llava-next-110b/summary) | llava-next-110b |

The following practice uses `llava-v1.6-mistral-7b` as an example; you can also switch to other models by specifying `--model_type`, as sketched below.
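
A minimal sketch (an illustration added here, not part of the upstream doc) of pointing the same inference CLI at one of the newly added Llava-Next models:

```shell
# Sketch: reuse the swift infer command shown later in this document;
# only the --model_type value changes for the Llava-Next family.
# The 72B / 110B variants follow the same pattern but need correspondingly more GPU memory.
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llama3-llava-next-8b
```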

## Table of Contents
- [Environment Setup](#环境准备)
@@ -16,10 +27,8 @@ pip install -e '.[llm]'
```

## Inference

Inference with [llava1d6-mistral-7b-instruct](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) and [llava1d6-yi-34b-instruct](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary):
```shell
# Experimental environment: A10, 3090, V100...
# Experimental environment: A100
# 20GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1d6-mistral-7b-instruct

@@ -110,7 +119,7 @@ from swift.llm import (
from swift.utils import seed_everything
import torch

model_type = ModelType.llava1d6_mistral_7b_instruct # ModelType.llava1d6_yi_34b_instruct
model_type = 'llava1d6-mistral-7b-instruct'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

@@ -208,7 +217,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \
## Inference after Fine-tuning
Direct inference:
```shell
model_type="llava1d6-mistral-7b-instruct" # "llava1d6-yi-34b-instruct"
model_type="llava1d6-mistral-7b-instruct"

CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/${model_type}/vx-xxx/checkpoint-xxx \
@@ -217,7 +226,7 @@

**merge-lora** and inference:
```shell
model_type="llava1d6-mistral-7b-instruct" # "llava1d6-yi-34b-instruct"
model_type="llava1d6-mistral-7b-instruct"
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx" \
--merge_lora true
3 changes: 3 additions & 0 deletions docs/source_en/LLM/Supported-models-datasets.md
@@ -123,6 +123,9 @@ The table below introduces all models supported by SWIFT:
|atom-7b-chat|[FlagAlpha/Atom-7B-Chat](https://modelscope.cn/models/FlagAlpha/Atom-7B-Chat/summary)|q_proj, k_proj, v_proj|atom|&#x2714;|&#x2714;||-|[FlagAlpha/Atom-7B-Chat](https://huggingface.co/FlagAlpha/Atom-7B-Chat)|
|llava1d6-mistral-7b-instruct|[AI-ModelScope/llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary)|q_proj, k_proj, v_proj|llava-mistral-instruct|&#x2714;|&#x2718;|transformers>=4.34|multi-modal, vision|[liuhaotian/llava-v1.6-mistral-7b](https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b)|
|llava1d6-yi-34b-instruct|[AI-ModelScope/llava-v1.6-34b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary)|q_proj, k_proj, v_proj|llava-yi-instruct|&#x2714;|&#x2718;||multi-modal, vision|[liuhaotian/llava-v1.6-34b](https://huggingface.co/liuhaotian/llava-v1.6-34b)|
|llama3-llava-next-8b|[AI-Modelscope/llama3-llava-next-8b](https://modelscope.cn/models/AI-Modelscope/llama3-llava-next-8b/summary)|q_proj, k_proj, v_proj|llama-llava-next|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llama3-llava-next-8b](https://huggingface.co/lmms-lab/llama3-llava-next-8b)|
|llava-next-72b|[AI-Modelscope/llava-next-72b](https://modelscope.cn/models/AI-Modelscope/llava-next-72b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llava-next-72b](https://huggingface.co/lmms-lab/llava-next-72b)|
|llava-next-110b|[AI-Modelscope/llava-next-110b](https://modelscope.cn/models/AI-Modelscope/llava-next-110b/summary)|q_proj, k_proj, v_proj|llava-qwen-instruct|&#x2714;|&#x2718;||multi-modal, vision|[lmms-lab/llava-next-110b](https://huggingface.co/lmms-lab/llava-next-110b)|
|yi-6b|[01ai/Yi-6B](https://modelscope.cn/models/01ai/Yi-6B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B)|
|yi-6b-200k|[01ai/Yi-6B-200K](https://modelscope.cn/models/01ai/Yi-6B-200K/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K)|
|yi-6b-chat|[01ai/Yi-6B-Chat](https://modelscope.cn/models/01ai/Yi-6B-Chat/summary)|q_proj, k_proj, v_proj|yi|&#x2714;|&#x2714;||-|[01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)|
22 changes: 16 additions & 6 deletions docs/source_en/Multi-Modal/llava-best-practice.md
@@ -1,4 +1,16 @@
# Llava Best Practices
This document covers the following models:

| model | model_type |
|-------|------------|
| [llava-v1.6-mistral-7b](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) | llava1d6-mistral-7b-instruct |
| [llava-v1.6-34b](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary) | llava1d6-yi-34b-instruct |
| [llama3-llava-next-8b](https://modelscope.cn/models/AI-ModelScope/llama3-llava-next-8b/summary) | llama3-llava-next-8b |
| [llava-next-72b](https://modelscope.cn/models/AI-ModelScope/llava-next-72b/summary) | llava-next-72b |
| [llava-next-110b](https://modelscope.cn/models/AI-ModelScope/llava-next-110b/summary) | llava-next-110b |

The following practice uses `llava-v1.6-mistral-7b` as an example; you can also switch to other models by specifying `--model_type`, as sketched below.
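
For instance, a minimal sketch (an illustration, not from the upstream doc) that points the inference command shown below at the Llama-3-based Llava-Next checkpoint; the larger variants use the same flag with their own `model_type` values and proportionally more GPU memory:

```shell
# Hypothetical usage: the same swift infer CLI with a different --model_type.
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llama3-llava-next-8b
# e.g. --model_type llava-next-72b or llava-next-110b for the larger checkpoints
```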


## Table of Contents
- [Environment Setup](#environment-setup)
@@ -14,10 +26,8 @@ pip install -e '.[llm]'
```

## Inference

Inference for [llava1d6-mistral-7b-instruct](https://modelscope.cn/models/AI-ModelScope/llava-v1.6-mistral-7b/summary) and [llava1d6-yi-34b-instruct](https://www.modelscope.cn/models/AI-ModelScope/llava-v1.6-34b/summary):
```shell
# Experimental environment: A10, 3090, V100...
# Experimental environment: A100
# 20GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llava1d6-mistral-7b-instruct

@@ -108,7 +118,7 @@ from swift.llm import (
from swift.utils import seed_everything
import torch

model_type = ModelType.llava1d6_mistral_7b_instruct # ModelType.llava1d6_yi_34b_instruct
model_type = 'llava1d6-mistral-7b-instruct'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

@@ -205,15 +215,15 @@ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 swift sft \
## Inference after Fine-tuning
Direct inference:
```shell
model_type="llava1d6-mistral-7b-instruct" # "llava1d6-yi-34b-instruct"
model_type="llava1d6-mistral-7b-instruct"
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/${model_type}/vx-xxx/checkpoint-xxx \
--load_dataset_config true
```

**merge-lora** and inference:
```shell
model_type="llava1d6-mistral-7b-instruct" # "llava1d6-yi-34b-instruct"
model_type="llava1d6-mistral-7b-instruct"
CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir "output/${model_type}/vx-xxx/checkpoint-xxx" \
--merge_lora true
46 changes: 42 additions & 4 deletions swift/llm/utils/model.py
@@ -160,6 +160,9 @@ class ModelType:
# llava
llava1d6_mistral_7b_instruct = 'llava1d6-mistral-7b-instruct'
llava1d6_yi_34b_instruct = 'llava1d6-yi-34b-instruct'
llama3_llava_next_8b = 'llama3-llava-next-8b'
llava_next_72b = 'llava-next-72b'
llava_next_110b = 'llava-next-110b'
# yi
yi_6b = 'yi-6b'
yi_6b_200k = 'yi-6b-200k'
@@ -3910,23 +3913,53 @@ def _new_generate(inputs=None, *args, **kwargs):
function_kwargs={'llm_model_type': 'mistral'},
tags=['multi-modal', 'vision'],
hf_model_id='liuhaotian/llava-v1.6-mistral-7b')
@register_model(
ModelType.llama3_llava_next_8b,
'AI-Modelscope/llama3-llava-next-8b',
LoRATM.llama2,
TemplateType.llama_llava_next,
support_flash_attn=True,
tags=['multi-modal', 'vision'],
function_kwargs={'llm_model_type': 'next_llama'},
hf_model_id='lmms-lab/llama3-llava-next-8b')
@register_model(
ModelType.llava_next_72b,
'AI-Modelscope/llava-next-72b',
LoRATM.llama2,
TemplateType.llava_qwen_instruct,
support_flash_attn=True,
tags=['multi-modal', 'vision'],
function_kwargs={'llm_model_type': 'next_qwen'},
hf_model_id='lmms-lab/llava-next-72b')
@register_model(
ModelType.llava_next_110b,
'AI-Modelscope/llava-next-110b',
LoRATM.llama2,
TemplateType.llava_qwen_instruct,
support_flash_attn=True,
tags=['multi-modal', 'vision'],
function_kwargs={'llm_model_type': 'next_qwen'},
hf_model_id='lmms-lab/llava-next-110b')
def get_model_tokenizer_llava(model_dir: str,
torch_dtype: Dtype,
model_kwargs: Dict[str, Any],
load_model: bool = True,
**kwargs):
llm_model_type = kwargs.pop('llm_model_type')
if 'local_repo_path' in kwargs:
local_repo_path = kwargs['local_repo_path']
repo_path = kwargs['local_repo_path']
elif 'next' in llm_model_type:
repo_path = 'https://github.com/LLaVA-VL/LLaVA-NeXT.git'
else:
local_repo_path = _git_clone_github('https://github.com/haotian-liu/LLaVA.git')
repo_path = 'https://github.com/haotian-liu/LLaVA.git'
local_repo_path = _git_clone_github(repo_path)
sys.path.append(os.path.join(local_repo_path))

llm_model_type = kwargs.pop('llm_model_type')
if llm_model_type == 'mistral':
from llava.model import LlavaMistralForCausalLM, LlavaMistralConfig
model_config = LlavaMistralConfig.from_pretrained(model_dir)
automodel_class = LlavaMistralForCausalLM
else: # llama
elif 'llama' in llm_model_type: # llama
from llava.model import LlavaLlamaForCausalLM, LlavaConfig
if not hasattr(LlavaLlamaForCausalLM, '__old_forward'): # Avoid double patching
forward = LlavaLlamaForCausalLM.forward
@@ -3940,6 +3973,11 @@ def _new_forward(*args, **kwargs):
LlavaLlamaForCausalLM.forward = _new_forward
model_config = LlavaConfig.from_pretrained(model_dir)
automodel_class = LlavaLlamaForCausalLM
else: # qwen
from llava.model import LlavaQwenForCausalLM
automodel_class = LlavaQwenForCausalLM
model_config = AutoConfig.from_pretrained(model_dir)

model_config.mm_vision_tower = snapshot_download('AI-ModelScope/clip-vit-large-patch14-336')
model, tokenizer = get_model_tokenizer_with_flash_attn(
model_dir,
36 changes: 36 additions & 0 deletions swift/llm/utils/template.py
@@ -38,6 +38,8 @@ class TemplateType:
llava_mistral_instruct = 'llava-mistral-instruct'
llava_yi_instruct = 'llava-yi-instruct'
llava_llama_instruct = 'llava-llama-instruct'
llava_qwen_instruct = 'llava-qwen-instruct'
llama_llava_next = 'llama-llava-next'
openbuddy = 'openbuddy'
openbuddy2 = 'openbuddy2'
internlm = 'internlm'
@@ -1060,6 +1062,40 @@ def data_collator(self, batch: List[Dict[str, Any]], padding_to: Optional[int] =
lazy_tokenize=True)


class LlamaLlavaNextTemplate(LLavaTemplate):
default_system = 'You are a helpful language and vision assistant. ' \
'You are able to understand the visual content that the user provides, ' \
'and assist the user with a variety of tasks using natural language.'

def __init__(self):
Template.__init__(self, [], [
'<|start_header_id|>user<|end_header_id|>\n\n', [-200],
'\n{{QUERY}}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
], ['<|eot_id|>'], ['<|eot_id|>'], self.default_system,
['<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{{SYSTEM}}'])


register_template(
TemplateType.llama_llava_next,
LlamaLlavaNextTemplate(),
use_model=True,
infer_media_type='round',
lazy_tokenize=True)


class LLavaQwenTemplate(LLavaTemplate):
llavayi_query_template = 'You are a helpful assistant'

def __init__(self):
Template.__init__(self, [], ['<|im_start|>user\n', [-200], '{{QUERY}}<|im_end|>\n<|im_start|>assistant\n'],
['<|im_end|>\n'], ['<|im_end|>'], self.llavayi_query_template,
['<|im_start|>system\n{{SYSTEM}}<|im_end|>\n'])


register_template(
TemplateType.llava_qwen_instruct, LLavaQwenTemplate(), use_model=True, infer_media_type='round', lazy_tokenize=True)


def _findall(token_list: List[int], token: int) -> List[int]:
"""Find the index of a token in the token_list."""
res = []
