# mPLUG-Owl2 Best Practices

The following uses mplug-owl2_1-chat as an example; you can also choose mplug-owl2-chat.

## Table of Contents
- [Environment Setup](#environment-setup)
- [Inference](#inference)
- [Fine-tuning](#fine-tuning)
- [Inference After Fine-tuning](#inference-after-fine-tuning)

## Environment Setup

```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
```
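
Before moving on, you can quickly confirm that the editable install is importable. A minimal check (this assumes the `swift` package exposes `__version__`):

```python
# Sanity check for the installation
import swift
from swift.llm import ModelType

print(swift.__version__)            # assumed attribute; check your version
print(ModelType.mplug_owl2_1_chat)  # the model type used throughout this doc
```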

Model links: see the corresponding model cards on ModelScope.

## Inference

Inference with mplug-owl2_1-chat:

```shell
# Experimental environment: A10, 3090, V100...
# 24GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift infer --model_type mplug-owl2_1-chat
```

Output: (local paths or URLs are supported)

"""
<<< Describe this image.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image features a close-up of a cute, gray and white kitten with big blue eyes. The kitten is sitting on a table, looking directly at the viewer. The scene captures the kitten's adorable features, including its whiskers and the fur on its face. The kitten appears to be staring into the camera, creating a captivating and endearing atmosphere.
--------------------------------------------------
<<< How many sheep are in the picture?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
There are four sheep in the picture.
--------------------------------------------------
<<< What is the calculation result?
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png
The calculation result is 1452 + 45304 = 46756.
--------------------------------------------------
<<< Write a poem based on the content of the picture.
Input a media path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png
In the stillness of the night, a boat glides across the water, its light shining bright. The stars twinkle above, casting a magical glow. A man and a dog are on board, enjoying the serene journey. The boat floats gently, as if it's floating on air. The calm waters reflect the stars, creating a breathtaking scene. The man and his dog are lost in their thoughts, taking in the beauty of nature. The boat seems to be floating in a dream, as if they are on a journey to find their way back home.
--------------------------------------------------
<<< clear
<<< Perform OCR on the image.
Input a media path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/ocr_en.png
Text: Swift support training, inference and deployment of 250+ LLMs and 350+ MLMs (multimodal models). Developers can directly apply framework their own research and production environments to realize a complete workflow from model training and evaluation to application. In addition to supporting the lightweight training models provided by PEFT, we also provide a Complete Adapters library that can be adapted to various models such as NeTune, LoRaT, LLMA-PRO, etc. This adapter library can be used directly in your own custom workflow. The library is user-friendly with unfamiliar deep learning, Gradio UI for controlling training and inference, as well as accompanying learning courses and best practices for beginners. Additionally, we provide extra training and Lora LRN for AnimateDiff. Swift has rich documents for users on Huggingface and ModelScope, so please feel free to try it!
"""

The sample images are shown below:

cat:

![cat](http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png)

animal:

![animal](http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png)

math:

![math](http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/math.png)

poem:

![poem](http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/poem.png)

ocr_en:

![ocr_en](https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/ocr_en.png)

**Single-sample inference**

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.mplug_owl2_1_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')

# Streaming inference; each dialogue turn needs one image (see the custom
# dataset note in the fine-tuning section), so the image list is doubled
# to also cover the history turn
query = 'Which city is the farthest?'
images = images * 2
gen = inference_stream(model, template, query, history, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: How far is it from each city?
response: From the given information, it is 14 km from the city of Mata, 62 km from Yangjiang, and 293 km from Guangzhou.
query: Which city is the farthest?
response: The farthest city is Guangzhou, which is 293 km away.
history: [['How far is it from each city?', 'From the given information, it is 14 km from the city of Mata, 62 km from Yangjiang, and 293 km from Guangzhou.'], ['Which city is the farthest?', 'The farthest city is Guangzhou, which is 293 km away.']]
"""

The sample image is shown below:

road:

![road](http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png)

## Fine-tuning

Multimodal large models are usually fine-tuned on custom datasets. The following demo can be run directly:

```shell
# Experimental environment: A10, 3090, V100...
# 24GB GPU memory
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type mplug-owl2_1-chat \
    --dataset coco-en-2-mini
```
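
The same run can also be started from Python. A sketch assuming `sft_main` / `SftArguments` mirror the CLI flags and that `sft_main` returns the best checkpoint path, as in swift's README:

```python
from swift.llm import sft_main, SftArguments

# Programmatic equivalent of the `swift sft` command above
result = sft_main(SftArguments(
    model_type='mplug-owl2_1-chat',
    dataset=['coco-en-2-mini'],
))
print(result['best_model_checkpoint'])  # pass this as --ckpt_dir below
```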

Custom datasets support the json and jsonl formats. Below is an example of a custom dataset (a short script for generating such a file follows the example):

(Multi-turn dialogues are supported; every dialogue turn must contain one image; local paths or URLs can be passed.)

{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]], "images": ["image_path", "image_path2", "image_path3"]}

## Inference After Fine-tuning

Direct inference:

```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/mplug-owl2_1-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
```

Merge LoRA and infer:

```shell
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir output/mplug-owl2_1-chat/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/mplug-owl2_1-chat/vx-xxx/checkpoint-xxx-merged \
    --load_dataset_config true
```
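
Checkpoint inference also has a programmatic form. A sketch assuming `InferArguments` exposes `ckpt_dir` and `load_dataset_config` fields mirroring the CLI flags (the path below is a placeholder):

```python
from swift.llm import infer_main, InferArguments

# Replace ckpt_dir with your actual merged checkpoint directory
infer_main(InferArguments(
    ckpt_dir='output/mplug-owl2_1-chat/vx-xxx/checkpoint-xxx-merged',
    load_dataset_config=True,
))
```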