compat with vllm==0.5.1 #1329

Merged · 1 commit · Jul 9, 2024
88 changes: 2 additions & 86 deletions docs/source/Multi-Modal/vLLM推理加速文档.md
@@ -17,7 +17,8 @@ cd swift
pip install -e '.[llm]'

# The vllm version must match your CUDA version; select an appropriate version according to `https://docs.vllm.ai/en/latest/getting_started/installation.html`
pip install "vllm>=0.5"
# vllm 0.5.1 makes major changes to multimodal support and only supports a single image, so we are not updating immediately; we will update once vllm stabilizes.
pip install "vllm==0.5.0.*"
pip install openai -U
```
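
As a quick sanity check, here is a minimal sketch (not part of the original docs) that verifies the pinned vllm 0.5.0.x is what actually got installed; the example version strings in the comment are assumptions:

```python
# Confirm the installed vllm matches the 0.5.0.x pin used above.
from importlib.metadata import version

vllm_version = version('vllm')  # e.g. '0.5.0' or '0.5.0.post1' (assumed formats)
assert vllm_version.startswith('0.5.0'), (
    f'expected vllm 0.5.0.x, found {vllm_version}; '
    '0.5.1 changed the multimodal interface')
print(f'vllm version: {vllm_version}')
```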

@@ -108,8 +109,6 @@ I'm a language model called Vicuna, and I was trained by researchers from Large

## Deployment

### Llava Series

**Server**:
```shell
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type llava1_6-vicuna-13b-instruct --infer_backend vllm
@@ -196,86 +195,3 @@ response: There are two sheep in the picture.
```

You can check out more client usage methods in the [MLLM deployment documentation](MLLM部署文档.md#yi-vl-6b-chat).

### phi3-vision

**Server**:
```shell
# vllm>=0.5.1 or build from source
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type phi3-vision-128k-instruct --infer_backend vllm --max_model_len 8192
```
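
Before running the client examples, you can confirm the deployment is reachable. This is a hedged check assuming the server exposes the standard OpenAI-compatible `/v1/models` route (the Python client below relies on the same route via `client.models.list()`):

```bash
# List the served models; a JSON reply indicates the server is up.
curl http://localhost:8000/v1/models
```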

**Client**:

Test:
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "phi3-vision-128k-instruct",
"messages": [{"role": "user", "content": "<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What is the difference between these two pictures?"}],
"temperature": 0
}'
```
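
To pull just the generated text out of the reply, the same request can be piped through `jq` (assumed installed); the query string here is illustrative, and the field path follows the standard OpenAI chat-completions schema:

```bash
# Extract only the answer text from the chat-completions response.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3-vision-128k-instruct",
    "messages": [{"role": "user", "content": "<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>Describe this picture."}],
    "temperature": 0
  }' | jq -r '.choices[0].message.content'
```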

Using openai:
```python
from openai import OpenAI
client = OpenAI(
    api_key='EMPTY',
    base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')

# use base64
# import base64
# with open('cat.png', 'rb') as f:
#     img_base64 = base64.b64encode(f.read()).decode('utf-8')
# images = [img_base64]

# use local_path
# from swift.llm import convert_to_base64
# images = ['cat.png']
# images = convert_to_base64(images=images)['images']

# use url

query = '<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What is the difference between these two pictures?'
messages = [{
    'role': 'user',
    'content': query
}]
resp = client.chat.completions.create(
    model=model_type,
    messages=messages,
    temperature=0)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')

# Streaming
query = 'How many sheep are in the picture?'
messages = [{
    'role': 'user',
    'content': query
}]
stream_resp = client.chat.completions.create(
    model=model_type,
    messages=messages,
    stream=True,
    temperature=0)

print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
    print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""
model_type: phi3-vision-128k-instruct
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What is the difference between these two pictures?
response: The first picture shows a group of four sheep standing in a field, while the second picture is a close-up of a kitten with big eyes. The main difference between these two pictures is the subjects and the setting. The first image features animals typically found in a pastoral or rural environment, whereas the second image focuses on a small domestic animal, a kitten, which is usually found indoors. Additionally, the first picture has a more peaceful and serene atmosphere, while the second image has a more intimate and detailed view of the kitten.
query: How many sheep are in the picture?
response: There are three sheep in the picture.
"""
```
88 changes: 2 additions & 86 deletions docs/source_en/Multi-Modal/vllm-inference-acceleration.md
@@ -17,7 +17,8 @@ cd swift
pip install -e '.[llm]'

# The vllm version must match your CUDA version; select an appropriate version according to `https://docs.vllm.ai/en/latest/getting_started/installation.html`
pip install "vllm>=0.5"
# vllm 0.5.1 makes major changes to multimodal support and only supports a single image, so we are not updating immediately; we will update once vllm stabilizes.
pip install "vllm==0.5.0.*"
pip install openai -U
```

@@ -107,8 +108,6 @@ I'm a language model called Vicuna, and I was trained by researchers from Large

## Deployment

### Llava Series

**Server**:
```shell
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type llava1_6-vicuna-13b-instruct --infer_backend vllm
@@ -195,86 +194,3 @@ response: There are two sheep in the picture.
```

You can check out more client usage methods in the [MLLM Deployment Documentation](mutlimodal-deployment.md#yi-vl-6b-chat).

### phi3-vision

**Server**:
```shell
# vllm>=0.5.1 or build from source
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type phi3-vision-128k-instruct --infer_backend vllm --max_model_len 8192
```
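
As a convenience, here is a hypothetical readiness loop (not part of swift itself) that polls the OpenAI-compatible `/v1/models` route until the deployment above is up; it assumes `requests` is installed and the default host/port used throughout this document:

```python
# Poll the OpenAI-compatible /v1/models route until the server answers.
import time

import requests  # assumed installed: pip install requests

for _ in range(30):
    try:
        resp = requests.get('http://localhost:8000/v1/models', timeout=2)
        if resp.ok:
            print('server ready:', [m['id'] for m in resp.json()['data']])
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)
else:
    raise RuntimeError('server did not come up within ~60 seconds')
```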

**Client**:

Test:
```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "phi3-vision-128k-instruct",
"messages": [{"role": "user", "content": "<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What is the difference between these two pictures?"}],
"temperature": 0
}'
```

Using openai:
```python
from openai import OpenAI
client = OpenAI(
    api_key='EMPTY',
    base_url='http://localhost:8000/v1',
)
model_type = client.models.list().data[0].id
print(f'model_type: {model_type}')

# use base64
# import base64
# with open('cat.png', 'rb') as f:
#     img_base64 = base64.b64encode(f.read()).decode('utf-8')
# images = [img_base64]

# use local_path
# from swift.llm import convert_to_base64
# images = ['cat.png']
# images = convert_to_base64(images=images)['images']

# use url

query = '<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What is the difference between these two pictures?'
messages = [{
    'role': 'user',
    'content': query
}]
resp = client.chat.completions.create(
    model=model_type,
    messages=messages,
    temperature=0)
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')

# Streaming
query = 'How many sheep are in the picture?'
messages = [{
    'role': 'user',
    'content': query
}]
stream_resp = client.chat.completions.create(
    model=model_type,
    messages=messages,
    stream=True,
    temperature=0)

print(f'query: {query}')
print('response: ', end='')
for chunk in stream_resp:
    print(chunk.choices[0].delta.content, end='', flush=True)
print()
"""
model_type: phi3-vision-128k-instruct
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png</img><img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png</img>What is the difference between these two pictures?
response: The first picture shows a group of four sheep standing in a field, while the second picture is a close-up of a kitten with big eyes. The main difference between these two pictures is the subjects and the setting. The first image features animals typically found in a pastoral or rural environment, whereas the second image focuses on a small domestic animal, a kitten, which is usually found indoors. Additionally, the first picture has a more peaceful and serene atmosphere, while the second image has a more intimate and detailed view of the kitten.
query: How many sheep are in the picture?
response: There are three sheep in the picture.
"""
```
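
One hedged refinement to the streaming loop above: the OpenAI chat-completions spec allows `delta.content` to be `None` (for example on a final chunk that only carries `finish_reason`), which the plain `print` would render as the literal string `None`. A defensive variant:

```python
# Defensive streaming loop: skip chunks whose delta.content is None.
for chunk in stream_resp:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='', flush=True)
print()
```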