Deploying qwen2vl with deploy: error when multiple requests run concurrently #1961

Open
zhengzehong opened this issue Sep 6, 2024 · 1 comment

@zhengzehong

Describe the bug

Deploy the fine-tuned model with the following command:

CUDA_VISIBLE_DEVICES=1 swift deploy --model_type qwen2-vl-7b-instruct --model_id_or_path /root/ms-swift/train/qwen2-vl-7b-instruct/v1-20240906-145640/checkpoint-66-merged --port 20002

Requests with the following payload work fine when there is no concurrency:

{
    "model": "qwen2-vl-7b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "Parse the content of the image uploaded by the user and output it in JSON format"
        }
    ],
    "temperature": 0,
    "images": [
        "/root/datas/image/image_1.png"
    ],
    "stream": false
}

When two requests are sent at the same time, the error below is raised; a reproduction sketch follows.
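For reference, a minimal sketch of firing two concurrent requests (assumptions: swift deploy exposes an OpenAI-compatible /v1/chat/completions route on the port above, and the server is reachable on localhost; both are my assumptions, not confirmed from the logs):

import concurrent.futures
import requests

# Hypothetical endpoint; adjust host/port/path to the actual deployment.
URL = "http://127.0.0.1:20002/v1/chat/completions"
PAYLOAD = {
    "model": "qwen2-vl-7b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "Parse the content of the uploaded image and output it in JSON format",
        }
    ],
    "temperature": 0,
    "images": ["/root/datas/image/image_1.png"],
    "stream": False,
}

def call():
    # One blocking chat-completion request.
    return requests.post(URL, json=PAYLOAD, timeout=300).status_code

# Two simultaneous requests are enough to trigger the failure described above.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    print(list(pool.map(lambda _: call(), range(2))))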

Error message:

return self._call_impl(*args, **kwargs)

File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1607, in forward
outputs = self.model(
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1144, in forward
layer_outputs = decoder_layer(
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 900, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 798, in forward
query_states, key_states = apply_multimodal_rotary_pos_emb(
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 183, in apply_multimodal_rotary_pos_emb
sin = sin[position_ids]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
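The failing line sin = sin[position_ids] is tensor indexing, and on CUDA an out-of-range index manifests as exactly this kind of device-side assert. A minimal illustration of that error class (my assumption about the failure mode, not the actual swift/transformers code path):

import torch

# Indexing a CUDA tensor with an out-of-range index trips a device-side
# assert; because kernels run asynchronously, the error may only surface
# at a later API call, matching the note in the traceback above.
sin = torch.randn(8, 16, device="cuda")
position_ids = torch.tensor([3, 99], device="cuda")  # 99 >= sin.size(0)
bad = sin[position_ids]   # async kernel launch; the assert fires on the device
torch.cuda.synchronize()  # RuntimeError: CUDA error: device-side assert triggered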

Your hardware and system info

Two A800 GPUs; CUDA_VISIBLE_DEVICES=1 pins the deployment to the second card.
torch==2.4.0, CUDA 12.5

@Jintao-Huang
Collaborator

Are you using flash attn?
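A quick way to answer that in the serving environment (a sketch; how swift itself selects the attention backend is governed by its own flags, which I have not confirmed here):

import importlib.util

# True if the flash-attn package is importable in this environment.
print("flash_attn installed:",
      importlib.util.find_spec("flash_attn") is not None)

For reference, at the transformers level flash attention is requested with attn_implementation="flash_attention_2" in from_pretrained.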
