Deploying qwen2vl with deploy: error when multiple requests run concurrently #1961

Open
zhengzehong opened this issue Sep 6, 2024 · 1 comment

@zhengzehong

Describe the bug

Deploy the fine-tuned model with the following command:

CUDA_VISIBLE_DEVICES=1 swift deploy --model_type qwen2-vl-7b-instruct --model_id_or_path /root/ms-swift/train/qwen2-vl-7b-instruct/v1-20240906-145640/checkpoint-66-merged --port 20002

Requests with the following payload work fine when there is no concurrency:

{
    "model": "qwen2-vl-7b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "Parse the content of the image uploaded by the user and output it in JSON format"
        }
    ],
    "temperature": 0,
    "images": [
        "/root/datas/image/image_1.png"
    ],
    "stream": false
}

When two requests are sent at the same time, the error below is raised; a reproduction sketch follows.
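For reference, a minimal sketch of firing two concurrent requests (assumptions: swift deploy exposes an OpenAI-compatible /v1/chat/completions route on the port above, and the server is reachable on localhost; both are my assumptions, not confirmed from the logs):

import concurrent.futures
import requests

# Hypothetical endpoint; adjust host/port/path to the actual deployment.
URL = "http://127.0.0.1:20002/v1/chat/completions"
PAYLOAD = {
    "model": "qwen2-vl-7b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "Parse the content of the uploaded image and output it in JSON format",
        }
    ],
    "temperature": 0,
    "images": ["/root/datas/image/image_1.png"],
    "stream": False,
}

def call():
    # One blocking chat-completion request.
    return requests.post(URL, json=PAYLOAD, timeout=300).status_code

# Two simultaneous requests are enough to trigger the failure described above.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    print(list(pool.map(lambda _: call(), range(2))))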

Error message:

return self._call_impl(*args, **kwargs)

File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1607, in forward
outputs = self.model(
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1144, in forward
layer_outputs = decoder_layer(
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 900, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 798, in forward
query_states, key_states = apply_multimodal_rotary_pos_emb(
File "/root/transformers/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 183, in apply_multimodal_rotary_pos_emb
sin = sin[position_ids]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
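The failing line sin = sin[position_ids] is tensor indexing, and on CUDA an out-of-range index manifests as exactly this kind of device-side assert. A minimal illustration of that error class (my assumption about the failure mode, not the actual swift/transformers code path):

import torch

# Indexing a CUDA tensor with an out-of-range index trips a device-side
# assert; because kernels run asynchronously, the error may only surface
# at a later API call, matching the note in the traceback above.
sin = torch.randn(8, 16, device="cuda")
position_ids = torch.tensor([3, 99], device="cuda")  # 99 >= sin.size(0)
bad = sin[position_ids]   # async kernel launch; the assert fires on the device
torch.cuda.synchronize()  # RuntimeError: CUDA error: device-side assert triggered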

Your hardware and system info

Two A800 GPUs; CUDA_VISIBLE_DEVICES=1 pins the deployment to the second card.
torch==2.4.0, CUDA 12.5

@Jintao-Huang
Collaborator

Are you using flash attn?
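A quick way to answer that in the serving environment (a sketch; how swift itself selects the attention backend is governed by its own flags, which I have not confirmed here):

import importlib.util

# True if the flash-attn package is importable in this environment.
print("flash_attn installed:",
      importlib.util.find_spec("flash_attn") is not None)

For reference, at the transformers level flash attention is requested with attn_implementation="flash_attention_2" in from_pretrained.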
