V100 GPUs, Ubuntu 22.04, qwen2-vl-2b model: the test script runs normally on a single GPU but fails on two, three, and four GPUs. #2087

Digital2Slave opened this issue Sep 20, 2024 · 2 comments



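Following the reference, I set up the environment on a host with four 16 GB V100 GPUs. When testing the single-sample inference script, I found that it runs normally only on a single GPU. With two, three, or four GPUs it fails.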


$ mkvirtualenv aivl -p /usr/bin/python3.10
(aivl) $ pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url
(aivl) $ git clone
(aivl) $ cd ms-swift
(aivl) $ pip install -e .[llm]
(aivl) $ pip install git+
(aivl) $ pip install pyav qwen_vl_utils

# qwen2-vl 
(aivl) $ pip install git+
# vLLM acceleration (quoted so the shell does not treat ">=" as a redirect)
(aivl) $ pip install "vllm>=0.6.1"
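
Since several of the packages above are installed from git or with an open-ended version constraint, it may help to record the versions that actually ended up in the virtualenv. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the package list are illustrative assumptions, not part of ms-swift.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Package list is an assumption: the ones most relevant to this report.
    for name, ver in installed_versions(
        ["torch", "transformers", "accelerate", "vllm"]
    ).items():
        print(f"{name}: {ver or 'not installed'}")
```

Pasting this output into the report would let maintainers see which transformers and accelerate builds were in use when the multi-GPU runs failed.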


import os
#!< Set the CUDA_VISIBLE_DEVICES environment variable to 0; 0,1; 0,1,2; and 0,1,2,3 in turn
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

#!< --------------Modified here----------------
os.environ['SIZE_FACTOR'] = '8'
os.environ['MAX_PIXELS'] = '602112'
# ---------------------------------------

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.qwen2_vl_2b_instruct
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

#!< --------------------------Modified here: torch.float16-------------------------
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)

# Query (Chinese): "How far is it to each city?"
query = """<img></img>距离各城市多远?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

# Streaming
# Query (Chinese): "Which city is the farthest?"
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print(f'history: {history}')

template_type: qwen2-vl
query: <img></img>距离各城市多远?
response: 根据图片中的路标,距离各城市的距离如下:

- 马踏:14公里
- 阳江:62公里
- 广州:293公里
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,距离为293公里。
history: [['<img></img>距离各城市多远?', '根据图片中的路标,距离各城市的距离如下:\n\n- 马踏:14公里\n- 阳江:62公里\n- 广州:293公里'], ['距离最远的城市是哪?', '距离最远的城市是广州,距离为293公里。']]
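
The four GPU configurations described above (one to four visible devices) could also be swept from a small driver instead of editing the script each time. This is only a sketch: the script name `test_qwen2_vl.py` is hypothetical, and the line in the test script that sets `os.environ['CUDA_VISIBLE_DEVICES']` would have to be removed, otherwise it overrides the value passed in from outside.

```python
import os
import subprocess

# The four device lists tested in this report.
DEVICE_SETS = ["0", "0,1", "0,1,2", "0,1,2,3"]


def env_for(devices, base=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = devices
    return env


def run_all(script):
    """Run the script once per device list and collect the exit codes."""
    results = {}
    for devices in DEVICE_SETS:
        proc = subprocess.run(["python3", script], env=env_for(devices))
        results[devices] = proc.returncode
    return results


# Usage (hypothetical script name):
# print(run_all("test_qwen2_vl.py"))
```

Running each configuration in a fresh process matters here because a device-side assert leaves the CUDA context of the failing process unusable.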


Set os.environ['CUDA_VISIBLE_DEVICES'] in the test script to 0

$ python3
[INFO:swift] Successfully registered `/home/ps/Github/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
template_type: qwen2-vl
[INFO:swift] Downloading the model from ModelScope Hub, model_id: qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/ps/.cache/modelscope/hub/qwen/Qwen2-VL-2B-Instruct
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO:swift] model_kwargs: {'device_map': 'auto'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.08s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
[INFO:swift] Using environment variable `SIZE_FACTOR`, Setting size_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Using environment variable `MAX_PIXELS`, Setting max_pixels: 602112.
query: <img></img>距离各城市多远?
response: 这张图片显示了从马踏到阳江的距离是14公里,从阳江到广州的距离是62公里,从广州到马踏的距离是293公里。
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,从马踏到广州的距离是293公里。
history: [['<img></img>距离各城市多远?', '这张图片显示了从马踏到阳江的距离是14公里,从阳江到广州的距离是62公里,从广州到马踏的距离是293公里。'], ['距离最远的城市是哪?', '距离最远的城市是广州,从马踏到广州的距离是293公里。']]


Set os.environ['CUDA_VISIBLE_DEVICES'] in the test script to 0,1

$ python3
[INFO:swift] Successfully registered `/home/ps/Github/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
template_type: qwen2-vl
[INFO:swift] Downloading the model from ModelScope Hub, model_id: qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/ps/.cache/modelscope/hub/qwen/Qwen2-VL-2B-Instruct
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO:swift] model_kwargs: {'device_map': 'auto'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.10s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
[INFO:swift] Using environment variable `SIZE_FACTOR`, Setting size_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Using environment variable `MAX_PIXELS`, Setting max_pixels: 602112.
Traceback (most recent call last):
  File "/home/ps/Github/AiVl/scripts/", line 24, in <module>
    response, history = inference(model, template, query)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/Github/swift/swift/llm/utils/", line 864, in inference
    generate_ids = model.generate(streamer=streamer, generation_config=generation_config, **inputs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/", line 2053, in generate
    result = self._sample(
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/", line 3040, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
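
The error above is raised when the probabilities handed to `torch.multinomial` contain a non-finite or negative value. The pure-Python sketch below mirrors that precondition and shows how a single `inf` logit turns the whole softmax output into `nan`. It illustrates the failure mode only; the logs do not show where the invalid values first appear in the model, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Softmax with max-subtraction over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # inf - inf evaluates to nan
    total = sum(exps)
    return [e / total for e in exps]


def invalid_entries(probs):
    """Return (index, value) pairs that are nan, inf, or negative."""
    return [
        (i, p)
        for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


finite = softmax([1.0, 2.0, 3.0])
broken = softmax([1.0, float("inf")])
print(invalid_entries(finite))       # no invalid entries
print(len(invalid_entries(broken)))  # every entry is invalid
```

With real tensors the analogous check would use `torch.isfinite` on the logits before sampling, which would help narrow down whether the invalid values already exist in the model output.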


Set os.environ['CUDA_VISIBLE_DEVICES'] in the test script to 0,1,2 and 0,1,2,3 respectively

$ python3
[INFO:swift] Successfully registered `/home/ps/Github/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
template_type: qwen2-vl
[INFO:swift] Downloading the model from ModelScope Hub, model_id: qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/ps/.cache/modelscope/hub/qwen/Qwen2-VL-2B-Instruct
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO:swift] model_kwargs: {'device_map': 'auto'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.07s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
[INFO:swift] Using environment variable `SIZE_FACTOR`, Setting size_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Using environment variable `MAX_PIXELS`, Setting max_pixels: 602112.
../aten/src/ATen/native/cuda/ indexSelectSmallIndex: block: [4,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/ps/Github/AiVl/scripts/", line 24, in <module>
    response, history = inference(model, template, query)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/Github/swift/swift/llm/utils/", line 864, in inference
    generate_ids = model.generate(streamer=streamer, generation_config=generation_config, **inputs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/", line 2053, in generate
    result = self._sample(
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/", line 3003, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/accelerate/", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/models/qwen2_vl/", line 1680, in forward
    inputs_embeds = self.model.embed_tokens(input_ids)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/accelerate/", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/", line 164, in forward
    return F.embedding(
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
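
The assertion `srcIndex < srcSelectDimSize` in the embedding lookup means that at least one token id passed to `embed_tokens` was outside the embedding table. A minimal sketch of that bounds check on plain Python lists follows; `out_of_range_ids` is a hypothetical helper, and the ids and table size in the example are made up for illustration.

```python
def out_of_range_ids(input_ids, num_embeddings):
    """Return (position, token_id) pairs that cannot index the embedding table."""
    return [
        (pos, tok)
        for pos, tok in enumerate(input_ids)
        if not 0 <= tok < num_embeddings
    ]


# Illustrative values only: a table with 10 rows accepts ids 0..9.
print(out_of_range_ids([0, 5, 10, -1], num_embeddings=10))
```

Because CUDA kernels run asynchronously, the Python traceback may point at a later call than the one that failed. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before the run makes kernel launches synchronous, so the reported line is more likely to be the real source.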



Contributor Author

@Jintao-Huang Could you please take a look when you have time? Thanks!

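@Digital2Slave Digital2Slave changed the title from "V100 qwen2_vl_2b: runs normally on a single GPU, fails on two, three, and four GPUs." to "V100 GPUs, Ubuntu 22.04, qwen2-vl-2b model: the test script runs normally on a single GPU but fails on two, three, and four GPUs." Sep 20, 2024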