V100 GPUs, Ubuntu 22.04, qwen2-vl-2b model: single-GPU test script runs normally; two-, three-, and four-GPU runs fail. #2087

Open

Digital2Slave opened this issue Sep 20, 2024 · 2 comments

@Digital2Slave (Contributor):
Following https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md, I set up the environment on a host with four 16 GB V100 GPUs and tested the single-sample inference script. It runs normally only on a single GPU; with two, three, or four GPUs it fails.

Environment setup

$ mkvirtualenv aivl -p /usr/bin/python3.10
(aivl) $ pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu118
(aivl) $ git clone https://github.com/modelscope/ms-swift.git
(aivl) $ cd ms-swift
(aivl) $ pip install -e .[llm]
#!< https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md 
(aivl) $ pip install git+https://github.com/huggingface/transformers.git
(aivl) $ pip install pyav qwen_vl_utils


#!< https://github.com/modelscope/ms-swift/issues/2064 
# qwen2-vl 
# https://github.com/QwenLM/Qwen2-VL/issues/96
(aivl) $ pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
# vLLM acceleration
(aivl) $ pip install "vllm>=0.6.1"
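
A quick post-install sanity check can confirm that all four GPUs are visible and report their compute capability (a minimal sketch; note the V100 is sm_70, so bf16 is unavailable and fp16 is the only half-precision option):

import torch

# Print the torch build and enumerate the visible devices.
print(f'torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}')
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    # V100 reports compute capability 7.0; bf16 requires >= 8.0 (Ampere).
    print(f'cuda:{i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB, '
          f'sm_{props.major}{props.minor}')
print(f'bf16 supported: {torch.cuda.is_bf16_supported()}')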

Test script qwen2_vl_2b.py

import os
#!< Set the environment variable CUDA_VISIBLE_DEVICES to 0; 0,1; 0,1,2; 0,1,2,3 respectively
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

#!< -------------- modified section ----------------
os.environ['SIZE_FACTOR'] = '8'
os.environ['MAX_PIXELS'] = '602112'
# ---------------------------------------

from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.qwen2_vl_2b_instruct
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

#!< -------------------------- modified: torch.float16 -------------------------
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = """<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?"""
response, history = inference(model, template, query)
print(f'query: {query}')
print(f'response: {response}')

# streaming
query = '距离最远的城市是哪?'
gen = inference_stream(model, template, query, history)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')

"""
template_type: qwen2-vl
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?
response: 根据图片中的路标,距离各城市的距离如下:

- 马踏:14公里
- 阳江:62公里
- 广州:293公里
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,距离为293公里。
history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?', '根据图片中的路标,距离各城市的距离如下:\n\n- 马踏:14公里\n- 阳江:62公里\n- 广州:293公里'], ['距离最远的城市是哪?', '距离最远的城市是广州,距离为293公里。']]
"""

Single-GPU test result

Set os.environ['CUDA_VISIBLE_DEVICES'] in the test script to 0.

$ python3 qwen2_vl_2b.py
[INFO:swift] Successfully registered `/home/ps/Github/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
template_type: qwen2-vl
[INFO:swift] Downloading the model from ModelScope Hub, model_id: qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/ps/.cache/modelscope/hub/qwen/Qwen2-VL-2B-Instruct
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO:swift] model_kwargs: {'device_map': 'auto'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.08s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
[INFO:swift] Using environment variable `SIZE_FACTOR`, Setting size_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Using environment variable `MAX_PIXELS`, Setting max_pixels: 602112.
query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?
response: 这张图片显示了从马踏到阳江的距离是14公里,从阳江到广州的距离是62公里,从广州到马踏的距离是293公里。
query: 距离最远的城市是哪?
response: 距离最远的城市是广州,从马踏到广州的距离是293公里。
history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?', '这张图片显示了从马踏到阳江的距离是14公里,从阳江到广州的距离是62公里,从广州到马踏的距离是293公里。'], ['距离最远的城市是哪?', '距离最远的城市是广州,从马踏到广州的距离是293公里。']]

Two-GPU test result

Set os.environ['CUDA_VISIBLE_DEVICES'] in the test script to 0,1.

$ python3 qwen2_vl_2b.py
[INFO:swift] Successfully registered `/home/ps/Github/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
template_type: qwen2-vl
[INFO:swift] Downloading the model from ModelScope Hub, model_id: qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/ps/.cache/modelscope/hub/qwen/Qwen2-VL-2B-Instruct
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO:swift] model_kwargs: {'device_map': 'auto'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.10s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
[INFO:swift] Using environment variable `SIZE_FACTOR`, Setting size_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Using environment variable `MAX_PIXELS`, Setting max_pixels: 602112.
Traceback (most recent call last):
  File "/home/ps/Github/AiVl/scripts/qwen2_vl_2b.py", line 24, in <module>
    response, history = inference(model, template, query)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/Github/swift/swift/llm/utils/utils.py", line 864, in inference
    generate_ids = model.generate(streamer=streamer, generation_config=generation_config, **inputs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/utils.py", line 2053, in generate
    result = self._sample(
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/utils.py", line 3040, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
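
torch.multinomial only fails here because the logits already contain inf/nan before sampling. Switching to greedy decoding removes the sampling call and makes it easier to see whether the forward pass itself produces garbage (a sketch for isolating the problem, not a fix for the underlying multi-GPU numerics; fp16 overflow on V100 is one plausible source):

# Greedy decoding: no torch.multinomial, so generation either succeeds or
# the corrupted logits show up as degenerate output instead of a crash.
model.generation_config.do_sample = False
response, history = inference(model, template, query)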

Three- and four-GPU test results

Set os.environ['CUDA_VISIBLE_DEVICES'] in the test script to 0,1,2 and 0,1,2,3 respectively.

$ python3 qwen2_vl_2b.py
[INFO:swift] Successfully registered `/home/ps/Github/swift/swift/llm/data/dataset_info.json`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
template_type: qwen2-vl
[INFO:swift] Downloading the model from ModelScope Hub, model_id: qwen/Qwen2-VL-2B-Instruct
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/ps/.cache/modelscope/hub/qwen/Qwen2-VL-2B-Instruct
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
[INFO:swift] model_kwargs: {'device_map': 'auto'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.07s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] Global seed set to 42
[INFO:swift] Using environment variable `SIZE_FACTOR`, Setting size_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Using environment variable `MAX_PIXELS`, Setting max_pixels: 602112.
../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [4,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
......
Traceback (most recent call last):
  File "/home/ps/Github/AiVl/scripts/qwen2_vl_2b.py", line 24, in <module>
    response, history = inference(model, template, query)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/Github/swift/swift/llm/utils/utils.py", line 864, in inference
    generate_ids = model.generate(streamer=streamer, generation_config=generation_config, **inputs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/utils.py", line 2053, in generate
    result = self._sample(
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/generation/utils.py", line 3003, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1680, in forward
    inputs_embeds = self.model.embed_tokens(input_ids)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/accelerate/hooks.py", line 170, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/home/ps/.virtualenvs/aivl/lib/python3.10/site-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
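
Since CUDA kernels launch asynchronously, the embedding frame in this traceback may not be where the indexSelectSmallIndex assertion actually fired. Re-running with launch blocking enabled yields a trustworthy stack trace (a sketch; the variable must be set before CUDA is initialized):

import os
# Force synchronous kernel launches so the Python traceback points at the
# kernel that actually triggered the device-side assert.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'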

Is there a solution to this problem?
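
In the meantime, one possible workaround is to skip sharding entirely and pin the whole model to a single device (a sketch; viable here only because the single-GPU run above shows the 2B model fits in 16 GB):

# device_map={'': 0} places every submodule on cuda:0, so no hidden states
# cross a device boundary during generate().
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': {'': 0}})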

Reference links

@Digital2Slave (Contributor, Author):
@Jintao-Huang Could you take a look when you have a moment? Thanks!
