
[BUG] Latest version cannot load Qwen2-VL model config correctly. #33401

fyabc opened this issue Sep 10, 2024 · 15 comments · May be fixed by #33753
Labels
bug · Multimodal · Should Fix · Vision

Comments

@fyabc

fyabc commented Sep 10, 2024

System Info

  • transformers version: 4.45.0.dev0
  • Platform: Linux-5.10.134-16.101.al8.x86_64-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • Huggingface_hub version: 0.23.4
  • Safetensors version: 0.4.3
  • Accelerate version: 0.32.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA L20Y

Who can help?

@amyeroberts @qubvel

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Download config.json from the Qwen2-VL-7B-Instruct HF main repo to /tmp/Qwen2-VL-7B-Instruct/config.json.
  • The downloaded config file content should be:
{
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "vision_start_token_id": 151652,
  "vision_end_token_id": 151653,
  "vision_token_id": 151654,
  "image_token_id": 151655,
  "video_token_id": 151656,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vision_config": {
    "depth": 32,
    "embed_dim": 1280,
    "mlp_ratio": 4,
    "num_heads": 16,
    "in_chans": 3,
    "hidden_size": 3584,
    "patch_size": 14,
    "spatial_merge_size": 2,
    "spatial_patch_size": 14,
    "temporal_patch_size": 2
  },
  "rope_scaling": {
    "type": "mrope",
    "mrope_section": [
      16,
      24,
      24
    ]
  },
  "vocab_size": 152064
}
  2. Install the latest transformers version via pip install git+https://github.com/huggingface/transformers@main
  3. Run the following script:
from transformers import AutoConfig
config = AutoConfig.from_pretrained('/tmp/Qwen2-VL-7B-Instruct/')
print(config)
  4. The result is:
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}

Qwen2VLConfig {
  "_name_or_path": "/tmp/Qwen2-VL-7B-Instruct/",
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "rope_type": "default",
    "type": "default"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.0.dev0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "in_chans": 3,
    "model_type": "qwen2_vl",
    "spatial_patch_size": 14
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}

It prints a warning message, and the output rope_scaling.type and rope_scaling.rope_type are set to default, but mrope is expected.
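
The same check in script form, assuming the config was downloaded to the path from step 1:

from transformers import AutoConfig

config = AutoConfig.from_pretrained('/tmp/Qwen2-VL-7B-Instruct/')
print(config.rope_scaling)
# affected versions print {'type': 'default', 'rope_type': 'default', 'mrope_section': [16, 24, 24]}
# while the expected value is {'type': 'mrope', 'mrope_section': [16, 24, 24]}
assert config.rope_scaling.get("type") == "mrope", f"rope type was rewritten: {config.rope_scaling}"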

Expected behavior

This bug seems to have been introduced in a recent version of transformers.
When I switch to an older version via git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830, the output is correct:

Qwen2VLConfig {
  "_name_or_path": "/tmp/Qwen2-VL-7B-Instruct/",
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "type": "mrope"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.0.dev0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "in_chans": 3,
    "model_type": "qwen2_vl",
    "spatial_patch_size": 14
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}
@wangaocheng

Yes, the same error.

@LysandreJik
Member

cc @zucchini-nlp as well, I believe

@LysandreJik added the Should Fix and bug labels on Sep 10, 2024
@zucchini-nlp
Member

Hey! Yes, the warning is currently misleading: the RoPE implementation was recently standardized, and Qwen2-VL has quite a different rope-scaling dict compared to other models. Generation quality shouldn't be affected by this, though; as of my last interaction with the model, everything was the same as before the standardization.

cc @gante as well; since you're working on uniform RoPE, this might be something we want to fix.

@gante
Member

gante commented Sep 10, 2024

@zucchini-nlp if it is an expected argument, then we shouldn't throw a warning.

Perhaps we could add an extra_ignore_key argument to rope_config_validation, to define additional keys to ignore? I'm expecting this pattern (updating keys but wanting to keep the originals in the config instance for BC) to come up again in the future.
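
Roughly the shape of what I mean, as a standalone sketch (not the real rope_config_validation; the ignore_keys name here is just a placeholder):

# Standalone illustration of the idea; the real validation lives in transformers' rope utils.
EXPECTED_KEYS = {"default": {"type", "rope_type"}}

def validate_rope_scaling(rope_scaling: dict, ignore_keys=frozenset()) -> None:
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type", "default"))
    unrecognized = set(rope_scaling) - EXPECTED_KEYS.get(rope_type, set()) - set(ignore_keys)
    if unrecognized:
        print(f"Unrecognized keys in `rope_scaling` for 'rope_type'='{rope_type}': {unrecognized}")

# Qwen2-VL could then declare its model-specific key instead of triggering the warning:
validate_rope_scaling(
    {"type": "default", "rope_type": "default", "mrope_section": [16, 24, 24]},
    ignore_keys={"mrope_section"},
)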

@zucchini-nlp
Member

@gante yes, that sounds good. I believe this will be part of your RoPE standardization PR, since it's not very urgent and generation is not broken

@monkeywl2020

monkeywl2020 commented Sep 12, 2024

In the initialization function of class Qwen2VLConfig in src/transformers/models/qwen2_vl/configuration_qwen2_vl.py, I found this code:

if self.rope_scaling is not None and "type" in self.rope_scaling:
    if self.rope_scaling["type"] == "mrope":
        # "mrope" is silently rewritten to "default" here
        self.rope_scaling["type"] = "default"
    # and "rope_type" is then copied from the (possibly rewritten) "type"
    self.rope_scaling["rope_type"] = self.rope_scaling["type"]

This is where the configuration gets modified: both rope_scaling["type"] and rope_scaling["rope_type"] end up set to "default".

@zucchini-nlp
Member

@monkeywl2020 yes, that was a hack to enable uniform RoPE, which currently doesn't accept an mrope type. mrope is the same as the default rope, with the only difference that the position ids have an extra dimension for the height/width/temporal axes.

We'll handle this in a better way soon, so that non-standard rope kwargs are accepted.
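
For intuition, a rough standalone sketch (simplified, not the actual transformers code) of what that extra dimension means: mrope_section = [16, 24, 24] splits the 64 rotary frequency slots of a 128-dim head between the temporal, height, and width position ids.

import torch

head_dim = 128                         # hidden_size 3584 / 28 attention heads
mrope_section = [16, 24, 24]           # sums to head_dim // 2
inv_freq = 1.0 / (1e6 ** (torch.arange(0, head_dim, 2).float() / head_dim))

# mrope position ids carry one row per axis: (temporal, height, width) x seq_len
position_ids = torch.arange(8).repeat(3, 1)           # (3, seq_len)

freqs = position_ids[..., None].float() * inv_freq    # (3, seq_len, 64)
# take the first 16 frequency slots from the temporal row, the next 24 from the
# height row and the last 24 from the width row, then stitch them back together
chunks = torch.split(freqs, mrope_section, dim=-1)
merged = torch.cat([chunks[i][i] for i in range(3)], dim=-1)
print(merged.shape)                                    # torch.Size([8, 64])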

@monkeywl2020


OK

@fyabc
Author

fyabc commented Sep 13, 2024

@zucchini-nlp Hi, can you give an approximate timeline for when this bug will be fixed?

@zucchini-nlp
Member

@gante will you add this to your general RoPE PR, or should we fix it separately?

@exceedzhang

[screenshot] The same error!

@RANYABING

Same error!

Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}
Traceback (most recent call last):
......

@IvanZidov

Same here!

@niaoyu

niaoyu commented Sep 25, 2024

Just running

pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830

works for me.

PR #32617 seems to have broken the logic around the Qwen rope_scaling parameter.
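
If you prefer pinning it in a project instead of installing ad hoc, the equivalent requirements.txt entry should look something like this (standard pip direct-reference syntax, untested here):

transformers @ git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830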

@xuyue1112

Same problem. If I have already trained with the latest master, do I need to retrain with 21fac7abba2a37fae86106f87fcf9974fd1e3830, or is it enough to use that version only for inference?
