Environment issues #40

Closed
harmlessSR opened this issue Sep 25, 2024 · 4 comments
Comments


harmlessSR commented Sep 25, 2024

Hi,
I'm trying to run your excellent code! However, after downloading WizardMath-7B-V1.0 from Hugging Face and running:

python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0

I got:

ValueError: Model architectures ['LlamaModel'] are not supported for now. Supported architectures: ['AquilaModel', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'FalconForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MPTForCausalLM', 'OPTForCausalLM', 'QWenLMHeadModel', 'RWForCausalLM']

since the architecture of WizardMath-7B-V1.0 is 'LlamaModel'. Do you have any thoughts on this problem? I suspect it may be an issue with my environment... I would still appreciate any useful information you could provide!
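
For reference, vLLM picks its model class from the `architectures` field of the checkpoint's config.json, which is where the 'LlamaModel' value in the error comes from. A minimal sketch (assuming transformers is installed and the checkpoint sits in a local `WizardMath-7B-V1.0` directory) to print what the downloaded files declare:

```python
# Minimal sketch: print the architecture declared in the checkpoint's config.json.
# vLLM 0.1.4 chooses its model class from this field, so a checkpoint exported as
# 'LlamaModel' (rather than 'LlamaForCausalLM') triggers the ValueError above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("WizardMath-7B-V1.0")  # local checkpoint path assumed
print(config.architectures)  # e.g. ['LlamaModel']
```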

Thanks a lot for your help!


harmlessSR commented Sep 26, 2024

As an update, the key conflict is this: I currently have torch 2.0.1, as the documentation specifies. However, 'vllm==0.1.4' requires 'xformers>=0.0.21', while xformers 0.0.21 requires torch<2.0 and xformers 0.0.22 requires torch 2.1.0.
I also tried torch 2.1.0, but hit the same error as with 2.0.1:
```
$ python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0

INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(finetuned_model_name='WizardMath-7B-V1.0', dataset_name='gsm8k', start_index=0, end_index=9223372036854775807, tensor_parallel_size=1, weight_format='delta_weight', weight_mask_rate=0.0, use_weight_rescale=False, mask_strategy='random', wizardcoder_use_llama2_as_backbone=False)
INFO 09-27 00:21:00 llm_engine.py:70] Initializing an LLM engine with config: model='WizardMath-7B-V1.0', tokenizer='WizardMath-7B-V1.0', tokenizer_mode=auto, trust_remote_code=False, dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Traceback (most recent call last):
  File "/media/Disk1/WuMingrui/MergeLM-main/inference_llms_instruct_math_code.py", line 632, in <module>
    llm = create_llm(finetuned_model_name=args.finetuned_model_name,
  File "/media/Disk1/WuMingrui/MergeLM-main/inference_llms_instruct_math_code.py", line 88, in create_llm
    llm = LLM(model=finetuned_model_name, tensor_parallel_size=tensor_parallel_size)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 66, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 220, in from_engine_args
    engine = cls(*engine_configs,
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 101, in __init__
    self._init_workers(distributed_init_method)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 133, in _init_workers
    self._run_workers(
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 470, in _run_workers
    output = executor(*args, **kwargs)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/worker/worker.py", line 67, in init_model
    self.model = get_model(self.model_config)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/model_executor/model_loader.py", line 57, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 321, in load_weights
    param = state_dict[name.replace(weight_name, "gate_up_proj")]
KeyError: 'layers.11.mlp.gate_up_proj.weight'
```
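
For reference, the declared version bounds behind this conflict can be read straight from the installed package metadata; a minimal standard-library sketch (assuming the packages are installed in the active environment):

```python
# Sketch: print the requirements each installed package declares, to inspect the
# torch / xformers version bounds that vllm==0.1.4 pulls in.
from importlib.metadata import requires

for pkg in ("vllm", "xformers", "torch"):
    print(pkg, requires(pkg))
```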

@yule-BUAA
Owner

Hello,

Maybe you can try to run the following commands step by step?

  • pip install vllm==0.1.4
  • pip install transformers==4.33.1
  • pip install torch==2.0.1
  • pip install datasets==2.13.1
  • pip install xformers==0.0.21
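
After installing, a quick sanity check along these lines (just a sketch, assuming all five packages import cleanly) can confirm that the pinned versions are the ones actually picked up:

```python
# Sketch: confirm the pinned versions from the commands above are active.
import torch, transformers, datasets, xformers, vllm

print("torch       ", torch.__version__)         # expected 2.0.1
print("transformers", transformers.__version__)   # expected 4.33.1
print("datasets    ", datasets.__version__)       # expected 2.13.1
print("xformers    ", xformers.__version__)       # expected 0.0.21
print("vllm        ", vllm.__version__)           # expected 0.1.4
```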


harmlessSR commented Sep 27, 2024

Thanks a lot for your help and your wonderful work! Your commands worked well!

@yule-BUAA
Owner

Glad that I can help!
