Environment issues #40

Closed
harmlessSR opened this issue Sep 25, 2024 · 4 comments
Comments


harmlessSR commented Sep 25, 2024

Hi,
I'm trying to run your excellent code! However, after downloading WizardMath-7B-V1.0 from Hugging Face and running:

python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0

I got:

ValueError: Model architectures ['LlamaModel'] are not supported for now. Supported architectures: ['AquilaModel', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'FalconForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MPTForCausalLM', 'OPTForCausalLM', 'QWenLMHeadModel', 'RWForCausalLM']

since the architecture of WizardMath-7B-V1.0 is 'LlamaModel'. Do you have any thoughts on this problem? I suspect it may be an issue with my environment... I would still appreciate any useful information you could provide!
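
For reference, vLLM picks its model class from the `architectures` field of the checkpoint's config.json, which is where the 'LlamaModel' value in the error comes from. A minimal sketch (assuming transformers is installed and the checkpoint sits in a local `WizardMath-7B-V1.0` directory) to print what the downloaded files declare:

```python
# Minimal sketch: print the architecture declared in the checkpoint's config.json.
# vLLM 0.1.4 chooses its model class from this field, so a checkpoint exported as
# 'LlamaModel' (rather than 'LlamaForCausalLM') triggers the ValueError above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("WizardMath-7B-V1.0")  # local checkpoint path assumed
print(config.architectures)  # e.g. ['LlamaModel']
```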

Thanks a lot for your help!


harmlessSR commented Sep 26, 2024

As an update, the key conflict is this: I currently have torch 2.0.1, as the documentation specifies. However, 'vllm==0.1.4' requires 'xformers>=0.0.21', while xformers 0.0.21 requires torch<2.0 and xformers 0.0.22 requires torch 2.1.0.
I also tried torch 2.1.0, but hit the same error as with 2.0.1:
```
$ python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0

INFO:root:********** Run starts. **********
INFO:root:configuration is Namespace(finetuned_model_name='WizardMath-7B-V1.0', dataset_name='gsm8k', start_index=0, end_index=9223372036854775807, tensor_parallel_size=1, weight_format='delta_weight', weight_mask_rate=0.0, use_weight_rescale=False, mask_strategy='random', wizardcoder_use_llama2_as_backbone=False)
INFO 09-27 00:21:00 llm_engine.py:70] Initializing an LLM engine with config: model='WizardMath-7B-V1.0', tokenizer='WizardMath-7B-V1.0', tokenizer_mode=auto, trust_remote_code=False, dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Traceback (most recent call last):
  File "/media/Disk1/WuMingrui/MergeLM-main/inference_llms_instruct_math_code.py", line 632, in <module>
    llm = create_llm(finetuned_model_name=args.finetuned_model_name,
  File "/media/Disk1/WuMingrui/MergeLM-main/inference_llms_instruct_math_code.py", line 88, in create_llm
    llm = LLM(model=finetuned_model_name, tensor_parallel_size=tensor_parallel_size)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 66, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 220, in from_engine_args
    engine = cls(*engine_configs,
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 101, in __init__
    self._init_workers(distributed_init_method)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 133, in _init_workers
    self._run_workers(
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 470, in _run_workers
    output = executor(*args, **kwargs)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/worker/worker.py", line 67, in init_model
    self.model = get_model(self.model_config)
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/model_executor/model_loader.py", line 57, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/home/WuMingrui/miniconda3/envs/dare/lib/python3.9/site-packages/vllm/model_executor/models/llama.py", line 321, in load_weights
    param = state_dict[name.replace(weight_name, "gate_up_proj")]
KeyError: 'layers.11.mlp.gate_up_proj.weight'
```
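
For reference, the declared version bounds behind this conflict can be read straight from the installed package metadata; a minimal standard-library sketch (assuming the packages are installed in the active environment):

```python
# Sketch: print the requirements each installed package declares, to inspect the
# torch / xformers version bounds that vllm==0.1.4 pulls in.
from importlib.metadata import requires

for pkg in ("vllm", "xformers", "torch"):
    print(pkg, requires(pkg))
```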

@yule-BUAA
Owner

Hello,

Maybe you can try to run the following commands step by step?

  • pip install vllm==0.1.4
  • pip install transformers==4.33.1
  • pip install torch==2.0.1
  • pip install datasets==2.13.1
  • pip install xformers==0.0.21
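
After installing, a quick sanity check along these lines (just a sketch, assuming all five packages import cleanly) can confirm that the pinned versions are the ones actually picked up:

```python
# Sketch: confirm the pinned versions from the commands above are active.
import torch, transformers, datasets, xformers, vllm

print("torch       ", torch.__version__)         # expected 2.0.1
print("transformers", transformers.__version__)   # expected 4.33.1
print("datasets    ", datasets.__version__)       # expected 2.13.1
print("xformers    ", xformers.__version__)       # expected 0.0.21
print("vllm        ", vllm.__version__)           # expected 0.1.4
```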


harmlessSR commented Sep 27, 2024

Thanks a lot for your help and your wonderful work! Your commands worked well!

@yule-BUAA
Owner

Glad that I can help!
