
awq int4 error: got an unexpected keyword argument 'past_key_values' #260

Closed
xxm1668 opened this issue Dec 14, 2023 · 10 comments · Fixed by #264

xxm1668 commented Dec 14, 2023

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Token indices sequence length is longer than the specified maximum sequence length for this model (8947 > 4096). Running this sequence through the model will result in indexing errors
AWQ:   0%| | 0/60 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/home/house365ai/xxm/AutoAWQ/awq_int4.py", line 13, in <module>
    model.quantize(tokenizer, quant_config=quant_config)
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/house365ai/xxm/AutoAWQ/awq/models/base.py", line 59, in quantize
    quantizer.quantize()
  File "/home/house365ai/xxm/AutoAWQ/awq/quantize/quantizer.py", line 95, in quantize
    input_feat = self._get_input_feat(self.modules[i], named_linears)
  File "/home/house365ai/xxm/AutoAWQ/awq/quantize/quantizer.py", line 393, in _get_input_feat
    self.inps = layer(self.inps, **self.module_kwargs)[0]
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/house365ai/xxm/transformers-main/src/transformers/models/llama/modeling_llama.py", line 796, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: LlamaSdpaAttention.forward() got an unexpected keyword argument 'past_key_values'
```
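
The failing line is `self.inps = layer(self.inps, **self.module_kwargs)[0]`: the quantizer replays a cached set of keyword arguments on every decoder layer, and transformers 4.36 changed the Llama layer and attention signatures so that `past_key_values` is no longer an accepted key. A generic way to guard against this kind of signature drift is to filter the cached kwargs against the target module's forward signature; the helper below is a hypothetical sketch of that idea, not necessarily what the linked fix (#264) does:

```python
import inspect

def filter_forward_kwargs(module, kwargs):
    """Hypothetical helper: drop any kwarg that module.forward() does not declare."""
    accepted = inspect.signature(module.forward).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}

# Illustrative use at the failing call site in quantizer.py:
#   self.inps = layer(self.inps, **filter_forward_kwargs(layer, self.module_kwargs))[0]
```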


xxm1668 commented Dec 14, 2023

It runs with transformers==4.35.

@casper-hansen (Owner)

Can you try to upgrade to transformers 4.36.0?

@xxm1668
Copy link
Author

xxm1668 commented Dec 14, 2023

Why upgrade? It runs fine on 4.35.0.

@dongkuang

I have the same error, and my transformers version is 4.36.0:
TypeError: QWenBlock.forward() got an unexpected keyword argument 'past_key_values'


xxm1668 commented Dec 14, 2023

@dongkuang Try 4.35.0.


dongkuang commented Dec 14, 2023

I have tested transformers==4.35.0, but it still shows the error "TypeError: QWenBlock.forward() got an unexpected keyword argument 'past_key_values'",

and another error: "Token indices sequence length is longer than the specified maximum sequence length for this model (57053 > 32768). Running this sequence through the model will result in indexing errors".

My code is:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'Qwen/Qwen-72B-Chat'
quant_path = 'Qwen-72B-Chat-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
# NOTE: pass safetensors=True to load safetensors
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')
```
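
For reference, once quantization succeeds, the saved model is typically loaded back with `from_quantized`; a minimal sketch, assuming the paths from the script above (check your AutoAWQ version's README for the exact parameters):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = 'Qwen-72B-Chat-awq'

# Load the quantized weights; trust_remote_code is needed for Qwen's custom modeling code.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
```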

casper-hansen (Owner) commented Dec 14, 2023

A bug may have been introduced in recent commits while adapting to transformers 4.36.0. Can you check out the following commit, build, and run the quantization with transformers 4.35.2?

```
git checkout 6b5dc29fb1325f1473286d5a195873bdb00b9293
```
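
Building from that checkout and pinning transformers might look like this (a sketch, assuming an editable install from the repo root):

```
pip install -e .
pip install transformers==4.35.2
```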

@dongkuang

I have a new error: "Token indices sequence length is longer than the specified maximum sequence length for this model (57053 > 32768). Running this sequence through the model will result in indexing errors".
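
That warning fires when the tokenizer encodes a sequence longer than the model's declared maximum in a single call. AWQ-style calibration typically tokenizes the corpus once and then slices the ids into fixed-length blocks before any forward pass, so the warning is usually benign; a rough sketch of the pattern, with illustrative names and defaults:

```python
def make_calib_blocks(tokenizer, texts, n_blocks=128, block_len=512):
    # Encoding the whole corpus at once triggers the "longer than the
    # specified maximum sequence length" warning, but the ids are split
    # into block_len-sized chunks before they ever reach the model.
    ids = tokenizer("\n".join(texts), return_tensors="pt").input_ids
    n = min(n_blocks, ids.shape[1] // block_len)
    return [ids[:, i * block_len:(i + 1) * block_len] for i in range(n)]
```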


xxm1668 commented Dec 15, 2023

@dongkuang Which model were you quantizing?


dongkuang commented Dec 16, 2023

It was successfully processed.
