
awq int4 error: got an unexpected keyword argument 'past_key_values' #260

Closed
xxm1668 opened this issue Dec 14, 2023 · 10 comments · Fixed by #264

xxm1668 commented Dec 14, 2023

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Token indices sequence length is longer than the specified maximum sequence length for this model (8947 > 4096). Running this sequence through the model will result in indexing errors
AWQ:   0%| | 0/60 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/home/house365ai/xxm/AutoAWQ/awq_int4.py", line 13, in <module>
    model.quantize(tokenizer, quant_config=quant_config)
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/house365ai/xxm/AutoAWQ/awq/models/base.py", line 59, in quantize
    quantizer.quantize()
  File "/home/house365ai/xxm/AutoAWQ/awq/quantize/quantizer.py", line 95, in quantize
    input_feat = self._get_input_feat(self.modules[i], named_linears)
  File "/home/house365ai/xxm/AutoAWQ/awq/quantize/quantizer.py", line 393, in _get_input_feat
    self.inps = layer(self.inps, **self.module_kwargs)[0]
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/house365ai/xxm/transformers-main/src/transformers/models/llama/modeling_llama.py", line 796, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/house365ai/.conda/envs/autoawq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: LlamaSdpaAttention.forward() got an unexpected keyword argument 'past_key_values'
```
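
The failing line is `self.inps = layer(self.inps, **self.module_kwargs)[0]`: the quantizer replays a cached set of keyword arguments on every decoder layer, and transformers 4.36 changed the Llama layer and attention signatures so that `past_key_values` is no longer an accepted key. A generic way to guard against this kind of signature drift is to filter the cached kwargs against the target module's forward signature; the helper below is a hypothetical sketch of that idea, not necessarily what the linked fix (#264) does:

```python
import inspect

def filter_forward_kwargs(module, kwargs):
    """Hypothetical helper: drop any kwarg that module.forward() does not declare."""
    accepted = inspect.signature(module.forward).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}

# Illustrative use at the failing call site in quantizer.py:
#   self.inps = layer(self.inps, **filter_forward_kwargs(layer, self.module_kwargs))[0]
```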


xxm1668 commented Dec 14, 2023

It runs with transformers==4.35.

@casper-hansen (Owner)

Can you try to upgrade to transformers 4.36.0?

@xxm1668
Copy link
Author

xxm1668 commented Dec 14, 2023

Why upgrade? It runs fine on 4.35.0.

@dongkuang

I have the same error, and my transformers version is 4.36.0:
TypeError: QWenBlock.forward() got an unexpected keyword argument 'past_key_values'


xxm1668 commented Dec 14, 2023

@dongkuang Try 4.35.0.


dongkuang commented Dec 14, 2023

I have tested transformers==4.35.0, but it still shows the error "TypeError: QWenBlock.forward() got an unexpected keyword argument 'past_key_values'",

and another error: "Token indices sequence length is longer than the specified maximum sequence length for this model (57053 > 32768). Running this sequence through the model will result in indexing errors".

My code is:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'Qwen/Qwen-72B-Chat'
quant_path = 'Qwen-72B-Chat-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
# NOTE: pass safetensors=True to load safetensors
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')
```
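
For reference, once quantization succeeds, the saved model is typically loaded back with `from_quantized`; a minimal sketch, assuming the paths from the script above (check your AutoAWQ version's README for the exact parameters):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = 'Qwen-72B-Chat-awq'

# Load the quantized weights; trust_remote_code is needed for Qwen's custom modeling code.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
```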

casper-hansen (Owner) commented Dec 14, 2023

A bug may have been introduced in recent commits while adapting to transformers 4.36.0. Can you check out the following commit, build, and run the quantization with transformers 4.35.2?

```
git checkout 6b5dc29fb1325f1473286d5a195873bdb00b9293
```
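
Building from that checkout and pinning transformers might look like this (a sketch, assuming an editable install from the repo root):

```
pip install -e .
pip install transformers==4.35.2
```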

@dongkuang

I have a new error: "Token indices sequence length is longer than the specified maximum sequence length for this model (57053 > 32768). Running this sequence through the model will result in indexing errors".
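
That warning fires when the tokenizer encodes a sequence longer than the model's declared maximum in a single call. AWQ-style calibration typically tokenizes the corpus once and then slices the ids into fixed-length blocks before any forward pass, so the warning is usually benign; a rough sketch of the pattern, with illustrative names and defaults:

```python
def make_calib_blocks(tokenizer, texts, n_blocks=128, block_len=512):
    # Encoding the whole corpus at once triggers the "longer than the
    # specified maximum sequence length" warning, but the ids are split
    # into block_len-sized chunks before they ever reach the model.
    ids = tokenizer("\n".join(texts), return_tensors="pt").input_ids
    n = min(n_blocks, ids.shape[1] // block_len)
    return [ids[:, i * block_len:(i + 1) * block_len] for i in range(n)]
```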


xxm1668 commented Dec 15, 2023

@dongkuang Which model were you quantizing?


dongkuang commented Dec 16, 2023

It was successfully processed.
