FEAT: Add possibility of skipping modules when quantizing #248

younesbelkada · 2023-12-11T14:17:01Z

What does this PR do?

For some models (e.g., Whisper, Mixtral, or Llava), certain modules must be skipped during quantization. This PR adds experimental support for skipping modules during quantization through the following API:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "facebook/opt-125m"
quant_path = "test-quant/opt-125m-awq-no-kproj"
# Modules matching these names are left unquantized
modules_to_not_convert = ["k_proj"]

quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
    "modules_to_not_convert": modules_to_not_convert,
}

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize, skipping the modules listed above
model.quantize(tokenizer, quant_config=quant_config, modules_to_not_convert=modules_to_not_convert)

# Save the quantized model and its tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
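To illustrate which layers a list like `["k_proj"]` selects, here is a minimal pure-Python sketch. It assumes a module is skipped whenever one of the listed strings appears as a substring of its fully qualified name; the helper `split_modules` is hypothetical and is not part of AutoAWQ's API. The layer names are the linear layers of one OPT decoder layer in transformers.

```python
from typing import Dict, Iterable, List, Optional


def split_modules(
    module_names: Iterable[str],
    modules_to_not_convert: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Partition module names into those to quantize and those to skip.

    Hypothetical helper. Assumption: a module is skipped when any entry of
    ``modules_to_not_convert`` is a substring of its name.
    """
    skip_keys = modules_to_not_convert or []
    result: Dict[str, List[str]] = {"quantize": [], "skip": []}
    for name in module_names:
        bucket = "skip" if any(key in name for key in skip_keys) else "quantize"
        result[bucket].append(name)
    return result


# Linear layers inside one OPT decoder layer (attention + MLP)
opt_layer = [
    "self_attn.k_proj",
    "self_attn.v_proj",
    "self_attn.q_proj",
    "self_attn.out_proj",
    "fc1",
    "fc2",
]

plan = split_modules(opt_layer, modules_to_not_convert=["k_proj"])
print(plan["skip"])      # ['self_attn.k_proj']
print(plan["quantize"])  # ['self_attn.v_proj', 'self_attn.q_proj', 'self_attn.out_proj', 'fc1', 'fc2']
```

Because matching is by substring under this assumption, a broad key such as `"proj"` would skip every attention projection in OPT rather than only `k_proj`, so keys should be specific enough to hit just the intended layers.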

An example model has been pushed to https://huggingface.co/ybelkada/opt-125m-awq-no-k-proj and works with a transformers PR that I will open soon.

cc @casper-hansen @TheBloke

v1

024d960

younesbelkada mentioned this pull request Dec 11, 2023

[Awq] Enable the possibility to skip quantization for some target modules huggingface/transformers#27950

Merged

casper-hansen merged commit 9c3dfa0 into main Dec 11, 2023

younesbelkada deleted the fix-release branch December 11, 2023 18:01
