
Add Qwen model #182

Merged · 4 commits · Nov 20, 2023
Conversation

@Sanster (Contributor) commented Nov 10, 2023

Modify according to this PR: #78

@casper-hansen (Owner)

Hi @Sanster, thank you for this. Did you test whether quantizing a model works and whether inference runs?

The problem I ran into in my old PR was an issue with the modeling code that prevented me from quantizing.

@casper-hansen (Owner)

I tried quantizing a model, but the model outputs are weird and the eval does not work for this model. At this time, I don't think we can merge this pull request until we can verify that it works after quantizing.
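
For reference, one way to measure whether a quantized checkpoint still behaves sensibly is a quick perplexity check. The sketch below is illustrative rather than part of this PR: the local path is hypothetical, and it assumes the AutoAWQ wrapper exposes the underlying Hugging Face causal LM via its .model attribute.

# Illustrative perplexity sanity check for a quantized model (not part of this PR).
# Assumes the AutoAWQ wrapper exposes the underlying HF causal LM as `.model`.
import torch
import torch.nn.functional as F
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from datasets import load_dataset

quant_path = "./Qwen-7B-Chat-quant"  # hypothetical local path
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

# Use the first ~8k tokens of wikitext-2 test as a small held-out corpus
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids[:, :8192].cuda()

losses = []
with torch.no_grad():
    for i in range(0, ids.shape[1] - 1, 2048):
        chunk = ids[:, i : i + 2048]
        logits = model.model(chunk).logits.float()
        # Shift so each position predicts the next token
        losses.append(
            F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                chunk[:, 1:].reshape(-1),
            )
        )
print("perplexity:", torch.exp(torch.stack(losses).mean()).item())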

@Sanster (Contributor, Author) commented Nov 15, 2023

Hi, in my testing, the model is working properly.

[screenshot of the quantized model's output]

Quant script

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from datasets import load_dataset

model_dir = "Qwen/Qwen-7B-Chat"
save_dir = "./Qwen-7B-Chat-quant"
dataset = load_dataset("GAIR/lima")["train"]
quant_count = 16

quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}

# Build calibration examples in Qwen's ChatML prompt format
examples = []
for it in dataset["conversations"][:quant_count]:
    query = it[0]
    answer = it[1]
    examples.append(
        f"<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n{answer}<|im_end|>"
    )

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
tokenizer.save_pretrained(save_dir)

# Load the FP16 model onto the GPU
model = (
    AutoAWQForCausalLM.from_pretrained(
        model_dir, trust_remote_code=True, safetensors=True
    )
    .eval()
    .cuda()
)

# Quantize against the calibration data and save the result
model.quantize(tokenizer, quant_config=quant_config, calib_data=examples)
model.save_quantized(save_dir, safetensors=True)

Test script

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_model_dir = "./Qwen-7B-Chat-quant"
text = "<|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\n"

# Load the quantized model with fused layers for faster inference
model = AutoAWQForCausalLM.from_quantized(quant_model_dir, fuse_layers=True).eval()
tokenizer = AutoTokenizer.from_pretrained(quant_model_dir, trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

tokens = tokenizer(text, return_tensors="pt").input_ids.cuda()

# 151645 is Qwen's <|im_end|> token, used here as the stop token
model.generate(tokens, streamer=streamer, max_new_tokens=100, eos_token_id=151645)

@enbacoo commented Nov 19, 2023

@casper-hansen, I tested @Sanster's work. When I replace the prompt template in the quant script with "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n{answer}<|im_end|>", it works well (a sketch of the change is below).
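
A minimal sketch of that change, reusing the dataset and quant_count variables from the quant script above; the ChatML system line is simply prepended to each calibration example:

# Calibration examples with Qwen's system prompt prepended
examples = []
for it in dataset["conversations"][:quant_count]:
    query, answer = it[0], it[1]
    examples.append(
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        f"<|im_start|>user\n{query}<|im_end|>\n"
        f"<|im_start|>assistant\n{answer}<|im_end|>"
    )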

@casper-hansen (Owner)

This seems to work for me now with the right prompt template. Thanks for the PR! (Note: the eval is not currently working, but Qwen's responses look good.)

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_path = "qwen-7b-chat-awq"

# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Convert prompt to tokens
prompt_template = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

prompt = "You're standing on the surface of the Earth. "\
        "You walk one mile south, one mile west and one mile north. "\
        "You end up exactly where you started. Where are you?"

tokens = tokenizer(
    prompt_template.format(prompt=prompt), 
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens, 
    streamer=streamer,
    max_new_tokens=512,
    eos_token_id=151645
)
