If you install the Marlin kernels, you should be able to run inference in AutoAWQ. Otherwise, I would advise quantizing with GEMM, because vLLM 0.5.3 now includes an automatic mapping from that format to the optimized Marlin kernels.
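For example, a minimal GEMM quantization sketch with AutoAWQ (model and output paths are placeholders; calibration settings are left at their defaults):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2-72B-Instruct"        # placeholder: FP16 model to quantize
quant_path = "Qwen2-72B-Instruct-awq-gemm"    # placeholder: output directory

# GEMM uses asymmetric quantization, so zero_point stays True here.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```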
First, thanks for the awesome work on the Marlin kernel. Currently I cannot find a way to run inference with an awq_marlin model and need some help.
Quantization

I quantized Qwen2-72B with
quant_config = { "zero_point": False, "q_group_size": 128, "w_bit": 4, "version": "Marlin" }
and found that model.layers.0.self_attn.q_proj.qzeros does not exist, unlike with the other quantization versions.
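A quick way to confirm which tensors the export actually contains, e.g. that qweight and scales are present but qzeros is not (the checkpoint path is a placeholder):

```python
import glob
from safetensors import safe_open

quant_path = "Qwen2-72B-Instruct-awq-marlin"  # placeholder: the Marlin export directory

# Print every tensor stored for the first attention q_proj, so a missing
# qzeros shows up immediately when comparing against a GEMM export.
for shard in sorted(glob.glob(f"{quant_path}/*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if "layers.0.self_attn.q_proj" in name:
                print(shard, name)
```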
Inference

With vLLM
vllm-project/vllm#6612
I built vLLM from the current main branch. I got an error, and with debugpy I traced it to the layer model.layers.0.self_attn.q_proj.qzeros.
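Roughly what I ran (the model path and tensor-parallel size are placeholders; this assumes the awq_marlin method from the PR above is available in the build):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen2-72B-Instruct-awq-marlin",  # placeholder: local AWQ-Marlin checkpoint
    quantization="awq_marlin",              # assumption: method name from vllm-project/vllm#6612
    tensor_parallel_size=4,                 # placeholder: whatever fits a 72B model
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```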
Trying the official demo
https://github.com/casper-hansen/AutoAWQ/blob/main/docs/examples.md#transformers
On the first run I had to modify the code, because there is no qzeros layer.
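For context, the loading path from the linked example that I was adapting looks roughly like this (the quantized checkpoint path is a placeholder):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

quant_path = "Qwen2-72B-Instruct-awq-marlin"  # placeholder: quantized checkpoint

model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

tokens = tokenizer("Hello, my name is", return_tensors="pt").input_ids.cuda()
model.generate(tokens, streamer=streamer, max_new_tokens=32)
```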
Next, I got an import error: marlin_cuda cannot be imported, and no such file exists in this repo.
However, I did find it in AutoGPTQ:
https://github.com/AutoGPTQ/AutoGPTQ/blob/main/autogptq_extension/marlin/marlin_cuda.cpp
In any case, I would like to know a way to run this model.