[Misc] Update gptq_marlin to use new vLLMParameters #7281
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can comment /ready on the PR. 🚀
Force-pushed from 5b7a7e0 to 22097f0
Resolved review thread: vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py
Force-pushed from 3f77a88 to 94a2233
Force-pushed from 94a2233 to 0e8561e
Summary

- Update gptq_marlin parameters to use vLLMParameters to simplify linear layer weight loading
- Add PackedColumnParameter to support packed parameters without row parallelism (see the sketch below)
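
The summary describes the parameter refactor only at a high level. As a rough illustration of the idea behind a column-packed parameter, here is a minimal Python sketch: a parameter that carries packing metadata so a weight loader can translate logical (unpacked) shard offsets into offsets in the packed tensor. Everything here (the class name, attributes, and the `adjust_shard_indexes` method) is an illustrative assumption, not the actual vLLM API.

```python
import torch


class PackedColumnParameterSketch(torch.nn.Parameter):
    """Illustrative packed column parameter (NOT the real vLLM class).

    Records how a quantized weight is packed along its output-column
    dimension, so a weight loader can map logical shard sizes/offsets
    to positions in the packed tensor.
    """

    def __new__(cls, data: torch.Tensor, packed_dim: int, packed_factor: int):
        param = super().__new__(cls, data, requires_grad=False)
        param.packed_dim = packed_dim        # which dim holds packed values
        param.packed_factor = packed_factor  # e.g. 8 int4 values per int32
        return param

    def adjust_shard_indexes(self, shard_size: int, shard_offset: int):
        # Logical column counts shrink by the packing factor in the packed
        # tensor (assumed evenly divisible for simplicity).
        return (shard_size // self.packed_factor,
                shard_offset // self.packed_factor)


# Example: a GPTQ-style int4 weight, packed 8 values per int32 along dim 1.
packed = torch.zeros(4096, 4096 // 8, dtype=torch.int32)
qweight = PackedColumnParameterSketch(packed, packed_dim=1, packed_factor=8)
print(qweight.adjust_shard_indexes(shard_size=1024, shard_offset=2048))
# -> (128, 256)
```

Keeping the packing metadata on the parameter itself, rather than in each quantization scheme's loader, is what lets a single generic weight-loading path handle packed column-parallel weights without scheme-specific special cases.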