[Misc] Update gptq_marlin to use new vLLMParameters #7281
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can comment /ready on the PR. 🚀
Force-pushed from 5b7a7e0 to 22097f0
Resolved review thread: vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py
Force-pushed from 3f77a88 to 94a2233
Force-pushed from 94a2233 to 0e8561e
Summary

- Update gptq_marlin parameters to use vLLMParameters to simplify linear layer weight loading
- Add PackedColumnParameter to support packed parameters without row parallelism (see the sketch below)
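
The summary describes the parameter refactor only at a high level. As a rough illustration of the idea behind a column-packed parameter, here is a minimal Python sketch: a parameter that carries packing metadata so a weight loader can translate logical (unpacked) shard offsets into offsets in the packed tensor. Everything here (the class name, attributes, and the `adjust_shard_indexes` method) is an illustrative assumption, not the actual vLLM API.

```python
import torch


class PackedColumnParameterSketch(torch.nn.Parameter):
    """Illustrative packed column parameter (NOT the real vLLM class).

    Records how a quantized weight is packed along its output-column
    dimension, so a weight loader can map logical shard sizes/offsets
    to positions in the packed tensor.
    """

    def __new__(cls, data: torch.Tensor, packed_dim: int, packed_factor: int):
        param = super().__new__(cls, data, requires_grad=False)
        param.packed_dim = packed_dim        # which dim holds packed values
        param.packed_factor = packed_factor  # e.g. 8 int4 values per int32
        return param

    def adjust_shard_indexes(self, shard_size: int, shard_offset: int):
        # Logical column counts shrink by the packing factor in the packed
        # tensor (assumed evenly divisible for simplicity).
        return (shard_size // self.packed_factor,
                shard_offset // self.packed_factor)


# Example: a GPTQ-style int4 weight, packed 8 values per int32 along dim 1.
packed = torch.zeros(4096, 4096 // 8, dtype=torch.int32)
qweight = PackedColumnParameterSketch(packed, packed_dim=1, packed_factor=8)
print(qweight.adjust_shard_indexes(shard_size=1024, shard_offset=2048))
# -> (128, 256)
```

Keeping the packing metadata on the parameter itself, rather than in each quantization scheme's loader, is what lets a single generic weight-loading path handle packed column-parallel weights without scheme-specific special cases.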