Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add minimum capability requirement for AWQ #1064

Merged
merged 2 commits into from
Sep 18, 2023
Merged

Add minimum capability requirement for AWQ #1064

merged 2 commits into from
Sep 18, 2023

Conversation

WoosukKwon
Copy link
Collaborator

Closes #1063

@esmeetu
Copy link
Collaborator

esmeetu commented Sep 17, 2023

@WoosukKwon I think setup.py file might need add compute capability check whether to build quant kernel or not.

@WoosukKwon
Copy link
Collaborator Author

@esmeetu Thanks for the suggestion! I've instead added a guard to prevent compilation for the supported GPUs.

Copy link
Member

@zhuohan123 zhuohan123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.

Comment on lines +14 to +15
namespace vllm {
namespace awq {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this namespace needed?

Copy link
Collaborator Author

@WoosukKwon WoosukKwon Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are optional, but for better coding convention. From google cpp style guide:

With few exceptions, place code in a namespace.

Namespace prevents naming conflicts, so it's pretty useful for external code like the AWQ kernels.

Comment on lines +71 to +78
capability = torch.cuda.get_device_capability()
capability = capability[0] * 10 + capability[1]
if capability < quant_config.get_min_capability():
raise ValueError(
f"The quantization method {model_config.quantization} is not "
"supported for the current GPU. "
f"Minimum capability: {quant_config.get_min_capability()}. "
f"Current capability: {capability}.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the assert false in C++ if we have the check here?

Copy link
Member

@zhuohan123 zhuohan123 Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw the comments in the PR. Can we just change setup.py instead of the C++ files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's problematic when we want to build the wheel for all GPU architectures (e.g., for pypi publication or building docker image). In such a case, we cannot selectively include the extension according to the architecture. Therefore, I believe this is an easier solution, and in fact we already used this kind of guard for bfloat16 attention kernels, which do not support Turing and Volta GPUs.

Copy link
Member

@zhuohan123 zhuohan123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the fix!

@WoosukKwon WoosukKwon merged commit 2b1c116 into main Sep 18, 2023
2 checks passed
@WoosukKwon WoosukKwon deleted the awq-warning branch September 18, 2023 19:02
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
sjchoi1 pushed a commit to casys-kaist-internal/vllm that referenced this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWQ does not support Turing GPUs
3 participants