[Misc] Use torch.compile for GemmaRMSNorm #7642

WoosukKwon · 2024-08-19T01:52:16Z

This PR is a temporary solution to accelerate Gemma models. The PR can be reverted once #7110 is merged.

github-actions · 2024-08-19T01:52:28Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

youkaichao · 2024-08-20T07:30:58Z

vllm/model_executor/layers/layernorm.py

+        residual: Optional[torch.Tensor] = None,
+    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
+        """PyTorch-native implementation equivalent to forward()."""
+        return self.forward_static(self.weight.data, self.variance_epsilon, x,


Even though this is a static function, I'm not sure whether this self would cause a problem here.

If you want to be safe, I think you can move this function outside of the class definition.

WoosukKwon · 2024-08-22

I think this should be OK, since it does not touch any state under self. I also checked that recompilation does not happen after graph capturing by monitoring the logs with TORCH_LOGS=guards. In addition, the ShareGPT throughput benchmark shows a 10-15% improvement.
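For readers following the thread, here is a minimal sketch of the pattern under discussion: a static function that receives the weight and epsilon as arguments instead of reading them from self, wrapped with torch.compile. The class name GemmaRMSNormSketch is hypothetical, and the body of forward_static is an assumption (the diff excerpt above only shows the call site); it follows the usual Gemma-style RMSNorm formulation. backend="eager" is also an assumption, chosen only so the sketch runs without a compiler toolchain; it is not what the PR is known to use.

```python
from typing import Optional, Tuple, Union

import torch
import torch.nn as nn


class GemmaRMSNormSketch(nn.Module):
    """Hypothetical stand-in for illustration; not the vLLM class."""

    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(hidden_size))
        self.variance_epsilon = eps

    @staticmethod
    def forward_static(
        weight: torch.Tensor,
        variance_epsilon: float,
        x: torch.Tensor,
        residual: Optional[torch.Tensor],
    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        # Assumed body: every input arrives as an argument, so nothing
        # is read from `self` inside the function being compiled.
        orig_dtype = x.dtype
        if residual is not None:
            x = x + residual
            residual = x
        x = x.float()
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        x = x * torch.rsqrt(variance + variance_epsilon)
        # Gemma-style scaling multiplies by (1 + weight).
        x = x * (1.0 + weight.float())
        x = x.to(orig_dtype)
        return x if residual is None else (x, residual)

    def forward_native(
        self,
        x: torch.Tensor,
        residual: Optional[torch.Tensor] = None,
    ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        # Module state is passed in explicitly at the call site.
        return self.forward_static(self.weight.data, self.variance_epsilon,
                                   x, residual)


# Compile the stateless function once. backend="eager" is an assumption
# made for portability of this sketch.
compiled_static = torch.compile(GemmaRMSNormSketch.forward_static,
                                backend="eager")

norm = GemmaRMSNormSketch(hidden_size=8)
x = torch.randn(2, 8)
residual = torch.randn(2, 8)

eager_out, eager_res = norm.forward_native(x, residual)
compiled_out, compiled_res = compiled_static(norm.weight.data,
                                             norm.variance_epsilon,
                                             x, residual)
```

Setting TORCH_LOGS=guards in the environment when running a script like this prints the guards that TorchDynamo installs, which is one way to look for unexpected recompilation as described in the reply above.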

youkaichao

LGTM as a temporary solution.

[Misc] Use torch.compile for GemmaRMSNorm

786fbbd

WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 19, 2024

WoosukKwon requested a review from youkaichao August 19, 2024 01:56

WoosukKwon mentioned this pull request Aug 19, 2024

Release v0.5.5 #7481

Closed

youkaichao reviewed Aug 20, 2024

vllm/model_executor/layers/layernorm.py (outdated, resolved)

youkaichao reviewed Aug 20, 2024

WoosukKwon added 2 commits August 21, 2024 17:54

Merge branch 'main' into gemma-rms

cfc68b4

Fix

e817b74

WoosukKwon requested a review from youkaichao August 22, 2024 01:05

youkaichao approved these changes Aug 22, 2024

WoosukKwon merged commit b3856be into main Aug 22, 2024
39 of 41 checks passed

WoosukKwon deleted the gemma-rms branch August 22, 2024 08:14

omrishiv pushed a commit to omrishiv/vllm that referenced this pull request Aug 26, 2024

[Misc] Use torch.compile for GemmaRMSNorm (vllm-project#7642)

8729e85
