
[core] Replace QuantLlamaMLP with QuantFusedMLP #188

Merged — 3 commits merged into main from fix-fused-modules on Nov 15, 2023

Conversation

@younesbelkada (Collaborator) commented on Nov 13, 2023

In the context of huggingface/transformers#27411, I wanted to create a more generic class for fused MLP layers. I kept the previous QuantLlamaMLP for backward compatibility.

cc @casper-hansen

This PR also fixes another issue, where users hit strange behavior whenever they try to perform multi-turn generation.
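For readers following along, here is a minimal sketch of what such a generic class could look like. The class names come from the PR title, but the constructor arguments, their order, and the forward pass are assumptions based on the standard Llama-style gated MLP, not the actual AutoAWQ code (which, among other things, may fuse the gate and up projections into a single quantized GEMM).

```python
import torch.nn as nn
import torch.nn.functional as F


class QuantFusedMLP(nn.Module):
    """Sketch of a generic fused MLP over pre-built (quantized) projections.

    The projection modules are created elsewhere and passed in, and the activation
    is a constructor argument instead of being hard-coded to SiLU, so the same
    class can serve non-Llama architectures as well.
    """

    def __init__(self, gate_proj, down_proj, up_proj, activation):
        super().__init__()
        self.gate_proj = gate_proj
        self.up_proj = up_proj
        self.down_proj = down_proj
        self.activation = activation

    def forward(self, hidden_states):
        # Standard gated-MLP computation: down(act(gate(x)) * up(x)).
        return self.down_proj(
            self.activation(self.gate_proj(hidden_states)) * self.up_proj(hidden_states)
        )


# The Llama-specific class can then stay around as a thin wrapper purely for
# backward compatibility, pinning the activation to SiLU.
class QuantLlamaMLP(QuantFusedMLP):
    def __init__(self, gate_proj, down_proj, up_proj):
        super().__init__(gate_proj, down_proj, up_proj, activation=F.silu)
```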

@casper-hansen (Owner)

Nice one @younesbelkada! 👍

What do you think about setting a default activation=F.silu? Then we can use QuantFusedMLP internally in AutoAWQ as well when migrating from QuantLlamaMLP.
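Concretely, the suggestion would let Llama-style call sites drop the explicit activation argument. Here is a small usage sketch against the class above; the stand-in `nn.Linear` projections are purely illustrative, real code would pass AutoAWQ's quantized modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in (unquantized) projections just to exercise the sketch above.
hidden_size, intermediate_size = 32, 64
gate = nn.Linear(hidden_size, intermediate_size, bias=False)
up = nn.Linear(hidden_size, intermediate_size, bias=False)
down = nn.Linear(intermediate_size, hidden_size, bias=False)

# Today (per the sketch above) the activation has to be passed explicitly:
mlp = QuantFusedMLP(gate, down, up, activation=F.silu)
out = mlp(torch.randn(1, 4, hidden_size))  # -> shape (1, 4, 32)

# With activation=F.silu as the default, Llama-style call sites would shrink to:
# mlp = QuantFusedMLP(gate, down, up)
```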

@younesbelkada (Collaborator, Author)

Sounds great! Happy to address these changes.

@younesbelkada (Collaborator, Author)

Done! LMK what you think.

@casper-hansen (Owner)

I am accepting this PR for now to unblock further work on the integration with transformers. However, I do intend to run more tests to confirm the last patch for multiple generations does not cause bugs with inference in AutoAWQ.

I might also consider implementing an `is_transformers` flag, defaulting it to True, to make sure transformers gets explicitly different behavior, which is evidently needed in some cases.
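Purely as an illustration of that idea, gating on such a flag could look like the sketch below. Only the flag name and its True default come from the comment above; the class name and both branches are hypothetical placeholders showing the pattern of keeping one code path for transformers' generation loop and one for AutoAWQ's own inference.

```python
import torch
import torch.nn as nn


class FusedModuleSketch(nn.Module):
    """Placeholder for the proposed `is_transformers` switch (hypothetical class)."""

    def __init__(self, is_transformers: bool = True):
        super().__init__()
        self.is_transformers = is_transformers

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.is_transformers:
            # Placeholder: behavior required when running inside transformers'
            # generation loop (e.g. the multi-turn generation fix in this PR).
            return hidden_states
        # Placeholder: the original AutoAWQ inference path, kept unchanged.
        return hidden_states
```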

@casper-hansen merged commit 3b362c0 into main on Nov 15, 2023.
@casper-hansen deleted the fix-fused-modules branch on Dec 3, 2023.