Implement Router Z-loss #151

Merged
merged 3 commits into from
Sep 9, 2024

@josejg (Collaborator) commented Sep 9, 2024

What does this PR do?

Z-loss is an auxiliary loss term that improves the numerical stability of the router by penalizing large logits going into the softmax, where floating-point precision is reduced.

The idea was introduced in the ST-MoE paper (https://arxiv.org/abs/2202.08906) and was recently used in the OLMoE work (https://arxiv.org/abs/2409.02060).
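
For reference, the router z-loss as formulated in the ST-MoE paper is the mean over tokens of the squared log-sum-exp of the router logits. Below is a minimal pure-Python sketch of that formula. It is not the code in this PR, and `router_z_loss` and its list-of-lists input are hypothetical names chosen for illustration only.

```python
import math


def router_z_loss(router_logits):
    """Mean over tokens of (log-sum-exp of that token's router logits) ** 2.

    Hypothetical helper for illustration: `router_logits` is a list of
    per-token lists, one float per expert.
    """
    total = 0.0
    for token_logits in router_logits:
        # Subtract the max before exponentiating so exp() cannot overflow.
        m = max(token_logits)
        lse = m + math.log(sum(math.exp(x - m) for x in token_logits))
        total += lse ** 2
    return total / len(router_logits)


# Larger logits give a larger penalty, which is what pushes the router
# toward small-magnitude logits.
small = router_z_loss([[0.1, -0.2, 0.3]])
large = router_z_loss([[10.0, -20.0, 30.0]])
```

In training, this term would typically be multiplied by a small coefficient and added to the main loss; the coefficient is a hyperparameter and is not specified here.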

What issue(s) does this change relate to?

Alternative implementation of #133

Before submitting

  • Have you read the contributor guidelines?
  • Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
    • Unsure where to document things
  • Did you update any related tests and add any new tests related to your change? (see testing)
    • Added a test_moe_forward_backward_with_zloss and test_moe_forward_backward_with_zloss
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

@josejg josejg mentioned this pull request Sep 9, 2024
@mihir-db (Collaborator) left a comment

LGTM!

@mihir-db mihir-db merged commit cc7614e into main Sep 9, 2024
3 checks passed
@mihir-db mihir-db deleted the josejg/zloss branch September 9, 2024 14:40
2 participants