ggml : implementation of xPos RoPE (#441); also extends ggml_rope_bac… #442
This is my implementation of xPos RoPE. It is based on the implementation provided by https://github.com/syncdoth/RetNet, which differs slightly from Microsoft's torchscale implementation in order to support iterative/recurrent use.
Note that I decided to extend the existing !is_neox implementation by adding the required parameters rather than adding a new flag for it. The combination of !is_neox and xpos_scale > 0 distinguishes xPos from the original RoPE, and an additional API function, ggml_rope_xpos_inplace, is provided for the xPos case.
I also extended ggml_rope_back with the xPos-specific parameters, as well as with the two freq parameters that had already been added to the forward version of ggml_rope_custom but were missing from the backward pass.
Code review and help with the missing CUDA port would be appreciated.