🚀 Feature request
Our model uses local attention in some layers (i.e., in every other layer each position can attend only to the last k=256 tokens). We would like to be able to specify this in the model's config on the model hub.
Motivation
Right now we can't integrate EleutherAI's 1.3B and 2.7B GPT models because local attention is not supported in transformers.
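For reference, the attention pattern described above can be sketched as a banded causal mask. This is a minimal illustration, not the actual implementation; the helper name and the standalone mask construction are assumptions for clarity:

```python
import torch

def local_attention_mask(seq_len: int, window: int = 256) -> torch.Tensor:
    # Hypothetical helper for illustration only (not a transformers API).
    # Returns a boolean mask where entry [i, j] is True iff position i
    # may attend to position j: j must be causal (j <= i) and within the
    # last `window` tokens (j > i - window, so i itself is included).
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    causal = j <= i
    local = j > i - window
    return causal & local
```

Layers using global attention would keep only the `causal` term; layers using local attention would apply the combined mask, giving the alternating pattern described above.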