🚀 Feature request
Our model uses local attention in some layers (i.e., in every other layer each position can attend only to the last k=256 tokens). We would like to be able to specify this in the model's config on the model hub.
Motivation
Right now we can't integrate EleutherAI's 1.3B and 2.7B GPT models because local attention is not supported in transformers.
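For reference, the attention pattern described above can be sketched as a banded causal mask. This is a minimal illustration, not the actual implementation; the helper name and the standalone mask construction are assumptions for clarity:

```python
import torch

def local_attention_mask(seq_len: int, window: int = 256) -> torch.Tensor:
    # Hypothetical helper for illustration only (not a transformers API).
    # Returns a boolean mask where entry [i, j] is True iff position i
    # may attend to position j: j must be causal (j <= i) and within the
    # last `window` tokens (j > i - window, so i itself is included).
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    causal = j <= i
    local = j > i - window
    return causal & local
```

Layers using global attention would keep only the `causal` term; layers using local attention would apply the combined mask, giving the alternating pattern described above.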