This repository has been archived by the owner on Feb 25, 2022. It is now read-only.

The conversion script doesn’t work #174

Closed
StellaAthena opened this issue Mar 26, 2021 · 2 comments
Labels
bug Something isn't working.

Comments

@StellaAthena
Member

Describe the bug
After running the conversion script and loading the result into the HuggingFace transformers library, generation quality degrades sharply once the sequence length passes roughly 500 tokens.

To Reproduce
Steps to reproduce the behavior:

  1. Run the conversion script
  2. Load the result into the HuggingFace transformers library
  3. Feed it a context of 450 tokens and have it generate another 200
  4. Observe that around the 500th token the coherency falls off a cliff (a reproduction sketch follows this list)
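
A minimal reproduction sketch, assuming the converted checkpoint was saved to a local directory; the path `converted-gpt-neo`, the prompt file, and the sampling settings are placeholders, not part of the original report:

```python
# Reproduction sketch: generate ~200 tokens past a ~450-token prompt and
# inspect the output around the 500-token mark. The checkpoint path is a
# placeholder for wherever the conversion script wrote its output.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "converted-gpt-neo"  # hypothetical output directory
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Any passage long enough to tokenize to roughly 450 tokens will do.
prompt = open("long_passage.txt").read()
inputs = tokenizer(prompt, return_tensors="pt")
print("prompt length in tokens:", inputs["input_ids"].shape[1])

output = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    top_p=0.9,
)
# Coherency visibly degrades once the sequence passes ~500 tokens.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```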

Expected behavior
Generation quality should stay consistent rather than falling off a cliff around the 500th token.

Proposed solution
It appears that the problem is a lack of compatibility between the local attention function used in GPT-Neo and the transformers library. While the transformers library does include models with local attention (Longformer, for example), that implementation is not consistent with how the GPT-2 model is defined in the transformers library, so the converted model does not reproduce GPT-Neo's local attention.
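
For illustration, here is a small sketch (not EleutherAI's or HuggingFace's actual code) of how a windowed local causal mask differs from the full causal mask used by GPT-2-style attention; the default window size of 256 is an assumption about GPT-Neo's local attention configuration:

```python
import torch

def causal_mask(seq_len):
    # Full causal mask (GPT-2 style): each position attends to itself
    # and every earlier position.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def local_causal_mask(seq_len, window_size=256):
    # Local (windowed) causal mask: each position attends only to itself
    # and the previous window_size - 1 positions.
    full = causal_mask(seq_len)
    too_far = torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=-window_size
    )
    return full & ~too_far

# The two masks diverge once positions are more than window_size tokens
# apart; a converted model that applies the full mask where the original
# used the local one sees a different context, which is consistent with the
# quality drop past ~500 tokens (450-token prompt plus generated continuation).
print(causal_mask(6).int())
print(local_causal_mask(6, window_size=3).int())
```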

Screenshots
n/a

Environment (please complete the following information):

  • GPUs: v3-8s, 1080 Tis, A100s
  • Configs: any config that has local attention

Additional context
n/a

@StellaAthena
Member Author

StellaAthena commented Mar 26, 2021

The amazing @patil-suraj and @LysandreJik have a preliminary PR for a HF implementation!

huggingface/transformers#10848

@StellaAthena StellaAthena reopened this Mar 28, 2021
@StellaAthena
Member Author

It's live on HF!

https://huggingface.co/EleutherAI/gpt-neo-2.7B
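
A minimal usage sketch, assuming a transformers version that includes the GPT-Neo architecture; the prompt text and sampling settings are arbitrary:

```python
from transformers import pipeline

# Text generation with the released 2.7B checkpoint from the Hugging Face Hub.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B")
print(generator("EleutherAI has", max_new_tokens=50, do_sample=True)[0]["generated_text"])
```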
