GPT-Neo/GPT-3 Support #109

Open
minimaxir opened this issue Mar 28, 2021 · 6 comments
Comments

minimaxir (Owner) commented Mar 28, 2021

Hugging Face is adding PyTorch-based GPT-Neo support via huggingface/transformers#10848.

That PR covers just the larger models (1.3B and 2.7B). If performance and support are good (this is the only practical way to get a GPT-3-analogous architecture), I am open to doing the necessary work to add it to aitextgen. It shouldn't be much work, since the defaults for GPT-2 and GPT-Neo are similar, but some config metadata will have to be added.
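For context, a rough sketch of what that "config metadata" mapping might involve: the architectures are close, but the transformers config classes name the equivalent fields differently, and GPT-Neo adds a local-attention pattern. The specific values below are illustrative, not official defaults.

```python
from transformers import GPT2Config, GPTNeoConfig

# GPT-2 config uses the n_* naming scheme.
gpt2_cfg = GPT2Config(n_embd=768, n_layer=12, n_head=12, n_positions=1024)

# GPT-Neo names the equivalent fields differently and adds an
# alternating global/local attention pattern.
neo_cfg = GPTNeoConfig(
    hidden_size=768,
    num_layers=12,
    num_heads=12,
    max_position_embeddings=2048,
    attention_types=[[["global", "local"], 6]],  # pattern repeated 6x = 12 layers
)
```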

minimaxir (Owner) commented:

This also depends on DeepSpeed and ONNX support, which won't be automatic.

minimaxir (Owner) commented:

Since that PR is now merged and there are already blog posts about finetuning GPT-Neo, I suppose I'll have to add it at some point.

The 1.3B Neo might be fussy; ideally, someone will train a smaller GPT-Neo for testing.
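One possible stopgap for testing (a hypothetical smoke test, not anything shipped in aitextgen): instantiate a tiny, untrained GPT-Neo from a fresh config just to exercise the training code path. The sizes here are arbitrary.

```python
from transformers import GPTNeoConfig, GPTNeoForCausalLM

# Tiny model purely for exercising the code path; not a usable language model.
tiny_config = GPTNeoConfig(
    hidden_size=64,
    num_layers=2,
    num_heads=2,
    max_position_embeddings=128,
    attention_types=[[["global", "local"], 1]],  # 2 layers: one global, one local
)
tiny_model = GPTNeoForCausalLM(tiny_config)
print(f"{tiny_model.num_parameters():,} parameters")
```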

minimaxir (Owner) commented Apr 6, 2021

GPT-Neo is now in a released version of Transformers, so it can be tested.

There is a released 125M model comparable to GPT-2's 124M model. I will test whether finetuning works out of the box (it should).
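As a quick sanity check with plain transformers (not the aitextgen integration), loading the 125M checkpoint and generating looks roughly like this; the hub ID is the one published by EleutherAI on the Hugging Face Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a short continuation to confirm the model loads and runs.
inputs = tokenizer("GPT-Neo is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```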

minimaxir (Owner) commented:

Due to some shortsightedness on my part, I hardcoded GPT2LMHeadModel in a number of places, which unfortunately prevents this from working out of the box.

I probably need to go back to AutoConfig so that transformers can infer the correct model class.
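A minimal sketch of that approach (not aitextgen's actual code; the helper name is made up), assuming the checkpoint ships a standard config.json:

```python
from transformers import AutoConfig, AutoModelForCausalLM

def load_causal_lm(model_name_or_path: str):
    # AutoConfig reads `model_type` from config.json and picks the right class,
    # so no model-specific code (e.g. GPT2LMHeadModel) needs to be hardcoded.
    config = AutoConfig.from_pretrained(model_name_or_path)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, config=config)
    return model, config

# Both resolve to the correct architecture:
# load_causal_lm("gpt2")                    -> GPT2LMHeadModel
# load_causal_lm("EleutherAI/gpt-neo-125M") -> GPTNeoForCausalLM
```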

lvxiaoc commented Apr 25, 2021

Just curious: does this still support training GPT-Neo from scratch, the way aitextgen does for GPT-2? Specifically, can it be trained on an NVIDIA GPU with 8 GB of memory (like a 3060 Ti)?

minimaxir (Owner) commented:

It appears there is slightly more memory overhead when training GPT-Neo (which could also be a function of it being new and less optimized).

When finetuning the 125M model in Colab, it hit about 10 GB of VRAM, so it may not work well on an 8 GB GPU (although a 3060 Ti supports fp16, so it might work with that enabled).
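For illustration, a rough sketch of fp16 finetuning using transformers' Trainer rather than aitextgen's own training loop; `my_dataset` is a placeholder for an already-tokenized dataset, and the batch/accumulation settings are just one way to trade speed for VRAM.

```python
from transformers import (AutoModelForCausalLM, Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

training_args = TrainingArguments(
    output_dir="neo-125M-finetuned",
    per_device_train_batch_size=1,   # small batch + accumulation to reduce VRAM
    gradient_accumulation_steps=8,
    fp16=True,                       # mixed precision; requires a CUDA GPU
    num_train_epochs=1,
)

# my_dataset: hypothetical tokenized dataset yielding input_ids/labels dicts
trainer = Trainer(model=model, args=training_args, train_dataset=my_dataset)
trainer.train()
```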
