GPT-Neo/GPT-3 Support #109

Open
minimaxir opened this issue Mar 28, 2021 · 6 comments
Comments

minimaxir (Owner) commented Mar 28, 2021

Hugging Face is adding PyTorch-based GPT-Neo support via huggingface/transformers#10848.

That PR covers just the larger models (1.3B and 2.7B). If performance and support are good (this is the only practical way to get a GPT-3-analogous architecture), I am open to doing the necessary work to add it to aitextgen. It shouldn't be much work, since the defaults for GPT-2 and GPT-Neo are similar, but some config metadata will have to be added.
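For context, a rough sketch of what that "config metadata" mapping might involve: the architectures are close, but the transformers config classes name the equivalent fields differently, and GPT-Neo adds a local-attention pattern. The specific values below are illustrative, not official defaults.

```python
from transformers import GPT2Config, GPTNeoConfig

# GPT-2 config uses the n_* naming scheme.
gpt2_cfg = GPT2Config(n_embd=768, n_layer=12, n_head=12, n_positions=1024)

# GPT-Neo names the equivalent fields differently and adds an
# alternating global/local attention pattern.
neo_cfg = GPTNeoConfig(
    hidden_size=768,
    num_layers=12,
    num_heads=12,
    max_position_embeddings=2048,
    attention_types=[[["global", "local"], 6]],  # pattern repeated 6x = 12 layers
)
```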

minimaxir (Owner) commented:

This also depends on DeepSpeed and ONNX support, which won't be automatic.

minimaxir (Owner) commented:

Since that PR is now merged and there are already blog posts about finetuning GPT-Neo, I suppose I'll have to add it at some point.

The 1.3B Neo might be fussy; ideally, someone will train a smaller GPT-Neo for testing.
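One possible stopgap for testing (a hypothetical smoke test, not anything shipped in aitextgen): instantiate a tiny, untrained GPT-Neo from a fresh config just to exercise the training code path. The sizes here are arbitrary.

```python
from transformers import GPTNeoConfig, GPTNeoForCausalLM

# Tiny model purely for exercising the code path; not a usable language model.
tiny_config = GPTNeoConfig(
    hidden_size=64,
    num_layers=2,
    num_heads=2,
    max_position_embeddings=128,
    attention_types=[[["global", "local"], 1]],  # 2 layers: one global, one local
)
tiny_model = GPTNeoForCausalLM(tiny_config)
print(f"{tiny_model.num_parameters():,} parameters")
```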

minimaxir (Owner) commented Apr 6, 2021

GPT-Neo is now in a released version of Transformers, so it can be tested.

There is a released 125M model comparable to GPT-2's 124M model. I will test whether finetuning works out of the box (it should).
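As a quick sanity check with plain transformers (not the aitextgen integration), loading the 125M checkpoint and generating looks roughly like this; the hub ID is the one published by EleutherAI on the Hugging Face Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a short continuation to confirm the model loads and runs.
inputs = tokenizer("GPT-Neo is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```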

minimaxir (Owner) commented:

Due to some shortsightedness on my part, I hardcoded GPT2LMHeadModel in a number of places, which unfortunately prevents this from working out of the box.

I probably need to go back to AutoConfig so that transformers can infer the correct model class.
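A minimal sketch of that approach (not aitextgen's actual code; the helper name is made up), assuming the checkpoint ships a standard config.json:

```python
from transformers import AutoConfig, AutoModelForCausalLM

def load_causal_lm(model_name_or_path: str):
    # AutoConfig reads `model_type` from config.json and picks the right class,
    # so no model-specific code (e.g. GPT2LMHeadModel) needs to be hardcoded.
    config = AutoConfig.from_pretrained(model_name_or_path)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, config=config)
    return model, config

# Both resolve to the correct architecture:
# load_causal_lm("gpt2")                    -> GPT2LMHeadModel
# load_causal_lm("EleutherAI/gpt-neo-125M") -> GPTNeoForCausalLM
```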

lvxiaoc commented Apr 25, 2021

Just curious: does this still support training GPT-Neo from scratch, the way aitextgen does for GPT-2? Specifically, can it be trained on an NVIDIA GPU with 8 GB of memory (like a 3060 Ti)?

minimaxir (Owner) commented:

It appears there is slightly more memory overhead when training GPT-Neo (which could also be a function of it being new and less optimized).

When finetuning the 125M model in Colab, it hit about 10 GB of VRAM, so it may not work well on an 8 GB GPU (although a 3060 Ti supports fp16, so it might work with that enabled).
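For illustration, a rough sketch of fp16 finetuning using transformers' Trainer rather than aitextgen's own training loop; `my_dataset` is a placeholder for an already-tokenized dataset, and the batch/accumulation settings are just one way to trade speed for VRAM.

```python
from transformers import (AutoModelForCausalLM, Trainer, TrainingArguments)

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

training_args = TrainingArguments(
    output_dir="neo-125M-finetuned",
    per_device_train_batch_size=1,   # small batch + accumulation to reduce VRAM
    gradient_accumulation_steps=8,
    fp16=True,                       # mixed precision; requires a CUDA GPU
    num_train_epochs=1,
)

# my_dataset: hypothetical tokenized dataset yielding input_ids/labels dicts
trainer = Trainer(model=model, args=training_args, train_dataset=my_dataset)
trainer.train()
```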
