GPT-Neo/GPT-3 Support #109
Comments
Also depends on DeepSpeed and ONNX support, which won't be automatic.
Since that PR is now merged and there are already blog posts about finetuning GPT-Neo, I suppose I'll have to add it at some point. The 1.3B Neo might be fussy; ideally someone will train a smaller GPT-Neo for testing.
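For illustration only, a "smaller GPT-Neo for testing" could be built directly from a shrunken `GPTNeoConfig` in transformers; the sizes below are made up for the sketch and are not from this issue:

```python
# Sketch: a deliberately tiny, randomly initialized GPT-Neo for fast tests.
# All dimensions here are illustrative, not a recommended configuration.
from transformers import GPTNeoConfig, GPTNeoForCausalLM

tiny_config = GPTNeoConfig(
    vocab_size=1000,              # small vocab for a toy tokenizer
    max_position_embeddings=64,   # short context window
    hidden_size=64,
    num_layers=2,
    num_heads=4,
    # GPT-Neo alternates attention types; this expands to 2 layers (global, local)
    attention_types=[[["global", "local"], 1]],
)

model = GPTNeoForCausalLM(tiny_config)  # untrained, tiny footprint
print(f"parameters: {model.num_parameters():,}")
```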
Now in a released version of Transformers, so I can test. There is a released 125M model comparable to GPT-2's 124M model. Will test whether finetuning works out of the box (it should).
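As a minimal sketch of what "out of the box" finetuning might look like once GPT-Neo loading mirrors the existing GPT-2 path in aitextgen (the model name is the 125M checkpoint on the Hugging Face Hub; `input.txt` is a placeholder dataset):

```python
# Sketch only: load the 125M GPT-Neo checkpoint through aitextgen,
# finetune on a plain-text file, then sample to sanity-check the result.
from aitextgen import aitextgen

ai = aitextgen(model="EleutherAI/gpt-neo-125M", to_gpu=True)

ai.train("input.txt", num_steps=500, batch_size=1)
ai.generate(n=3, max_length=100)
```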
Due to me being stupid, I hardcoded a lot of … I probably need to go back to …
Just curious whether this will still support training GPT-Neo from scratch, like GPT-2 in aitextgen does? Specifically, can it be trained on an NVIDIA GPU with 8GB of memory (like a 3060 Ti)?
So it appears there's a slightly increased memory overhead for training GPT-Neo (it could also be a function of the architecture being new and less optimized). When finetuning the 125M model in Colab it hit about 10GB of VRAM, so that may not work well on an 8GB VRAM GPU (although a 3060 Ti should support …
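If the 125M model does push past 8GB, the usual knobs are a small batch size, gradient accumulation, and half precision. A rough sketch, assuming aitextgen's `train()` keeps exposing the same `fp16` and gradient-accumulation options it has for the GPT-2 path (`input.txt` is again a placeholder):

```python
# Sketch of memory-saving options for an 8GB card; parameter names assume
# the existing GPT-2 training path carries over unchanged to GPT-Neo.
from aitextgen import aitextgen

ai = aitextgen(model="EleutherAI/gpt-neo-125M", to_gpu=True)

ai.train(
    "input.txt",
    num_steps=500,
    batch_size=1,                    # smallest per-step footprint
    gradient_accumulation_steps=4,   # recover an effective batch size of 4
    fp16=True,                       # mixed precision to cut activation memory
)
```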
Hugging Face is adding PyTorch-based GPT-Neo support via huggingface/transformers#10848.
That covers just the superlarge models (1.3B and 2.7B). If performance/support is good (since this is the only practical way to get a GPT-3-analogous architecture), I am open to doing the necessary work to add it to aitextgen. It shouldn't be too much work, though, since the defaults between GPT-2 and GPT-Neo are similar, but I will have to add some config metadata.
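To make the "config metadata" point concrete, the stock transformers config classes can be compared directly; the field names below come from `GPT2Config` and `GPTNeoConfig` and show which attributes would need to be mapped (nothing here is aitextgen-specific):

```python
# Compare the default config metadata of the two architectures.
from transformers import GPT2Config, GPTNeoConfig

gpt2, neo = GPT2Config(), GPTNeoConfig()

print("GPT-2:   layers=%d heads=%d hidden=%d ctx=%d" % (
    gpt2.n_layer, gpt2.n_head, gpt2.n_embd, gpt2.n_positions))
print("GPT-Neo: layers=%d heads=%d hidden=%d ctx=%d" % (
    neo.num_layers, neo.num_heads, neo.hidden_size, neo.max_position_embeddings))
```

The structures line up closely, but the attribute names differ (e.g. `n_embd` vs. `hidden_size`), which is roughly the mapping work implied above.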