Smaller flux checkpoints? #128
So far I am getting an error: [2024-09-24 21:36:12] [INFO] Running D:\pinokio\api\fluxgym.git\outputs\juna\train.bat
In your log, it seems like the model keys are matched successfully, but then accelerate errors out when the process tries to proceed with splitting the model (per the `--split_mode` argument). The real issue is probably GPU overhead anyway: before training even starts, the training scripts ask the backends to juggle not just the model itself (the transformer/"unet" safetensors) but simultaneously the huge T5XXL text encoder, the smaller CLIP text encoder, and the VAE, regardless of whether you're trying to train any of those components. The quantization and size of those components matter a great deal as well. The fp16 T5XXL is nearly 10GB in its own right; the fp8 version, as used by fluxgym, is nearly 5GB. But when it comes to whether the training script even gets to the point of "just" training the model's transformer/"unet" (where slow training on 8GB VRAM may become plausible), none of the Flux training scripts or frameworks so far seem to have sufficiently adaptive CPU offloading mechanisms or other internal workarounds to function even half as reliably as the average inference framework. It's been quite maddening, actually.
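To get a feel for how quickly those components add up, here is a minimal sketch (not part of fluxgym; the function name and the directory layout are assumptions) that sums the on-disk size of the safetensors files you'd be asking the trainer to load. Weight size is only a lower bound on memory use, since training also needs activations, gradients, and optimizer state on top of the weights:

```python
from pathlib import Path


def total_checkpoint_size_gb(model_dir: str) -> float:
    """Sum the on-disk size of all .safetensors files in model_dir.

    This is a rough lower bound on the memory needed just to hold the
    weights (transformer, text encoders, VAE); actual VRAM use during
    training is substantially higher.
    """
    total_bytes = sum(
        p.stat().st_size for p in Path(model_dir).glob("*.safetensors")
    )
    return total_bytes / 1024**3
```

For example, pointing this at a directory containing an fp8 transformer plus the ~5GB fp8 T5XXL, CLIP, and VAE files makes it obvious why the total already exceeds an 8GB card before any training state is allocated.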
Is it possible to run fluxgym on a smaller model? I don't use fp16 because I only have 8GB of VRAM. I tried to put my Flux model into the fluxgym models directory, but I receive error no. 1.