[Request] A method to resume training with a different batch size while keeping your G epoch and nkimg value. #92

Open
nom57 opened this issue Sep 8, 2022 · 3 comments

Comments


nom57 commented Sep 8, 2022

On SG2 and SG3 (provided you use a modified fork), you can resume training with a completely different batch size and still keep your tick / nkimg progress by specifying it with the --nkimg kwarg; for example, --nkimg=2500 resumes training with an assumed progress of 2500 kimg.

SGXL resets kimg to 0 if you change the batch size.

I have found it extremely useful to start with a very low batch size, such as --batch=2 --glr=0.0008 --dlr=0.0006, to improve diversity, and then switch to a batch size of 32 / 64 / 128 for better FID once FID starts to bottom out with batch=2.

However, because the augmentation state, the G epoch, and kimg all reset in SGXL when doing this, I am having a really bad time.
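
For reference, here is a minimal sketch of what the requested option could look like inside a StyleGAN-style training loop. The names used (training_loop, cur_nimg, cur_tick, kimg_per_tick) follow common StyleGAN conventions but are assumptions for illustration, not SGXL's actual internals.

```python
# Minimal sketch, assuming a StyleGAN-style training loop. The names below
# (nkimg, cur_nimg, cur_tick, kimg_per_tick, batch_size) are illustrative
# assumptions, not the actual SGXL implementation.

def training_loop(total_kimg=25000, kimg_per_tick=4, batch_size=64, nkimg=0):
    # Instead of always starting at 0, seed the counters from --nkimg so that
    # tick/kimg bookkeeping (and anything scheduled on it, e.g. augmentation
    # or EMA ramps) continues from the previous run even if batch_size changed.
    cur_nimg = nkimg * 1000          # e.g. --nkimg=2500 -> 2,500,000 images seen
    cur_tick = cur_nimg // (kimg_per_tick * 1000)

    while cur_nimg < total_kimg * 1000:
        # ... one training step on batch_size images ...
        cur_nimg += batch_size
        if cur_nimg >= (cur_tick + 1) * kimg_per_tick * 1000:
            cur_tick += 1
            # ... per-tick maintenance: logging, snapshots, metrics ...
```

Calling training_loop(nkimg=2500, batch_size=64) would then continue the tick/kimg bookkeeping from 2500 kimg instead of restarting at 0, regardless of the new batch size.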


nom57 commented Sep 8, 2022

This method of training can push recall to 0.7+ instead of 0.5 on many datasets while still reaching the same FID (after also bottoming out FID at batch size 128), so recall is tremendously better with this method. With batch=2 it also converges much faster, so the first part of training, which focuses on diversity, is very quick.


nom57 commented Sep 8, 2022

For example, the Pokemon dataset can reach a recall of 0.787 at 64x64 with this method @xl-sr,
so an --nkimg resume feature would be tremendously helpful.


nom57 commented Sep 9, 2022

Update:

This seems to favor SG2-ADA much more than SGXL; SGXL can easily collapse with low batch sizes, so it is hard to tame. Still, a batch size of 2-16 for the first 144 kimg and then switching to a batch size of 64 or 128 seems beneficial on unimodal datasets for better recall and faster training.
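
As a concrete illustration of the requested workflow, a two-phase run might look like the sketch below. Only --batch, --glr, --dlr, and --nkimg come from this thread; every other flag, path, and filename is an assumption about a StyleGAN-style train.py, and --nkimg itself is the feature being requested here, not something SGXL currently supports.

```python
# Hypothetical two-phase launcher. Only --batch/--glr/--dlr/--nkimg are taken
# from this thread; every other flag, path, and filename is an assumption
# about a StyleGAN-style train.py, and --nkimg is the requested feature.
import subprocess

# Phase 1: tiny batch for diversity, roughly the first 144 kimg.
subprocess.run([
    "python", "train.py",
    "--outdir=runs/phase1", "--data=dataset.zip",   # assumed flags
    "--batch=2", "--glr=0.0008", "--dlr=0.0006",
    "--kimg=144",                                   # assumed stopping-point flag
], check=True)

# Phase 2: resume the phase-1 snapshot at a larger batch while keeping the
# kimg/tick counters, which is exactly what --nkimg would enable in SGXL.
subprocess.run([
    "python", "train.py",
    "--outdir=runs/phase2", "--data=dataset.zip",
    "--batch=64",
    "--resume=runs/phase1/network-snapshot-000144.pkl",  # assumed path format
    "--nkimg=144",                                        # requested feature
], check=True)
```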
