How would I do RGB 2 RGB image2image translation with this repo? #51
I don't understand how the config works.
I think I figured out how to do this. I'll try training a model tomorrow.
@adeptflax Can you share your code?
@Guthman I'm still working on it. I got it to train. I need to test the model.
I'll publish the code once I get it working.
Can you share your intuition? I have no idea how to revise this to work on the I2I task.
The codebase is pretty much spaghetti code. I tried modifying drin, because it was doing something similar to image2image. The way I tried to modify it didn't work, but I think I know one of the problems.
I think I got it working. I only have the first epoch of my model trained. I need to wait for it to finish to know for sure. I'll write a guide and then publish the code I used.
I had to fix something, but I do seem to have gotten it working. I'll post a guide tomorrow if it works well.
Sorry guys, I procrastinated for a couple of days. I have gotten code to work that can train and run an image2image model. I don't know how it compares to pix2pixHD. I slightly screwed up the input data on the dataset I was training on, though I should be able to recover from it without completely restarting training.
Here it is. Should work. https://github.com/adeptflax/image2image |
@Guthman I saved the model output, and I just used pix2pixHD. Though pix2pixHD doesn't do as well as I need. Do you think random crop would help?
Maybe using transformers instead of just the VQGAN would work? Maybe it's possible to pretrain on a face dataset? I'm doing stuff with faces.
I trained on 2 RTX 3090s for 2 days, I think. So I would have to train for another 6 days, because 512x512 is 4 times larger than 256x256?
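As a rough sanity check on the arithmetic above (this is a back-of-envelope sketch that assumes training cost scales linearly with pixel count; attention over longer token sequences actually scales worse than that):

```python
# Pixel-count ratio between the two resolutions mentioned in the thread.
pixels_256 = 256 * 256
pixels_512 = 512 * 512
ratio = pixels_512 // pixels_256  # 4x as many pixels

# If 2 days of training sufficed at 256x256 and cost scales linearly
# with pixel count, 512x512 would need roughly 8 days total, i.e.
# 6 more days on top of the 2 already spent.
days_at_256 = 2
days_at_512 = days_at_256 * ratio
extra_days = days_at_512 - days_at_256
print(ratio, days_at_512, extra_days)  # -> 4 8 6
```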
@Guthman what's the resolution of your dataset? |
Do transformer models first pre-train a VQGAN and then do the training on the transformer?
I wonder what the problem is on #52. |
Actually, it seems you need to first train a VQGAN model, then you can train a transformer. Maybe that's the problem with #52. You would first train a model with faceshq_vqgan.yaml and then train a transformer with faceshq_transformer.yaml using the first VQGAN model.
Does the transformer just modify the encodings? |
OK, I seem to be correct. In drin they created a depth VQGAN and an ImageNet VQGAN model. So the whole drin pipeline goes depth VQGAN model -> transformer -> image VQGAN model. So basically drin_transformer.yaml trains a model that converts the depth embeddings into ImageNet embeddings.
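The three-stage pipeline described above can be sketched with toy stand-ins. Everything below is hypothetical illustration, not the repo's actual API: the "VQGANs" are nearest-neighbour quantizers over scalars, and the "transformer" is a fixed lookup table standing in for a learned code-to-code mapping.

```python
class ToyVQGAN:
    """Toy stand-in for a VQGAN: nearest-neighbour quantizer over scalars."""
    def __init__(self, codebook):
        self.codebook = codebook  # list of values; index = code id

    def encode(self, values):
        # Map each input value to the index of its nearest codebook entry.
        return [min(range(len(self.codebook)),
                    key=lambda i: abs(self.codebook[i] - v)) for v in values]

    def decode(self, indices):
        # Replace each code index with its codebook value.
        return [self.codebook[i] for i in indices]


class ToyTransformer:
    """Toy stand-in for the transformer: a fixed source-code -> target-code map."""
    def __init__(self, mapping):
        self.mapping = mapping  # pretend this was learned

    def translate(self, src_indices):
        return [self.mapping[i] for i in src_indices]


# "Depth" domain VQGAN and "image" domain VQGAN with separate codebooks.
depth_vqgan = ToyVQGAN([0.0, 0.5, 1.0])
image_vqgan = ToyVQGAN([10.0, 20.0, 30.0])
transformer = ToyTransformer({0: 2, 1: 1, 2: 0})

x = [0.1, 0.9, 0.4]                        # "depth" input
codes = depth_vqgan.encode(x)              # stage 1: source VQGAN encodes
translated = transformer.translate(codes)  # stage 2: transformer maps codes
y = image_vqgan.decode(translated)         # stage 3: target VQGAN decodes
print(codes, y)  # -> [0, 2, 1] [30.0, 10.0, 20.0]
```

The point the sketch makes is that the transformer never touches pixels: it operates purely on discrete code indices, and each VQGAN handles encoding/decoding for its own domain.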
I modified the reconstruction code to do x -> y instead of x -> x in my repo, which isn't correct.
@Guthman did you set n_embed to 16384 or not? `model.params.n_embed` should be 16384.
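For reference, the nested key mentioned above has this shape when loaded as a plain dict. This is only a hypothetical sketch of the config layout; `embed_dim` is an assumed companion setting shown for illustration, not something stated in the thread.

```python
# Sketch of the relevant slice of the YAML config, as a Python dict.
# In the repo the config lives in a YAML file; the dotted path
# "model.params.n_embed" refers to this nesting.
config = {
    "model": {
        "params": {
            "n_embed": 16384,   # codebook size recommended in the comment
            "embed_dim": 256,   # assumed companion value, illustrative only
        }
    }
}
print(config["model"]["params"]["n_embed"])  # -> 16384
```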
OK, I got an image2image transformer working. I will submit a pull request in the next few days.
I have 512x512 pixel images I would like to do image2image translation on.