
How would I do RGB 2 RGB image2image translation with this repo? #51

Open
adeptflax opened this issue May 25, 2021 · 25 comments

@adeptflax

I have 512x512 pixel images I would like to do image2image translation on.

@adeptflax
Author

I don't understand how the config works.
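
For anyone else stuck on the same point: the configs in this repo all follow one pattern, where main.py instantiates whatever class `target` names and passes `params` to it as keyword arguments. Below is a minimal sketch modeled on configs/custom_vqgan.yaml; the list-file paths are placeholders, and the 512 sizes (plus the attention resolution) are guesses for adapting the stock 256 settings to 512x512 images, so double-check against the shipped config.

```yaml
model:
  base_learning_rate: 4.5e-6
  target: taming.models.vqgan.VQModel        # class that main.py instantiates
  params:                                    # kwargs passed to that class
    embed_dim: 256
    n_embed: 1024                            # codebook size
    ddconfig:                                # encoder/decoder architecture
      double_z: false
      z_channels: 256
      resolution: 512                        # stock custom config uses 256
      in_channels: 3                         # RGB in
      out_ch: 3                              # RGB out
      ch: 128
      ch_mult: [1, 1, 2, 2, 4]               # 4 downsamplings -> f=16
      num_res_blocks: 2
      attn_resolutions: [32]                 # stock value is 16 (for 256 inputs)
      dropout: 0.0
    lossconfig:
      target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator
      params:
        disc_start: 10000                    # step at which the GAN loss kicks in
        disc_weight: 0.8
        codebook_weight: 1.0

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 4
    num_workers: 8
    train:
      target: taming.data.custom.CustomTrain
      params:
        training_images_list_file: some/training.txt   # placeholder
        size: 512
    validation:
      target: taming.data.custom.CustomTest
      params:
        test_images_list_file: some/test.txt           # placeholder
        size: 512
```

Training is then launched by pointing main.py at the file, e.g. `python main.py --base path/to/this.yaml -t True --gpus 0,`.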

@adeptflax adeptflax changed the title How would I would do RGB 2 RGB image2image translation with this repo? How would I do RGB 2 RGB image2image translation with this repo? May 26, 2021
@adeptflax
Author

I think I figured out how to do this. I'll try training a model tomorrow.

@Guthman

Guthman commented Jun 5, 2021

@adeptflax Can you share your code?

@adeptflax
Author

@Guthman I'm still working on it. I got it to train. I need to test the model.

@adeptflax
Author

I'll publish the code once I get it working.

@1211sh

1211sh commented Jun 7, 2021

Can you share your intuition? I have no idea how to revise this to work on the I2I task.

@adeptflax
Author

adeptflax commented Jun 8, 2021

The codebase is pretty much spaghetti code. I tried modifying drin, because it does something similar to image2image. The way I tried to modify it didn't work, but I think I know one of the problems.

@adeptflax
Author

I think I got it working. I only have the first epoch of my model trained, and I need to wait for it to finish to know for sure. I'll write a guide and then publish the code I used.

@adeptflax
Author

I had to fix something, but I do seem to have gotten it working. I'll post a guide tomorrow if it works well.

@adeptflax
Author

adeptflax commented Jun 14, 2021

Sorry guys, I procrastinated for a couple of days. I have gotten code to work that can train and run an image2image model. I don't know how it compares to pix2pixHD. I slightly screwed up the input data on the dataset I was training on, though I should be able to recover from it without completely restarting training.

@adeptflax
Author

Here it is; it should work: https://github.com/adeptflax/image2image

@adeptflax adeptflax reopened this Jun 18, 2021
@adeptflax
Author

adeptflax commented Jun 18, 2021

@Guthman @1211sh I don't seem to get very good results by epoch 36 on around 11,000 training examples. Does it just need to be trained for longer, or does something need to be changed? Any guesses? My output is faces; the hair and eyebrows don't have detail.

@Guthman

Guthman commented Jun 19, 2021

I don't remember where I read it (can't find it atm), but I think the authors trained theirs for five days on a V100 or something similar, so I think you have a bit to go. I'm training one for a bit on portrait paintings (~40k images), and although the reconstructions are starting to look okay (after 34 epochs, I think):

[reconstruction samples: reconstructions_gs-091070_e-000080_b-000750]

the validation examples weren't close to acceptable:
[validation samples: vq_val]

I basically copied the imagenet config but used a batch size of 8.
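
For reference, that change would live in the data section of the copied config. A trimmed sketch, with the dataset entry as a placeholder for the portrait data (the stock imagenet config uses its own dataset classes and a different batch size):

```yaml
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 8          # the one deliberate change from the copied config
    num_workers: 8
    train:
      target: taming.data.custom.CustomTrain      # placeholder for the portrait dataset
      params:
        training_images_list_file: portraits_train.txt   # placeholder path
        size: 256
```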

I switched to StyleGAN2-ADA to finish my current project, but I'll come back to VQGAN.

@adeptflax
Author

@Guthman I saved the model output and just used pix2pixHD. Though pix2pixHD doesn't do as well as I need. Do you think random cropping would help?

@adeptflax
Author

Maybe using the transformer instead of just the VQGAN would work? Maybe it's possible to pretrain on a face dataset? I'm doing stuff with faces.

@adeptflax
Author

I trained on 2 RTX 3090s for 2 days, I think. So I would have to train for another 6 days, because 512x512 is 4 times larger than 256x256?

@adeptflax
Author

@Guthman what's the resolution of your dataset?

@adeptflax
Author

adeptflax commented Jun 20, 2021

Do the transformer models first pre-train a VQGAN and then train the transformer on top of it?

@adeptflax
Author

I wonder what the problem is on #52.

@adeptflax
Author

Actually, it seems you need to first train a VQGAN model, and then you can train a transformer. Maybe that's the problem with #52. You would first train a model with faceshq_vqgan.yaml and then train a transformer with faceshq_transformer.yaml using the first VQGAN model.
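
That matches how the shipped configs are wired: the second-stage (transformer) config embeds a first_stage_config whose ckpt_path points at the checkpoint produced in stage one. A trimmed sketch in the shape of faceshq_transformer.yaml; the checkpoint path is a placeholder for your own run, and the omitted ddconfig/lossconfig have to match the stage-one config:

```yaml
model:
  target: taming.models.cond_transformer.Net2NetTransformer
  params:
    transformer_config:
      target: taming.modules.transformer.mingpt.GPT
      params:
        vocab_size: 1024       # should match the first stage's n_embed
        block_size: 512
        n_layer: 24
        n_head: 16
        n_embd: 1024
    first_stage_config:
      target: taming.models.vqgan.VQModel
      params:
        ckpt_path: logs/<your_vqgan_run>/checkpoints/last.ckpt   # stage-one checkpoint (placeholder)
        embed_dim: 256
        n_embed: 1024
        # ddconfig/lossconfig omitted; copy them from the stage-one config
    # cond_stage_config omitted here; it defines what the transformer is conditioned on
```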

@adeptflax
Author

Does the transformer just modify the encodings?

@adeptflax
Author

OK, I seem to be correct. In drin they created a depth VQGAN and an imagenet VQGAN model. So the whole drin pipeline goes depth VQGAN model -> transformer -> image VQGAN model. So basically drin_transformer.yaml trains a model that converts the depth embeddings into imagenet embeddings.
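
That reading matches the structure of drin_transformer.yaml: the cond stage encodes the input domain (depth), the first stage decodes the output domain (images), and the transformer learns to map cond codes to image codes. A trimmed sketch with placeholder checkpoint paths (details from memory, so verify against the shipped config):

```yaml
model:
  target: taming.models.cond_transformer.Net2NetTransformer
  params:
    cond_stage_key: depth                     # which batch key feeds the cond stage
    first_stage_config:
      target: taming.models.vqgan.VQModel     # output-side VQGAN (images)
      params:
        ckpt_path: logs/<imagenet_vqgan_run>/checkpoints/last.ckpt   # placeholder
        # remaining params as in the imagenet VQGAN config
    cond_stage_config:
      target: taming.models.vqgan.VQModel     # input-side VQGAN (depth)
      params:
        ckpt_path: logs/<depth_vqgan_run>/checkpoints/last.ckpt      # placeholder
        # remaining params as in the depth VQGAN config
    transformer_config:
      target: taming.modules.transformer.mingpt.GPT
      params:
        vocab_size: 1024
        block_size: 512
        n_layer: 24
        n_head: 16
        n_embd: 1024
```

For RGB-to-RGB translation the same wiring should apply, with the two stages swapped for VQGANs trained on the input-domain and output-domain images respectively.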

@adeptflax
Author

In my repo I modified the reconstruction code to do x -> y instead of x -> x, which isn't correct.

@adeptflax
Author

@Guthman did you set n_embed to 16384 or not? "model.params.n_embed" should be 16384.
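
That setting sits in the model section of the VQGAN config; a fragment, for reference (if I recall correctly, the faceshq configs ship with 1024 while the imagenet one uses 16384):

```yaml
model:
  target: taming.models.vqgan.VQModel
  params:
    embed_dim: 256
    n_embed: 16384    # codebook size, i.e. the number of discrete codes
```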

@adeptflax
Author

OK, I got an image2image transformer working. I will submit a pull request in the next few days.
