
Implement SD3 loss weighting #8528

Merged
merged 16 commits into huggingface:main on Jun 16, 2024
Conversation

Slickytail
Contributor

The loss-weighting schemes for SD3 training were not implemented correctly, which left all of them non-functional. I went ahead and implemented the lognorm and cosmap schemes by weighting the loss with each scheme's density at the sampled timesteps. A potentially better approach would be to sample the timesteps according to that density in the first place.

The mode scheme is much harder to implement; there's a reason the paper doesn't give an explicit form for its density (I couldn't find one), so for now I raise an error if you try to use it.
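
For reference, a minimal sketch of that density-based weighting; the function name, signature, and default parameters here are illustrative assumptions, not the exact code in this PR:

```python
import math
import torch

def sd3_loss_weighting(weighting_scheme, sigmas, logit_mean=0.0, logit_std=1.0):
    """Weight uniformly sampled timesteps by the scheme's sampling density.

    sigmas: flow-matching timesteps in (0, 1).
    """
    if weighting_scheme == "lognorm":
        # density of LogitNormal(logit_mean, logit_std) evaluated at sigma
        logits = torch.log(sigmas) - torch.log(1.0 - sigmas)
        weighting = torch.exp(-((logits - logit_mean) ** 2) / (2 * logit_std**2)) / (
            logit_std * math.sqrt(2 * math.pi) * sigmas * (1.0 - sigmas)
        )
    elif weighting_scheme == "cosmap":
        # CosMap density from the SD3 paper: pi(t) = 2 / (pi * (1 - 2t + 2t^2))
        weighting = 2.0 / (math.pi * (1.0 - 2.0 * sigmas + 2.0 * sigmas**2))
    else:
        # no closed-form density found for the mode scheme
        raise NotImplementedError(f"Unsupported weighting scheme: {weighting_scheme}")
    return weighting
```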

@sayakpaul @kashif

@kashif
Contributor

kashif commented Jun 13, 2024

thanks @Slickytail checking

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@bghira
Contributor

bghira commented Jun 13, 2024

previous code:

[screenshot]

new code:

[screenshot]

@kashif
Contributor

kashif commented Jun 13, 2024

@Slickytail can you kindly run `make style` in the root dir of diffusers?

@chenbaiyujason

I get the following error:

```
timesteps = noise_scheduler_copy.timesteps[indices].to(device=model_input.device)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
```

@sayakpaul
Member

@kashif I will review the changes a bit later, but could you test the scripts and see if they run without errors?

@chenbaiyujason

This avoids the error. Line 1476:

```python
indices = (u * noise_scheduler_copy.config.num_train_timesteps).long()
```

fix:

```python
indices = (u * noise_scheduler_copy.config.num_train_timesteps).long().to(device="cpu")
```

@xiaohu2015
Contributor

A question about the loss: why do we first convert the model prediction to x_0 to compute the loss here?

@Slickytail
Contributor Author

Hi @kashif, it looks like you committed the necessary formatting/style changes, so I'm assuming `make style` isn't needed from me anymore. Is there anything else that needs a fix here?

@kashif
Contributor

kashif commented Jun 14, 2024

@Slickytail yes, let's keep all the `u` tensors on the CPU; since the indices are on the CPU side, there is no need to set the device to GPU.
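
A sketch of the sampling path kashif describes, reusing the variable names from the training script quoted above (`bsz`, `model_input`, and `noise_scheduler_copy` are assumed to be defined there; uniform sampling of `u` stands in for whichever scheme is configured):

```python
import torch

# sample u on the CPU so that indexing the CPU-resident timesteps tensor works;
# only the gathered timesteps are moved to the model's device afterwards
u = torch.rand(size=(bsz,), device="cpu")
indices = (u * noise_scheduler_copy.config.num_train_timesteps).long()
timesteps = noise_scheduler_copy.timesteps[indices].to(device=model_input.device)
```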

@sayakpaul
Member

sayakpaul left a comment

Thanks a lot for these!

@asomoza has confirmed that these are working nicely.

In a follow-up PR, I will link your PR in the comments to honor your contributions :)

I will also turn this into a little utility function and move it to training_utils.py.

Appreciate your help here!

@sayakpaul
Member

Okay, I can confirm that the failing example tests occur only with the latest version of datasets; with version 2.19.1, they don't appear. So I will merge the PR. Requesting @lhoestq @albertvillanova to help.

@sayakpaul sayakpaul merged commit 6946fac into huggingface:main Jun 16, 2024
6 of 8 checks passed
@albertvillanova
Member

Thanks for the ping, @sayakpaul.

In the latest datasets release, we had to introduce a breaking change for security reasons: huggingface/transformers#31406

Datasets with a Python loading script now require passing `trust_remote_code=True` to be used.
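
For example, a minimal sketch of the new opt-in (the dataset name is a placeholder):

```python
from datasets import load_dataset

# script-based datasets now require an explicit opt-in
ds = load_dataset("user/dataset_with_loading_script", trust_remote_code=True)
```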

@lhoestq
Member

lhoestq commented Jun 17, 2024
