
Error first try SD3 directml RX580 #3689

Open
KillyTheNetTerminal opened this issue Jun 12, 2024 · 23 comments

Comments

@KillyTheNetTerminal

Error occurred when executing KSampler:

Expected all tensors to be on the same device, but found at least two devices, privateuseone:0 and cpu!

File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\nodes.py", line 1355, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\nodes.py", line 1325, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\sample.py", line 43, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 794, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 696, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 683, in sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 662, in inner_sample
samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 567, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comf\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\k_diffusion\sampling.py", line 137, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 291, in call
out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 649, in call
return self.predict_noise(*args, **kwargs)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 652, in predict_noise
return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 277, in sampling_function
out = calc_cond_batch(model, conds, x, timestep, model_options)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\samplers.py", line 226, in calc_cond_batch
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\model_base.py", line 103, in apply_model
model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comf\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comf\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\ldm\modules\diffusionmodules\mmdit.py", line 961, in forward
return super().forward(x, timesteps, context=context, y=y)
File "C:\Users\WarMa\OneDrive\Escritorio\SD\comfyuai\ComfyUI\comfy\ldm\modules\diffusionmodules\mmdit.py", line 937, in forward
x = self.x_embedder(x) + self.cropped_pos_embed(hw, device=x.device).to(dtype=x.dtype)
[screenshot]
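For context, the error means two tensors in a single operation live on different devices: under DirectML the GPU shows up as privateuseone:0, while something (here, apparently the positional embedding built in mmdit.py) is still on the CPU. A minimal sketch of the failure pattern and the usual fix; this is illustrative only, not the actual ComfyUI code:

```python
import torch

# Illustration: pick whatever GPU device is available; on AMD/DirectML this
# would come from torch_directml.device() and print as privateuseone:0.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(1, 16, 64, 64, device=device)  # latent on the GPU
pos_embed = torch.randn(1, 16, 64, 64)         # accidentally created on the CPU

# x + pos_embed  # would raise: Expected all tensors to be on the same device
y = x + pos_embed.to(device=x.device, dtype=x.dtype)  # explicit move fixes it
```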

@timesqueezer

timesqueezer commented Jun 12, 2024

Can confirm the same error on an RTX 3050 / Intel Core i7-11800H notebook. The only difference is this line:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

@KillyTheNetTerminal
Author

Exactly, because you have Nvidia and CUDA.

@KillyTheNetTerminal
Author

[screenshot]
Working on CPU, but slow as hell (i3-9100f).

@Cremesis

Same problem for me

Exception during processing!!! Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Traceback (most recent call last):
  File "C:\Users\<myuser>\Downloads\comfyui\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)

@jtyszkiew

Seems to work for me on:

Total VRAM 11980 MB, total RAM 64140 MB
pytorch version: 2.3.0+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using pytorch cross attention

Nvidia + CUDA

@kuldp18

kuldp18 commented Jun 12, 2024

Can confirm the same error on an RTX 3050 / Intel Core i7-11800H notebook. The only difference is this line: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I have the same error too.

@15Litrov

15Litrov commented Jun 12, 2024

A fresh manual install with nightly PyTorch (I didn't test other versions) helped me overcome this problem.
1050 Ti 4 GB + 32 GB RAM

@kuldp18

kuldp18 commented Jun 12, 2024

A fresh manual install with nightly PyTorch (I didn't test other versions) helped me overcome this problem. 1050 Ti 4 GB + 32 GB RAM

Can we just update PyTorch in the current install? And how is 4 GB of VRAM handling SD3, by the way?

@15Litrov

A fresh manual install with nightly PyTorch (I didn't test other versions) helped me overcome this problem. 1050 Ti 4 GB + 32 GB RAM

Can we just update PyTorch in the current install? And how is 4 GB of VRAM handling SD3, by the way?

Maybe? I did not test.
About performance: 30 s/it for 1024x1024 with dualCLIP.
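For anyone checking which build their current install is actually using before or after an update, a quick sketch (the torch_directml part only applies to DirectML setups):

```python
import torch

print(torch.__version__)          # e.g. "2.3.0+cu121", or "2.4.0.dev..." on a nightly
print(torch.cuda.is_available())  # True on NVIDIA/CUDA builds

# On AMD/DirectML setups the device comes from the torch-directml package instead:
try:
    import torch_directml
    print(torch_directml.device())  # prints as privateuseone:0
except ImportError:
    pass
```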

@kuldp18

kuldp18 commented Jun 12, 2024

Guys, the issue is fixed, please update!

@AlexBenjarmin

Guys, the issue is fixed, please update!

Update ComfyUI?

@KillyTheNetTerminal
Author

Yes, it works now after updating ComfyUI (I used the Manager), but it's very slow per iteration. Is there a way to speed this up?

@KillyTheNetTerminal
Author

[screenshot]

@ltdrdata
Contributor

ltdrdata commented Jun 12, 2024

[screenshot]

You should not use dpmpp_2m with karras.
Just use euler with sgm_uniform.

karras is bad for SD3.

@kuldp18

kuldp18 commented Jun 12, 2024

[screenshot]

You should not use dpmpp_2m with karras. Just use euler with sgm_uniform.

karras is bad for SD3.

The official recommendation is DPM, though. Isn't euler too random for the SD3 architecture?

@ltdrdata
Contributor

[screenshot]

You should not use dpmpp_2m with karras. Just use euler with sgm_uniform.
karras is bad for SD3.

The official recommendation is DPM, though. Isn't euler too random for the SD3 architecture?

https://comfyanonymous.github.io/ComfyUI_examples/sd3/

The official example suggests euler with sgm_uniform.
In my tests, the dpmpp_2m sampler is OK, but the scheduler must be one of normal, simple, sgm_uniform, or ddim_uniform.
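For reference, here is how those settings might look in an API-format workflow, written out as a Python dict; steps and cfg are made-up illustrative values, and only sampler_name and scheduler reflect the advice above:

```python
# Hypothetical KSampler node from an API-format ComfyUI workflow.
ksampler_node = {
    "class_type": "KSampler",
    "inputs": {
        "sampler_name": "euler",     # suggested in the official SD3 example
        "scheduler": "sgm_uniform",  # avoid "karras" with SD3
        "steps": 28,                 # illustrative value
        "cfg": 4.5,                  # illustrative value
        # model / positive / negative / latent_image connections omitted
    },
}
```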

@KillyTheNetTerminal
Author

Same here, the image is still noisy.

@KillyTheNetTerminal
Author

[screenshot]

@ltdrdata
Contributor

[screenshot]

Try CPU mode.

@Wallboy

Wallboy commented Jun 13, 2024

Same issue here, just getting noisy generated images, on a 7900 XTX also running DirectML.

Perhaps SD3 is not working with AMD GPUs/DirectML yet.

@kopaser6463

Same issue. I narrowed it down a little: the variable named out in sampling_function in samplers.py ends up different on CPU vs. DirectML.
Here is the (convoluted) call path to it:
nodes.py -> samplers.py -> KSampler.sample -> sample (a different one) -> CFGGuider.sample -> CFGGuider.inner_sample (sampler.sample(self, sigmas, ...)) -> sampler = sampler_object(self.sampler, just a name) -> sampler_object -> ksampler -> KSAMPLER.sample(self, model_wrap, sigmas, extra_args, ...) -> model_k = KSamplerX0Inpaint(model_wrap, sigmas) ->
model_wrap is self in sampler.sample, so the CFGGuider() call returns self.predict_noise() -> sampling_function(model) -> cfg_function(model) -> out.
It is different on CPU vs. DirectML.
Why? I don't know.
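One way to chase that down is to temporarily print tensor devices at the point where out is computed; a sketch, assuming you edit your local copy of comfy/samplers.py, and debug_devices is a made-up helper name:

```python
import torch

def debug_devices(**tensors):
    # Print the device and dtype of every tensor passed in,
    # to spot a stray cpu tensor among privateuseone:0 ones.
    for name, t in tensors.items():
        if torch.is_tensor(t):
            print(f"{name}: device={t.device}, dtype={t.dtype}")

# Example (temporary, at the top of sampling_function):
# debug_devices(x=x, timestep=timestep)
```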

@Wallboy

Wallboy commented Jun 13, 2024

If anyone wants a working SD3 setup with AMD GPUs in the meantime, look up the ComfyUI ZLUDA fork and use that instead. It works great.

Just be warned that the first generation takes a while, as a bunch of databases get processed. It's similar to A1111 with ZLUDA, where you had the same wait time for your first generation after installing it.

@KillyTheNetTerminal
Author

I've never tried setting up ZLUDA for ComfyUI. Does it speed up generation compared to DirectML? How can I test it?
