Not working on apple silicon (CogVideoX Fun Sampler Implementation) #59

Open
defertoexpertise opened this issue Sep 18, 2024 · 17 comments

Comments

@defertoexpertise

!!! Exception during processing !!! unsupported scalarType
Traceback (most recent call last):
File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
process_inputs(input_dict, i)
File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 519, in process
autocast_context = torch.autocast(mm.get_autocast_device(device)) if autocastcondition else nullcontext()
File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 229, in __init__
dtype = torch.get_autocast_dtype(device_type)
RuntimeError: unsupported scalarType

@defertoexpertise defertoexpertise changed the title Getting error on download module for scalarType? Getting error on the fun sampler implementation node for scalarType? Sep 18, 2024
@kijai
Owner

kijai commented Sep 18, 2024

I probably left the fp8 fast mode on; check that and set it to disabled to see if that resolves this. What GPU are you using?

@defertoexpertise
Author

No, it's disabled. I'm on a MacBook. The issue seems to be that autocast isn't supported in any PyTorch build except nightly (as of a week ago), so the autocast to fp16 is breaking things. Oddly, when I switched to nightly I started getting errors at prompt_embeds=positive.to(dtype).to(device): positive is a list, and a list has no .to method.

@kijai
Owner

kijai commented Sep 18, 2024

> prompt_embeds=positive.to(dtype).to(device), positive is a .. list and doesn't have a .to on list

Are you using the example workflow?

@defertoexpertise defertoexpertise changed the title Getting error on the fun sampler implementation node for scalarType? Not working on apple silicon (CogVideoX Fun Sampler Implementation) Sep 18, 2024
@defertoexpertise
Author

Haha, I had overlooked that CogVideo uses different text nodes than the stock ones. I swapped to those and that part now passes. However, it now breaks because something appears to be hardcoded to use CUDA instead of falling back to MPS or CPU when CUDA isn't available. I haven't tracked down where yet.

In pipeline_cogvideox.py, where you had a hardcoded torch.device("cuda"), I changed it to
device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"), but that doesn't seem to be the call that's got me stuck, and what's odd is that I can't find any other hardcoded references to cuda that would break things.

Traceback (most recent call last):
  File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 535, in process
    latents = pipe(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 634, in __call__
    self.vae.to(device)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1326, in convert
    return t.to(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 310, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
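
The fallback chain from the comment above (CUDA first, then MPS, then CPU) can be written as a small function instead of a nested conditional expression. This is a sketch; `pick_device` is a hypothetical helper, and it takes the availability flags as arguments so it can be exercised without a GPU.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device string: CUDA first, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, the flags would come from:
#   torch.cuda.is_available() and torch.backends.mps.is_available()
print(pick_device(cuda_available=False, mps_available=True))  # mps
```

As the traceback shows, choosing the device in one place is not enough if another component still reports a CUDA device.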

@defertoexpertise
Author

Strange thing: in that inpainting file, if I add a print to see what device is before it tries to send the VAE to a device, it is set by device = self._execution_device, and printing it gives "cuda:0".

@defertoexpertise
Author

Yeah, I'm not sure where that _execution_device is getting set. Even if I hardcode that instance of it to "mps" or "cpu", it seems to be used elsewhere, and it's still trying to force things onto CUDA, which Macs don't have.

@kijai
Owner

kijai commented Sep 18, 2024

> Ya i'm not sure where that _execution_device is getting set, even if i hard code that instance of it to "mps" or "cpu" ... it seems somehow it's used elsewhere and its still trying to force things onto cuda... which macs dont have

I think it defaults to cuda if it can't find the device from accelerate. I don't know why that wouldn't work, but you can try just forcing the execution device to mps.

@defertoexpertise
Author

defertoexpertise commented Sep 18, 2024

If you mean just setting self._execution_device = "mps", that won't work; apparently it's not allowed:

AttributeError: can't set attribute '_execution_device'...

After a bit of digging, it seems that diffusers returns the device that's set in _hf_hook on the model, which is returning cuda:0.
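
The behaviour described here can be reproduced with a read-only property. The classes below are simplified, hypothetical stand-ins (not the real diffusers or accelerate code); they only assume what the comment reports, namely that the execution device is read from an `_hf_hook` attribute on a model.

```python
class Hook:
    """Hypothetical stand-in for the hook attached to an offloaded model."""

    def __init__(self, execution_device):
        self.execution_device = execution_device


class Model:
    """Hypothetical model that may carry an _hf_hook attribute."""

    def __init__(self, hook=None):
        if hook is not None:
            self._hf_hook = hook


class Pipeline:
    """Simplified stand-in, not the real diffusers pipeline class."""

    def __init__(self, models, device="cpu"):
        self.models = models
        self.device = device

    @property
    def _execution_device(self):
        # Prefer the device recorded by a hook, if any model has one.
        for model in self.models:
            hook = getattr(model, "_hf_hook", None)
            if hook is not None and getattr(hook, "execution_device", None) is not None:
                return hook.execution_device
        return self.device


pipe = Pipeline([Model(Hook("cuda:0"))], device="mps")
print(pipe._execution_device)  # cuda:0, taken from the hook, not from pipe.device

try:
    pipe._execution_device = "mps"  # a property without a setter rejects this
except AttributeError as err:
    print(f"AttributeError: {err}")

# Changing what the hook reports is what changes the result.
pipe.models[0]._hf_hook.execution_device = "mps"
print(pipe._execution_device)  # mps
```

In this model, the value has to be fixed at the point where the hook is created, which is why assigning to the property or hardcoding a local variable has no lasting effect.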

@kijai
Owner

kijai commented Sep 18, 2024

Potentially found the reason: I wasn't calling the enable_model_cpu_offload with a device, so that would make it default to cuda.
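
A minimal sketch of that fix, assuming (as the comment above describes) that `enable_model_cpu_offload` accepts a `device` keyword and falls back to "cuda" when it is omitted. `RecordingPipe` is a hypothetical stub used so the example runs without diffusers, and `enable_offload` is a hypothetical wrapper name.

```python
class RecordingPipe:
    """Hypothetical stub that records the device passed to offloading."""

    def __init__(self):
        self.offload_device = None

    def enable_model_cpu_offload(self, gpu_id=None, device="cuda"):
        # Mirrors the default discussed above: "cuda" when no device is given.
        self.offload_device = device


def enable_offload(pipe, device):
    """Always pass the device explicitly instead of relying on the default."""
    pipe.enable_model_cpu_offload(device=device)


implicit = RecordingPipe()
implicit.enable_model_cpu_offload()
print(implicit.offload_device)  # cuda

explicit = RecordingPipe()
enable_offload(explicit, "mps")
print(explicit.offload_device)  # mps
```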

@defertoexpertise
Author

defertoexpertise commented Sep 18, 2024

Yep, that solved that issue. With PyTorch 2.6.0-dev (needed for autocast to work in the pipeline), it no longer gives the device error. Great catch; optional arguments are so easy to overlook in these codebases.

So close to it running, lol, I can feel it! XD

Now it's stuck at:

File "/Users/cc/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 64, in forward
    return super().forward(input)
  File "/Users/cc/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/cc/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
    return F.conv3d(
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

I get the feeling the dtype isn't being passed somewhere it needs to be for float16.
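
To narrow down where a dtype goes missing, it can help to count a module's parameters per dtype and list the ones that differ from the expected dtype. This is a sketch with hypothetical helper names; it relies only on `named_parameters()` and `.dtype`, which `torch.nn.Module` and its parameters provide, and uses stub objects so it runs without PyTorch.

```python
from collections import Counter


def dtype_census(module):
    """Count parameters per dtype for anything exposing named_parameters()."""
    return Counter(str(param.dtype) for _, param in module.named_parameters())


def params_not_in(module, expected_dtype):
    """List names of parameters whose dtype differs from expected_dtype."""
    return [name for name, param in module.named_parameters()
            if str(param.dtype) != str(expected_dtype)]


class FakeParam:
    """Hypothetical stand-in for a parameter; only carries a dtype."""

    def __init__(self, dtype):
        self.dtype = dtype


class FakeModule:
    """Hypothetical stand-in for a module such as the VAE."""

    def __init__(self, params):
        self._params = params

    def named_parameters(self):
        return iter(self._params.items())


vae = FakeModule({
    "conv1.weight": FakeParam("torch.float16"),
    "conv1.bias": FakeParam("torch.float16"),
    "norm.weight": FakeParam("torch.float32"),
})
print(dtype_census(vae))
print(params_not_in(vae, "torch.float16"))  # ['norm.weight']
```

With a real model the call would look like `dtype_census(pipe.vae)`. The input tensor is not a parameter, so its `.dtype` has to be checked separately at the call that feeds the encoder.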

@defertoexpertise
Author

defertoexpertise commented Sep 19, 2024

Yeah, I'm not sure why the dtypes don't match at conv3d: per the error, the input is float32 while the bias is float16.

this is my setup btw

image

and here's the full trace

!!! Exception during processing !!! Input type (float) and bias type (c10::Half) should be the same
Traceback (most recent call last):
  File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 530, in process
    latents = pipe(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 719, in __call__
    _, masked_video_latents = self.prepare_mask_latents(
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 340, in prepare_mask_latents
    mask_pixel_values_bs = self.vae.encode(mask_pixel_values_bs)[0]
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 1120, in encode
    z_intermediate = self.encoder(z_intermediate)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 739, in forward
    hidden_states = down_block(hidden_states, temb, None)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 415, in forward
    hidden_states = resnet(hidden_states, temb, zq)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 297, in forward
    hidden_states = self.conv1(hidden_states)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 149, in forward
    output = self.conv(inputs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 69, in forward
    return super().forward(input)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
    return F.conv3d(
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

Also, I saw you mention that diffusers 0.30.3 is required. I was on 0.30.2, but upgrading didn't change anything.

@kijai
Owner

kijai commented Sep 19, 2024

Diffusers 0.30.3 is required for the official I2V model only, not the "Fun" variant.

Does that one work for you, by the way, or is this only an issue with the "Fun" models?

@defertoexpertise
Author

defertoexpertise commented Sep 19, 2024

I cleared my folder, pulled the latest from the git repo, and tested the 2B models with their respective samplers. I got very similar, but slightly different, errors.

With the standard 2B (first in the list):

!!! Exception during processing !!! Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
Traceback (most recent call last):
  File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 455, in process
    latents = pipeline["pipe"](
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 607, in __call__
    noise_pred = self.transformer(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 456, in forward
    hidden_states, encoder_hidden_states = block(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 131, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 490, in forward
    return self.processor(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1925, in __call__
    hidden_states = F.scaled_dot_product_attention(
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.

With the Fun 2B:

!!! Exception during processing !!! Input type (float) and bias type (c10::Half) should be the same
Traceback (most recent call last):
  File "/Users/user/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/Users/user/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/user/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 641, in process
    latents = pipe(
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 718, in __call__
    _, masked_video_latents = self.prepare_mask_latents(
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/pipeline_cogvideox_inpaint.py", line 339, in prepare_mask_latents
    mask_pixel_values_bs = self.vae.encode(mask_pixel_values_bs)[0]
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 1114, in encode
    z_intermediate = self.encoder(z_intermediate)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 733, in forward
    hidden_states = down_block(hidden_states, temb, None)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 409, in forward
    hidden_states = resnet(hidden_states, temb, zq)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 291, in forward
    hidden_states = self.conv1(hidden_states)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 144, in forward
    output = self.conv(inputs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/user/AI/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/cogvideox_fun/autoencoder_magvit.py", line 64, in forward
    return super().forward(input)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 725, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/user/AI/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
    return F.conv3d(
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

@digvijay7

digvijay7 commented Sep 21, 2024

I tried to resolve the above while running the 5B I2V model. It seems to be a deeper issue within the CogVideo diffusers model or in the MPS implementation of PyTorch (though I can't be sure).
I am leaving these details here in case someone picks this up:

  1. The precision sent from the codebase in this repository seems to be correct (I was running at float32 precision, and all the tensors sent to the underlying model had the same precision).
  2. After the following line of code in the diffusers library: https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py#L1924 the query and key tensors have a dtype of float32, whereas the value tensor has a dtype of float16 (which, reading the above, seems to be the issue).

At point 2, I tried forcing all tensors to float32, and also forcing them to float16, before the call to scaled_dot_product_attention. In both cases, my MacBook gave an OOM error (I have the 36 GB RAM model).

Might try to set this up on a GPU instance somewhere using an Nvidia card ¯_(ツ)_/¯
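
The "force one dtype before attention" experiment described above can be sketched as a small helper. `align_qkv` and `FakeTensor` are hypothetical names; the helper relies only on `.dtype` and `.to(dtype)`, which `torch.Tensor` provides, and the stub lets it run without PyTorch. It addresses only the dtype mismatch, not the out-of-memory error reported afterwards.

```python
def align_qkv(query, key, value, dtype=None):
    """Cast query, key and value to one dtype (default: the query's dtype)."""
    target = query.dtype if dtype is None else dtype
    return query.to(target), key.to(target), value.to(target)


class FakeTensor:
    """Hypothetical stand-in for a tensor; only tracks its dtype."""

    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        # Like a real tensor, return a new object rather than mutating in place.
        return FakeTensor(dtype)


# The combination from the error message: float32 query/key, float16 value.
q, k, v = FakeTensor("float32"), FakeTensor("float32"), FakeTensor("float16")
q, k, v = align_qkv(q, k, v)
print(q.dtype, k.dtype, v.dtype)  # float32 float32 float32
```

Passing `dtype` explicitly (for example float16) trades precision for roughly half the memory of these three tensors.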

@cchance27

Well, float32 precision will likely OOM even without any bugs. For bf16 they list 16 GB (confirmed, as it can OOM even on a T4 Colab); they even mention in the Colabs that it can OOM with 16 GB of VRAM and memory. I imagine some of the memory use in this Comfy extension comes from shuffling tensors around, but I definitely think it needs to run in fp16 to have a chance of running locally on a 36 GB machine.

Keep in mind that on Macs offloading doesn't do anything, since VRAM and RAM are unified; we'd have to switch to completely unloading anything extraneous, not just moving it to the CPU.
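
A rough back-of-the-envelope check of the precision argument, counting weights only. It assumes exactly 5 billion parameters for the "5B" model mentioned earlier in the thread and ignores the text encoder, the VAE, activations and any temporary copies, so real usage will be higher. `weight_gib` is a hypothetical helper name.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_gib(num_params: int, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_gib(5_000_000_000, dtype):.1f} GiB")
# float32: 18.6 GiB
# float16: 9.3 GiB
```

On a unified-memory machine that total is shared with the operating system and everything else that is loaded, which is why halving the weight size matters so much on a 36 GB system.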

@mptorr
Copy link

mptorr commented Sep 27, 2024

I have a similar issue running the 5B I2V model on a MacBook Pro M3 Max (128 GB RAM, latest Sonoma).
Python 3.12.4 (miniconda3), pytorch 2.6.0.dev20240924
This happens regardless of using flags --force-fp16, --force-fp32, --dont-upcast-attention.

RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.

Let me know if you'd prefer I open another issue or run a few tests given this machine's memory.
I've also tried brute-forcing the dtypes as @digvijay7 mentioned above, to no avail.
Any insights are welcome, thanks!

The full error output is:

** Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 10:07:17) [Clang 14.0.6 ]
** Python executable: /Users/u/miniconda3/bin/python
** ComfyUI Path: /Users/u/ComfyUI
** Log path: /Users/u/ComfyUI/comfyui.log

Prestartup times for custom nodes:
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/rgthree-comfy
   0.3 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-Manager

Total VRAM 131072 MB, total RAM 131072 MB
pytorch version: 2.6.0.dev20240924
Forcing FP16.
Set vram state to: SHARED
Device: mps
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
[Prompt Server] web root: /Users/u/ComfyUI/web
/Users/u/miniconda3/lib/python3.12/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
### Loading: ComfyUI-Manager (V2.51.1)
### ComfyUI Revision: 2727 [fdf37566] | Released on '2024-09-24'

[rgthree] Loaded 42 exciting nodes.
[rgthree] NOTE: Will NOT use rgthree's optimized recursive execution as ComfyUI has changed.

Total VRAM 131072 MB, total RAM 131072 MB
pytorch version: 2.6.0.dev20240924
Forcing FP16.
Set vram state to: SHARED
Device: mps
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json

Import times for custom nodes:
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/websocket_image_save.py
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/rgthree-comfy
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-KJNodes
   0.0 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-Manager
   0.1 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
   0.2 seconds: /Users/u/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Encoded latents shape: torch.Size([1, 1, 16, 60, 90])
/Users/u/miniconda3/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Requested to load SD3ClipModel_
Loading 1 new model
loaded completely 0.0 4541.693359375 True
Temporal tiling disabled
  0%|          | 0/50 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
  0%|          | 0/50 [00:00<?, ?it/s]
!!! Exception during processing !!! Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
Traceback (most recent call last):
  File "/Users/u/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/Users/u/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 843, in process
    latents = pipeline["pipe"](
              ^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 615, in __call__
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 456, in forward
    hidden_states, encoder_hidden_states = block(
                                           ^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 131, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
                                                     ^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/diffusers/models/attention_processor.py", line 490, in forward
    return self.processor(
           ^^^^^^^^^^^^^^^
  File "/Users/u/miniconda3/lib/python3.12/site-packages/diffusers/models/attention_processor.py", line 1925, in __call__
    hidden_states = F.scaled_dot_product_attention(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.

@BenRacicot

Can confirm this is an issue on M2 Max chips.
pytorch/pytorch#110285 just gonna leave this here.
