
M1 performance #14

Closed
emmajane1313 opened this issue Jan 17, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@emmajane1313

Converting the SD 1.5 model from Hugging Face with the script on an M1 Mac (running python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/) and getting this traceback:

  File "/Users/devdesign/.asdf/installs/python/3.10.6/lib/python3.10/site-packages/accelerate/big_modeling.py", line 215, in dispatch_model
    main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]
IndexError: list index out of range
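
(For context: the IndexError comes from the line of accelerate's dispatch_model shown in the traceback. It can only fire when the device_map contains no device other than "cpu" or "disk", which is presumably what happens on an M1 Mac with no CUDA device. A minimal illustrative sketch; the device_map contents below are an assumption, not taken from the actual run:)

# Illustrative sketch of the failing expression from accelerate's big_modeling.py.
# Assumed device_map: with no CUDA/accelerator available, every module ends up on "cpu".
device_map = {"text_encoder": "cpu", "unet": "cpu", "vae": "cpu"}

# The filtered list is empty, so indexing [0] raises
# "IndexError: list index out of range".
main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]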
@decahedron1
Member

Potentially related to huggingface/accelerate#796. I'll patch hf2pyke to allow bypassing accelerate.

@emmajane1313
Author

Thanks! Should I re-clone and try again?

@decahedron1
Member

Yes, please pull a40d4f9 and run hf2pyke with --no-accelerate:

python3 scripts/hf2pyke.py --no-accelerate runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/

@emmajane1313
Author

Cool, the model converted! How long does an image usually take to generate? I've had it loading for about 5 minutes; is that normal?

@decahedron1
Member

decahedron1 commented Jan 17, 2023

I'm not sure about the performance on M1. float32 generation on CPU takes around 3 minutes on my Ryzen 5600X and up to around 10 minutes on a Xeon E5-2680 v3. ONNX Runtime is not as well optimized for ARM as it is for x86.

If you're looking for the fastest performance on M1, I'd recommend using HuggingFace Diffusers. HuggingFace has been working hard on MPS support, and it's probably your best bet. ONNX Runtime's CoreML backend won't do much for Stable Diffusion, and there's not much else I can do besides writing a custom AI runtime (which, to be fair, I am considering given how many issues ORT has given me...).
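
(For reference, a minimal text-to-image sketch with HuggingFace Diffusers on MPS; the model ID and prompt are only examples, and it assumes the diffusers package and a torch build with MPS support are installed:)

# Minimal sketch: Stable Diffusion via HuggingFace Diffusers on Apple Silicon (MPS).
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")
pipe.enable_attention_slicing()  # reduces peak memory usage on MPS

# The first call is slower while kernels warm up; subsequent calls are faster.
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")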

@oliveiraantoniocc

I just came across this project and I'm also interested in running it on M1. I'm curious whether the Rust backend could provide added performance; this implementation also looks more flexible than the current one provided by Apple.

What are the next steps to start the Rust backend? Does requirements.txt need to be adapted to run optimized on M1?

I got the model converted as well.

❯ python3 scripts/hf2pyke.py --no-accelerate runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/
....

✨ Your model is ready! ~/pyke-diffusers-sd15

@decahedron1
Member

I'm curious whether the Rust backend could provide added performance; this implementation also looks more flexible than the current one provided by Apple.

I'd be happy to introduce optimizations for M1; however, not much of the computation is actually done in Rust. Most of it is done by ONNX Runtime, so any optimizations I could make would only shave off maybe a few seconds.

I would be open to replacing ONNX Runtime though...

What are the next steps to start the Rust backend? Does requirements.txt need to be adapted to run optimized on M1?

I'm not sure what you mean by this. requirements.txt is only used by the hf2pyke script; it is not used by the Rust code.

@decahedron1 decahedron1 self-assigned this Jan 19, 2023
@decahedron1 decahedron1 added the enhancement New feature or request label Jan 19, 2023
@decahedron1 decahedron1 changed the title from Model Conversion Error to M1 performance Jan 19, 2023
decahedron1 added a commit that referenced this issue Jan 26, 2023
@decahedron1
Member

decahedron1 commented Jan 26, 2023

I added support for the CoreML execution provider in 23a7800. I don't have a Mac, so I couldn't do any testing, but from a quick glance it looks like the UNet mostly consists of supported operators. Execution should be significantly faster, hopefully under a minute.

To use the CoreML backend, you need to:

If you experience any issues, please let me know.
