
M1 performance #14

Closed
emmajane1313 opened this issue Jan 17, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@emmajane1313

Converting the SD 1.5 model from Hugging Face with the script on an M1 Mac (running python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/) and getting this traceback:

  File "/Users/devdesign/.asdf/installs/python/3.10.6/lib/python3.10/site-packages/accelerate/big_modeling.py", line 215, in dispatch_model
    main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]
IndexError: list index out of range
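
(For context: the IndexError comes from the line of accelerate's dispatch_model shown in the traceback. It can only fire when the device_map contains no device other than "cpu" or "disk", which is presumably what happens on an M1 Mac with no CUDA device. A minimal illustrative sketch; the device_map contents below are an assumption, not taken from the actual run:)

# Illustrative sketch of the failing expression from accelerate's big_modeling.py.
# Assumed device_map: with no CUDA/accelerator available, every module ends up on "cpu".
device_map = {"text_encoder": "cpu", "unet": "cpu", "vae": "cpu"}

# The filtered list is empty, so indexing [0] raises
# "IndexError: list index out of range".
main_device = [d for d in device_map.values() if d not in ["cpu", "disk"]][0]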
@decahedron1
Member

Potentially related to huggingface/accelerate#796. I'll patch hf2pyke to allow bypassing accelerate.

@emmajane1313
Author

Thanks! Should I re-clone and try again?

@decahedron1
Member

Yes, please pull a40d4f9 and run hf2pyke with --no-accelerate:

python3 scripts/hf2pyke.py --no-accelerate runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/

@emmajane1313
Author

Cool, the model converted! How long does an image usually take to generate? I've had it loading for about 5 minutes; is that normal?

@decahedron1
Member

decahedron1 commented Jan 17, 2023

I'm not sure about the performance on M1. float32 generation on CPU takes around 3 minutes on my Ryzen 5600X and up to around 10 minutes on a Xeon E5-2680 v3. ONNX Runtime is not as well optimized for ARM as it is for x86.

If you're looking for the fastest performance on M1, I'd recommend using HuggingFace Diffusers. HuggingFace has been working hard on MPS support, and it's probably your best bet. ONNX Runtime's CoreML backend won't do much for Stable Diffusion, and there's not much else I can do besides writing a custom AI runtime (which, to be fair, I am considering given how many issues ORT has given me...).
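
(For reference, a minimal text-to-image sketch with HuggingFace Diffusers on MPS; the model ID and prompt are only examples, and it assumes the diffusers package and a torch build with MPS support are installed:)

# Minimal sketch: Stable Diffusion via HuggingFace Diffusers on Apple Silicon (MPS).
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")
pipe.enable_attention_slicing()  # reduces peak memory usage on MPS

# The first call is slower while kernels warm up; subsequent calls are faster.
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")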

@oliveiraantoniocc

I just came across this project and I'm also interested in running it on M1. I'm curious whether the Rust backend could provide added performance; this implementation also looks more flexible than the current one provided by Apple.

What are the next steps to start the Rust backend? Does requirements.txt need to be adapted to run optimized on M1?

I got the model converted as well.

❯ python3 scripts/hf2pyke.py --no-accelerate runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/
....

✨ Your model is ready! ~/pyke-diffusers-sd15

@decahedron1
Member

I'm curious whether the Rust backend could provide added performance; this implementation also looks more flexible than the current one provided by Apple.

I'd be happy to introduce optimizations for M1; however, not much of the computation is actually done in Rust. Most of it is done by ONNX Runtime, so any optimizations I could make would only shave off maybe a few seconds.

I would be open to replacing ONNX Runtime though...

What are the next steps to start the Rust backend? Does requirements.txt need to be adapted to run optimized on M1?

I'm not sure what you mean by this. requirements.txt is only used by the hf2pyke script; it is not used by the Rust code.

@decahedron1 decahedron1 self-assigned this Jan 19, 2023
@decahedron1 decahedron1 added the enhancement New feature or request label Jan 19, 2023
@decahedron1 decahedron1 changed the title from Model Conversion Error to M1 performance Jan 19, 2023
decahedron1 added a commit that referenced this issue Jan 26, 2023
@decahedron1
Member

decahedron1 commented Jan 26, 2023

I added support for the CoreML execution provider in 23a7800. I don't have a Mac, so I couldn't do any testing, but from a quick glance it looks like the UNet mostly consists of supported operators. Execution should be significantly faster, hopefully under a minute.

To use the CoreML backend, you need to:

If you experience any issues, please let me know.
