GPU acceleration for Apple's M1 chip? #47702
Comments
See also #47688, which is another issue on the same topic. |
https://blog.tensorflow.org/2020/11/accelerating-tensorflow-performance-on-mac.html Something like this in Pytorch would definitely be very cool! |
@dbermond From this blog it seems like PyTorch could also take advantage of the ML Compute framework, which was just added in macOS 11 and iOS 14. However, this is not compatible with older macOS and iOS versions, and you still need to create computation graphs upfront. I'm also wondering if Apple is going to collaborate with Facebook to bring the acceleration to PyTorch. |
It's 2020 now and AMD GPUs are still not officially supported by PyTorch... How much GPU support do you really expect? Maybe we should focus on getting PyTorch's CPU build working on M1 before we jump to the GPU... |
I don't know, but the PyTorch team seems to be exclusively fond of Intel and Nvidia. I have a general dislike for TensorFlow, but at least they provide some support for AMD and OpenCL (SYCL) - the way things are going now, I think I'd be going back to TF despite all the drawbacks. |
I hope so !!! |
Perhaps "GPU acceleration" is not the best title for this issue. To be clear, I put that there because it seems to be the most promising component yielding the largest boost....based on the presentation we saw from Apple last week (15x better). That needs verification through tests though. It could be beneficial to look at the chip itself. The ML Compute Framework seems to suggest that training can take place on both CPU and GPU...although the text, imo, isn't definitive. But here's what it says:
So the unified memory architecture is what Apple says it is, then there's no need to copy data between CPU and GPU. And it's the chip that really becomes the focus. That pretty serious. |
That said, I do believe there will be performance differences between CPU, GPU and Neural engine. I think that's a given. |
@toshi2k2 Work is being done on HIP (AMD) and a nightly version is already out: #10670 (comment) |
@BramVanroy it is being done, but it's still unstable. And there is still no official support for HIP. Point is, an open source project should not, in all morality, cater to anti-open-source and monopolistic companies. They can build their own versions of frameworks or accelerators whenever they want (e.g. Apple). |
Which part of CUDA is open source? Just asking because apart from AMD, everyone has made their accelerators proprietary. |
I think some of the replies here suggesting that this shouldn't be a priority are pretty myopic. The M1 is not specialty hardware, like a ML-capable GPU. It is standard, everyday, consumer-grade hardware. It's going to be everywhere. Any ML framework that doesn't support it is guaranteed to fall behind in usage compared to those that do. |
What I understand here is that this kind of perspective is the reason behind frameworks getting handed over to a few vendors. :) Also, you don't have to care about any product since this isn't a consumer forum asking for reviews of a gadget. Speaking of walled gardens, I'm very intrigued to see which part of CUDA is open source. Afaik, MLCompute, which Apple used to build their private fork of TensorFlow, is also closed behind bars like CUDA. The only open source solution out there is ROCm, which PyTorch is reluctant to support, making AMD play catch-up. |
The conversation is getting a bit heated. Toshi2k2 is taking the Richard Stallman perspective which is valid. Unfortunately the GPU industry is non free and there's not much you can do about it. If you want to do AI you're stuck with non free licenses. I for one have a new Apple M1 device and would like to see pytorch support for it. |
Also, this childish "Apple bad" mentality isn't going to help anybody. As someone mentioned above, you have to support commodity hardware. Be it open source or anything else. |
Richard Stallman's perspective works great on paper. In practice, people need to get work done, and what gets work done on the majority of GPUs isn't open source. I'm not against the open source ideology, but sometimes people need to understand that just because they don't eat ice cream on Monday mornings doesn't mean other people should have to do the same. |
Please read again what I wrote. :) I don't know why you're acting like an angry redditor with brand bias. I for one would love to see ROCm succeed, but in its current state it's barely usable for most use cases. And there are a lot of people out there to whom getting the job done matters more than having a fully open-source-compliant setup and open source ethics. And that's where ROCm and their HIP approach is useless. Only support Vega and Polaris GPUs? Tied to a custom kernel only? Really? I guess I read comments similar to yours there as well when someone asked for ROCm to support Windows. And the reply read the same: go make your own or fly kites. Perhaps that's the reason why such projects fail in the long run. If the world had to run on this Richard Stallmanistic purview, we wouldn't have had a lot of things in existence. :) |
this conversation is super unproductive. PyTorch stands for pragmatism. We have finite engineering time, and whatever works best for our users in terms of flexibility, user-friendliness, performance and support is first priority for us. We don't really try to stand for or promote a viral open-source philosophy such as GPL.
@toshi2k2 if you actually move on to TensorFlow, and you try out SyCL / ROCm ports of TensorFlow and you are happy with the experience, please do share here. We are pragmatic, not egoistic, we will learn from anyone and anywhere and prioritize to integrate the best things into PyTorch with the finite time we have. |
I just compiled a pure-CPU version of PyTorch from source; it works fine on M1. Haven't benchmarked it yet. First install miniconda with Python 3.9 and TensorFlow, numpy, scipy etc. for M1 using this link:
That part was easy. Here is a precompiled wheel in case you are interested:
In case someone is interested, George Hotz is hacking to get the Neural Engine to work: |
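For anyone wanting a quick sanity check of such a CPU-only build, here is a minimal sketch (the sizes and iteration count are arbitrary illustrations, not the poster's actual benchmark):

# Hypothetical sanity check for a CPU-only PyTorch build on M1.
import time
import torch

print(torch.__version__, "CUDA available:", torch.cuda.is_available())  # expect False on M1

a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)

start = time.perf_counter()
for _ in range(100):
    c = a @ b  # plain CPU matmul
elapsed = time.perf_counter() - start
print(f"100 x 1024x1024 matmul on CPU: {elapsed:.3f} s")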
@erwincoumans can you compile with python 3.8 instead of python 3.9? This is what tensorflow-macos is using right now. It would be cool just to have one environment for comparison. |
@denfromufa the tensorflow-macos package is incomplete and doesn't ship with include headers and libraries, so we cannot point CMAKE_PREFIX_PATH to a path. So for now, it is just switching between virtualenv+python3.8+tensorflow-macos and miniconda3+python3.9+pytorch. It makes sense to use the M1 for inference, converting a PyTorch or TF model to Core ML using https://coremltools.readme.io/docs; that would let you use the Neural Engine, GPU, or CPU. coremltools can be imported from Python 3.9, so converting a PyTorch model would work. The tensorflow-macos Python 3.8 environment doesn't support coremltools yet, since scipy is not supported at the moment. |
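To illustrate the PyTorch-to-Core-ML path mentioned above, here is a minimal conversion sketch. The model, input shape, and input name are placeholders, and coremltools 4 or newer with the unified conversion API is assumed:

# Convert a traced PyTorch model to Core ML so it can run on Neural Engine / GPU / CPU.
import torch
import torchvision
import coremltools as ct

model = torchvision.models.mobilenet_v2(pretrained=True).eval()  # example model only
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)  # conversion needs a TorchScript model

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=example.shape)],
)
mlmodel.save("mobilenet_v2.mlmodel")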
Any updates? Will PyTorch add M1 support? |
@erwincoumans did you have to disable NNPACK and XNNPACK in your build ? |
Just looking at some responses above and wondering how to help with PyTorch and macOS MLCompute adoption. Just a few thoughts on "how can the community help?"
|
Someone asked "any updates?", and sure, a few things were learned about macOS+MLCompute during the last week. With respect to my comment from last week:
Now, first, I managed to write ObjectiveC code to run a batched MatMul (aka GEMM) layer with MLCompute. The whole graph, inference graph, compile, execute thing with a few deterministic inputs to verify correctness. On CPU and GPU. I concentrated on the whole inference side first for simplicity.
Next step, integration of the MLCompute/ObjC code with my speed and performance test (in C++) to compare a repeated 1024x1024 MatMul in Accelerate (GEMM), MLCompute, BNNS and Metal/MPS, both on CPU and GPU. First impression on the CPU case: CPU-based speed via Accelerate is fastest, say a reference speed of 1.0x. Then, BNNS is roughly 1.1x that speed, MPS/Metal about 0.5x IIRC, and MLCompute about 1.5 to 2x slower compared to plain Accelerate.

That led to a bit of checking of the various buffer copies and syncs. My impression was that MLCompute is slower because it is doing more mem-copies in this first prototype. Therefore, I built another test case in Objective-C for Apple to verify how many tensors/buffers are created. On repeated inference, it shows that a new result tensor was generated every time after execute, i.e. a new 1024x1024 buffer, which certainly slows things down a bit. I identified two ways that presumably let you use a given output tensor and avoid copies, and then noticed that I still get a new result tensor every time. Either a bug on MLCompute's side or wrong use of MLCompute on my side :) Just speculating, but maybe this shows up badly only on my Intel iMac, as the M1 has a more unified memory where system-specific sync shortcuts could make a big difference.

Opened a forum discussion for Apple at https://developer.apple.com/forums/thread/670334 and a proper Feedback Assistant request FB8957414 with an attached test case. Funnily enough, I had to put it in their CoreML section as there was no MLCompute section yet (that I could see :) The issue is simply that I was unable to use a GIVEN result tensor and avoid extraneous copies of large result tensors. Probably my bad, but might be a bug in MLCompute as well.

In summary, Next steps Just my 2pc. Any other suggestions? |
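As a rough Python counterpart to the repeated 1024x1024 GEMM timing described above, here is a sketch using NumPy, whose BLAS backend (Accelerate or OpenBLAS, depending on the build) stands in for the Accelerate/GEMM reference case; MLCompute itself is not reachable from Python here, and the iteration count is arbitrary:

# Time repeated 1024x1024 float32 GEMM into a preallocated output buffer,
# mirroring the "no new result tensor per call" concern discussed above.
import time
import numpy as np

n, iters = 1024, 200
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)
out = np.empty((n, n), dtype=np.float32)  # reused result buffer

start = time.perf_counter()
for _ in range(iters):
    np.matmul(a, b, out=out)
elapsed = time.perf_counter() - start
print(f"{iters} x {n}x{n} float32 GEMM: {elapsed:.3f} s "
      f"({iters * 2 * n**3 / elapsed / 1e9:.1f} GFLOP/s)")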
In advance, I would like to apologize for pinging everyone on this thread. I apologize for the delay since it has been about 2 weeks since I have heard from Apple, but when asking them if they ever have plans to merge their version of This makes me think that perhaps Most of what I said is hypothetical, but the next thing I will be doing is to just ask them in an upfront manner if they ever have plans to release an accelerated Pytorch. However, I would assume that they would not have plans and any changes to implement GPU acceleration would need to be provided through the developments made by the Pytorch team or from us as a community. |
I'll be trying nod if I can get the time - some sort of way to run PyTorch on the M1 GPU 'with a few lines of code' |
Thanks for sharing. This seems interesting. |
Have you tried out nod yet? Does it work for training? I only saw benchmarks for inference |
I didn't get the time yet to try nod, but I did fool around with some training under TensorFlow, and man, that M1 chip is impressive - barely burning 2 watts at full bore at what looks like better speed than my Razer Blade Stealth laptop GPU, which sounds like a small jet engine, albeit I've not seen the TF performance benchmarks yet |
I can't imagine PyTorch on the M1 Ultra with UltraFusion's 2.5 TB/s. The unified memory can fit large models |
Just waiting for pytorch beta on wwdc 2022 |
128GB of GPU memory on just this gen's M1 Ultra; imagine the next gen with 256GB of GPU RAM. Supporting this platform is a must, it will allow training of models that would previously require multi-GPU hardware not accessible to most people. |
Are we close to seeing a public beta release of PyTorch acceleration for macOS? The Mac Studio has a ton of GPU power just waiting to be harnessed. I also notice this job listing over at Apple: https://jobs.apple.com/en-us/details/200265506/accelerating-pytorch-on-macs-with-bnns It seems they are looking into accelerating PyTorch with BNNS and the Accelerate Framework. |
That looks awesome!! Looking at Apple's history of making TensorFlow closed source worries me that PyTorch from Apple will also be closed. Just binaries. |
Yeah. It's definitely needed. We can't afford to train models on Colab, GCP, or AWS on costly GPUs! M1 chips are just unexpectedly crazy good. |
This does not make much sense. PyTorch already uses Accelerate and the AMX. Maybe they're looking for lower overhead? The main purpose of preferring a CPU is for low latency when you can't build a graph (RNNs and RL). When you can build a graph, I'm not sure the CPU is commonly faster than the GPU - look at the graphs in the article about SHARK/IREE. If you can get the GPU driver calls to have extremely low overhead (~1 microsecond per command) in eager mode, then the GPU essentially replaces the CPU. Thus, they would be better off building a Metal backend. But Metal has a driver latency of 10 us per command buffer, and MPSGraph is even worse - 100 us to create an |
Exciting news! YOLOv5 inference (but not training) is currently supported on the Apple M1 Neural Engine (all variants). Results show a 13X speedup vs CPU on a base 2020 M1 MacBook Air.
Results
YOLOv5 🚀 v6.1-25-gcaf7ad0 torch 1.11.0 CPU
Reproduce
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt # install (requires python > 3.7)
python export.py --weights yolov5s.pt --include coreml # export creates yolov5s.mlmodel
python detect.py --weights yolov5s.pt # PyTorch inference
python detect.py --weights yolov5s.mlmodel # CoreML inference

EDIT: Results run on battery (95% state of charge). Will re-run tomorrow connected to power. |
I originally proposed the idea that Apple is collaborating with PyTorch. Now, I’m connecting the dots. PyTorch, your secret is going to be leaked real soon. I just need to do some final validation of the supporting evidence. I’ll announce my conclusions on this thread ASAP. |
Actually, running a simple model using Core ML on the M1 is super easy. If you test on a smaller image, it may even outperform a single GTX 1080 Ti machine. |
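For reference, a minimal sketch of running an exported .mlmodel from Python with coremltools; the file name, the "image" input key, and the 640x640 size are assumptions that depend on how the model was exported:

# Load a Core ML model and run a single prediction (Core ML picks Neural Engine / GPU / CPU).
import coremltools as ct
from PIL import Image

mlmodel = ct.models.MLModel("yolov5s.mlmodel")
img = Image.open("bus.jpg").resize((640, 640))  # must match the model's expected input size

out = mlmodel.predict({"image": img})
print(out.keys())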
First, my sincere apologies to those I'm not targeting. But to those who may think issues are where you can discuss anything, this is for you: |
Hey all, we are looking forward to your feedback on this new experimental feature!
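Assuming this refers to the experimental Apple-silicon "mps" backend shipped in the PyTorch nightlies around this time, basic usage looks roughly like this sketch:

# Run a tensor op on the M1 GPU via the experimental MPS backend, falling back to CPU.
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")  # backend not built or not available on this machine

x = torch.randn(1024, 1024, device=device)
y = torch.randn(1024, 1024, device=device)
z = x @ y  # executes on the M1 GPU when device is "mps"
print(z.device)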
|
🚀 Feature
Hi,
I was wondering if we could evaluate PyTorch's performance on Apple's new M1 chip. I'm also wondering how we could possibly optimize PyTorch's capabilities on M1 GPUs/neural engines.
I know the issue of supporting acceleration frameworks outside of CUDA has been discussed in previous issues like #488, but I think this is worth a revisit. In Apple's big reveal today, we learned that Apple's on a roll, with 50% of product usage growth this year coming from new users. Given that Apple is moving to these in-house designed chips, enhanced support for them could make deep learning on personal laptops a better experience for many researchers and engineers. I think this really aligns with PyTorch's theme of facilitating deep learning from research to production.
I'm not quite sure how this should go down. But these could be important:
cc @VitalyFedyunin @ngimel