Inference regression DML 1.10.1->1.11 and higher #483
After further investigation, I've identified the node where values are miscalculated. Opening model3.onnx in netron.app, near the end of the model there is a Transpose node. This Transpose node has the following properties: Here's the code to reproduce the issue (this requires the onnx package to be installed; also make sure DirectML.dll 1.11 or 1.12 is active):
Output from CPU:
Output from DirectML:
If you check the input of this Transpose (node 1602), the CPU and DirectML values match. So for some reason the Transpose with
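Since the inputs of the Transpose match on both backends, the discrepancy can be isolated by checking each backend's output of that node against a NumPy reference. The sketch below is illustrative, not the original repro script: the input tensor, its shape, and the `perm` attribute are hypothetical stand-ins, since the node's actual properties were attached as an image.

```python
import numpy as np

# Hypothetical stand-ins: the real check would feed the captured input of
# node 1602 and the outputs dumped from the CPU and DirectML sessions.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4)).astype(np.float32)  # input to the Transpose
perm = (0, 2, 1)                                       # hypothetical perm attribute

# Reference result of an ONNX Transpose with this perm.
reference = np.transpose(x, perm)

def max_abs_diff(a, b):
    """Largest elementwise deviation between two same-shaped outputs."""
    return float(np.max(np.abs(a - b)))

# A correct backend reproduces the reference (the CPU EP does here).
cpu_out = np.transpose(x, perm)
print("CPU max abs diff:", max_abs_diff(cpu_out, reference))

# A faulty backend can return the right SHAPE with wrong VALUES, e.g. if the
# buffer is reinterpreted without actually permuting the data — matching the
# symptom reported here (final tensor shape correct, values wrong).
faulty_out = x.reshape(reference.shape)
print("Faulty max abs diff:", max_abs_diff(faulty_out, reference))
```

Comparing `max_abs_diff` per node like this pinpoints the first node where the two execution providers diverge.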
Thanks for reporting this, we'll try to take a look soon. @martinb35
@smk2007 has a pending fix for an upcoming patch release in a few weeks. ⏳
@divideconcept this issue has been fixed in DirectML 1.12.1. This fix will also be incorporated into the upcoming onnxruntime-directml 1.16, which is expected to release in the coming weeks.
@divideconcept I'm curious whether DirectML.dll 1.12.1 solved it? (It appears the ORT 1.16 release is still delayed...)
Thanks Sheil for fixing and Robin for verifying. Closing ✅.
I noticed an inference regression between DML 1.10.1 (and earlier) and DML 1.11 (and later), which causes the inference results to be completely off with some models. I'm not sure which node exactly causes the issue, but here's a complete step-by-step repro:
pip install onnxruntime-directml
This will show the following output:
In practice, those values are the expected output.
This will show the following output:
Notice the results are completely different (and in practice, no usable values are produced).
Now go to Lib\site-packages\onnxruntime\capi and rename DirectML.dll to DirectML.bak. This will show the following output:
Notice the results are very close, almost identical to the CPU output. And in practice, the values correspond to what is expected.
With this model, correct values are produced with DirectML 1.8, 1.9, 1.10, and 1.10.1, and bad values are produced with DirectML 1.11 and 1.12. Note that the final tensor shape is correct; only the values are wrong.
My DirectML device is a GeForce RTX 3090.
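The repro above amounts to running the same model twice while switching the active execution provider. A minimal sketch of how that switch can be expressed with onnxruntime's `providers` argument follows; `run_model`, the model path, and the input feeds are hypothetical names, not the reporter's actual script.

```python
def providers_for(use_dml):
    """Execution-provider list for onnxruntime: DirectML first when requested,
    with the CPU provider as fallback."""
    if use_dml:
        return ["DmlExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

def run_model(model_path, feeds, use_dml):
    """Run one inference with the chosen provider and return all outputs."""
    # Imported lazily so providers_for stays usable even without
    # onnxruntime-directml installed.
    import onnxruntime as ort
    sess = ort.InferenceSession(model_path, providers=providers_for(use_dml))
    return sess.run(None, feeds)

# Usage (hypothetical model path and feeds):
# cpu_out = run_model("model3.onnx", feeds, use_dml=False)
# dml_out = run_model("model3.onnx", feeds, use_dml=True)
```

Comparing `cpu_out` against `dml_out` element-wise is then enough to show the divergence described above, without renaming any DLLs.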