Complex multiplication returns different results if wrapped in np array #26740
NumPy only guarantees 4 ULP precision for floating-point results, so you can't expect NumPy's results to always agree with the results of another math library. That said, it looks like the result you have is 8 ULP off:
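(For reference, the ULP distance mentioned here can be measured directly. Below is a minimal sketch using the standard bit-pattern trick, valid for finite float64 values of the same sign; `ulp_diff` is a hypothetical helper, not a NumPy function:)

```python
import struct

def ulp_diff(x: float, y: float) -> int:
    # For finite float64 values of the same sign, the raw bit patterns,
    # read as 64-bit integers, are ordered the same way as the floats,
    # so their difference counts the representable steps (ULPs) between x and y.
    ix = struct.unpack("<q", struct.pack("<d", x))[0]
    iy = struct.unpack("<q", struct.pack("<d", y))[0]
    return abs(ix - iy)

print(ulp_diff(1.0, 1.0000000000000018))  # -> 8
```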
Ping @r-devulap: is my math right? Something else going on?
Interesting, mind providing the reference?
For multiplication, I would think we do better than 4 ULP, but really the relevant check one needs to do is on the final multiply-add (see the sketch below).
This is relevant since, in the end, one does a multiply followed by an add or subtract for each component. Note that I had expected to find some way to reproduce the difference by reordering operations, but so far I have failed; it does seem to have something to do with the different implementations for arrays and scalars.
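A sketch of that check, with hypothetical operands (the values from the report are not preserved here). `math.fma` requires Python 3.13+; exact `Fraction` arithmetic serves as ground truth:

```python
import math
from fractions import Fraction

# Hypothetical operands; any values that exercise the rounding will do.
a, b = 0.8414709848078965, 0.5403023058681398
c, d = 0.9092974268256817, -0.4161468365471424

# The real part of (a + b*1j) * (c + d*1j) is a*c - b*d.
unfused = a * c - b * d            # rounds a*c, rounds b*d, rounds the difference
fused = math.fma(a, c, -(b * d))   # Python >= 3.13: a*c and the subtraction share
                                   # a single rounding (the negation itself is exact)
exact = Fraction(a) * Fraction(c) - Fraction(b) * Fraction(d)

print(unfused, fused, float(exact))
```

Whether `unfused` and `fused` differ depends on the operands; when they do differ, the fused result is generally the one closer to the exact value, which matches the observation below that the SIMD path is slightly more precise.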
For reference, the implementation certainly looks bog-standard: numpy/numpy/_core/src/umath/loops.c.src, lines 2035 to 2046 at b8d1012.
But possibly I'm missing some vectorization or so. Scalars are done at numpy/numpy/_core/src/umath/scalarmath.c.src, lines 432 to 438 at b8d1012.
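(The referenced C source is not reproduced above. For orientation, the textbook formula both code paths implement looks like this, sketched in Python rather than the actual C:)

```python
def complex_mul(a: complex, b: complex) -> complex:
    # Textbook complex product: each multiply and each add/subtract
    # rounds once, with nothing fused; this mirrors the scalar path.
    re = a.real * b.real - a.imag * b.imag
    im = a.real * b.imag + a.imag * b.real
    return complex(re, im)
```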
I'm not sure there is one; maybe there should be, somewhere. It came along with adopting the Intel SVML library for writing SIMD-optimized loops. If you search for issues containing the phrase "4 ulp", you'll see other developers discussing this over the years.
Note that neither of the displayed real parts of the computed values is the true expected real value. The best value is […]. The observed value […].
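(The exact digits quoted in this comment were lost. As a sketch, the true expected real part can be recovered with exact rational arithmetic, since every finite float is an exact rational and `float()` then performs the single correct rounding; `true_real_part` is a hypothetical helper:)

```python
from fractions import Fraction

def true_real_part(a: complex, b: complex) -> float:
    # Fraction arithmetic is exact, so `exact` is the mathematically
    # true real part of a*b; float() then rounds it once, correctly.
    exact = (Fraction(a.real) * Fraction(b.real)
             - Fraction(a.imag) * Fraction(b.imag))
    return float(exact)
```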
If it can help in any way, the […]. As an addition, I've been told that […].
The 4 ULP SVML thing is partially a thing of the past, unless I am confused: we did run into issues with sin/cos and eventually switched to the higher-precision version, since it wasn't that much slower anyway. In any case, the true number of ULPs depends on the system math libraries, the functions, etc., so some functions may well be within less than 4 ULPs! FWIW, we do have SIMD implementations for complex multiplication in […]. So, given that, as Warren said, the less precise version is the scalar one that has existed forever, I don't think there is anything to do here.
I don't mind if we try to use the SIMD fused ops for scalars as well, but we won't get to a point where results between Python and NumPy, or across different platforms, are identical that way.
Interesting that the SIMD one is a bit more precise. Agree that we cannot do much here: given the numbers involved, this is a 1 ULP difference.
Agreed, this is correct. We decided that 4 ULP was not acceptable by default, and switched back. Also, 4 ULP was only ever the extreme of a distribution of errors, not a regularly expected difference. Looks like we can close this?
Thanks for clarifying the math, and apologies for misremembering our policy.
Describe the issue:
I'm still trying to narrow down the problem, but for now it seems that, with newer NumPy versions on x86-64, certain complex products return different results depending on whether the operands are wrapped in a NumPy array or not. Example code and some of the results I got are below:
My machine:
Google Colab:
Some passing results:
Some online REPL:
Reproduce the code example:
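The original snippet was not preserved in this copy of the issue; a minimal reproduction consistent with the report, using hypothetical operand values, would look like:

```python
import numpy as np

# Hypothetical operands; the report's actual values are not preserved.
a = 0.1 + 0.2j
b = 0.3 + 0.4j

scalar = a * b                                # Python / NumPy scalar path
vector = (np.array([a]) * np.array([b]))[0]   # SIMD array path

print(repr(scalar))
print(repr(vector))
print(scalar == vector)
```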
Error message:
No response
Python and NumPy Versions:
NumPy: 2.0.0
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Runtime Environment:
[{'numpy_version': '2.0.0',
'python': '3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]',
'uname': uname_result(system='Linux', node='Luchino', release='5.15.0-112-generic', version='#122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Zen',
'filepath': '/home/svalorzen/Tests/python/venv/lib/python3.10/site-packages/numpy.libs/libscipy_openblas64_-99b71e71.so',
'internal_api': 'openblas',
'num_threads': 32,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]
Context for the issue:
I am porting some Python code to C++, and it is quite important that the output stays consistent. In my case, my C++ code produces the result of the "normal" Python multiplication, while the Python code uses numpy arrays and so generates the other result. This breaks my tests, alongside my sanity.