Complex multiplication returns different results if wrapped in np array #26740

Open
Svalorzen opened this issue Jun 18, 2024 · 12 comments
Labels
33 - Question (Question about NumPy usage or development)

Comments

@Svalorzen

Describe the issue:

I'm still trying to narrow the problem down, but for now it seems that, with newer NumPy versions on x86_64, certain complex products return different results depending on whether the operands are wrapped in a NumPy array or not. Example code and some results I got are below:

My machine:

1.18920906599179332e+00 3.77425653250409567e+01
1.18920906599179510e+00 3.77425653250409567e+01
False
SYS VERSION 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
PLATFORM x86_64
NP VERSION 2.0.0

Google Colab:

1.18920906599179332e+00 3.77425653250409567e+01
1.18920906599179510e+00 3.77425653250409567e+01
False
SYS VERSION 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
PLATFORM x86_64
NP VERSION 1.25.2

Some passing results:

Python 3.9.6 / NumPy 2.0 / arm64 (Apple Silicon):
1.18920906599179510e+00 3.77425653250409567e+01
1.18920906599179510e+00 3.77425653250409567e+01
True
Python 3.9.6 / NumPy 2.0.0 / x86_64 (Rosetta emulation under Apple Silicon):
1.18920906599179332e+00 3.77425653250409567e+01
1.18920906599179332e+00 3.77425653250409567e+01
True

Some online REPL:

1.18920906599179332e+00 3.77425653250409567e+01
1.18920906599179332e+00 3.77425653250409567e+01
True
SYS VERSION 3.8.2 (default, Mar 13 2020, 10:14:16)  [GCC 9.3.0]
PLATFORM x86_64
NP VERSION 1.18.2

Reproduce the code example:

import platform
import numpy as np
import sys

def ff(x):
    return f"{x:.17e}"

x = (-0.7173752049805099+0.6966870282122177j)
y = (25.441646575927734-27.90408706665039j)

xy = x * y                                   # pure Python scalar product
print(ff(xy.real), ff(xy.imag))
xxyy = (np.array([x]) * np.array([y]))[0]    # the same product via NumPy arrays
print(ff(xxyy.real), ff(xxyy.imag))
print(xy == xxyy)

print("SYS VERSION", sys.version)
print("PLATFORM", platform.processor())
print("NP VERSION", np.version.version)

Error message:

No response

Python and NumPy Versions:

2.0.0
3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

Runtime Environment:

[{'numpy_version': '2.0.0',
'python': '3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]',
'uname': uname_result(system='Linux', node='Luchino', release='5.15.0-112-generic', version='#122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Zen',
'filepath': '/home/svalorzen/Tests/python/venv/lib/python3.10/site-packages/numpy.libs/libscipy_openblas64_-99b71e71.so',
'internal_api': 'openblas',
'num_threads': 32,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]

Context for the issue:

I am porting some Python code to C++, and it is quite important that the output stays consistent. In my case, my C++ code produces the result of the "normal" Python multiplication, while the Python code uses numpy arrays and so generates the other result. This breaks my tests, alongside my sanity.

@ngoldbaum
Member

NumPy only guarantees 4 ULP precision for floating point results, so you can't expect NumPy's results to always agree with the results of another math library. That said, it looks like the result you have is 8 ulp off:

In [10]: sympy.N(x*y, 50)
Out[10]: 1.189209065991793323746605892665684223175048828125 + 37.7425653250409567363021778874099254608154296875*I
In [11]: (1.18920906599179510e+00 - sympy.re(sympy.N(x*y, 50)))/math.ulp(1.18920906599179510e+00)
Out[11]: 8.0000000000000000000000000000000000000000000000000

Ping @r-devulap: is my math right? Or is something else going on?

@danra

danra commented Jun 18, 2024

NumPy only guarantees 4 ULP precision for floating point results

Interesting, mind providing the reference?

@mhvk
Contributor

mhvk commented Jun 18, 2024

For multiplication, I would think we do better than 4 ULP, but really the relevant check one needs to do is:

# 1 ULP for the larger numbers that are involved (the components of y are ~25-28)
2**-52 * 25
# 5.551115123125783e-15
xy.real - xxyy.real   # xy and xxyy as in the reproduction code above
# np.float64(-1.7763568394002505e-15)

This is relevant since in the end one computes x.real*y.real - x.imag*y.imag, and both y.real and y.imag are fairly big.
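
A quick sketch of the magnitudes involved (using the x and y from the reproduction code):

import math

x = -0.7173752049805099 + 0.6966870282122177j
y = 25.441646575927734 - 27.90408706665039j

p1 = x.real * y.real   # roughly -18.25
p2 = x.imag * y.imag   # roughly -19.44
print(p1 - p2)                            # the real part, roughly 1.19, after heavy cancellation
print(math.ulp(p2) / math.ulp(p1 - p2))   # 16.0: one ULP of the products spans ~16 ULP of the result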

Note that I had expected to find some way to reproduce the difference by reordering operations, but so far I have failed; it does seem to have something to do with the different implementations for arrays and scalars.

@mhvk
Contributor

mhvk commented Jun 18, 2024

For reference, the implementation certainly looks bog-standard:

NPY_NO_EXPORT void
@TYPE@_multiply(char **args, npy_intp const *dimensions, npy_intp const *steps, void *NPY_UNUSED(func))
{
    BINARY_LOOP {
        const @ftype@ in1r = ((@ftype@ *)ip1)[0];
        const @ftype@ in1i = ((@ftype@ *)ip1)[1];
        const @ftype@ in2r = ((@ftype@ *)ip2)[0];
        const @ftype@ in2i = ((@ftype@ *)ip2)[1];
        ((@ftype@ *)op1)[0] = in1r*in2r - in1i*in2i;
        ((@ftype@ *)op1)[1] = in1r*in2i + in1i*in2r;
    }
}

But possibly I'm missing some vectorization or similar.

Scalars are done at

static inline int
@name@_ctype_multiply( @type@ a, @type@ b, @type@ *out)
{
    npy_csetreal@c@(out, npy_creal@c@(a) * npy_creal@c@(b) - npy_cimag@c@(a) * npy_cimag@c@(b));
    npy_csetimag@c@(out, npy_creal@c@(a) * npy_cimag@c@(b) + npy_cimag@c@(a) * npy_creal@c@(b));
    return 0;
}

@ngoldbaum
Member

mind providing the reference?

I'm not sure there is one; maybe there should be one somewhere. It came along with adopting the Intel SVML library for writing SIMD-optimized loops. If you search for issues containing the phrase "4 ulp" you'll see other developers discussing this over the years.

@WarrenWeckesser
Member

WarrenWeckesser commented Jun 18, 2024

Note that neither of the displayed real parts of the computed values is the true expected real value. The best value is 1.1892090659917947:

In [74]: x
Out[74]: (-0.7173752049805099+0.6966870282122177j)

In [75]: y
Out[75]: (25.441646575927734-27.90408706665039j)

In [76]: from mpmath import mp

In [77]: mp.dps = 200

In [78]: xmp = mp.mpc(x)

In [79]: ymp = mp.mpc(y)

In [80]: float((xmp*ymp).real)
Out[80]: 1.1892090659917947

The observed value 1.1892090659917933 is the one that is furthest from the ideal, by 6 ULP:

In [82]: (t - 1.1892090659917933)/np.spacing(t)
Out[82]: 6.0
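
Here t is presumably float((xmp*ymp).real) from In [80] above. For reference, a self-contained version of the same check (a sketch, assuming mpmath and NumPy are installed):

import numpy as np
from mpmath import mp

mp.dps = 200
x = -0.7173752049805099 + 0.6966870282122177j
y = 25.441646575927734 - 27.90408706665039j

t = float((mp.mpc(x) * mp.mpc(y)).real)   # closest double to the true real part
for candidate in (1.1892090659917933, 1.1892090659917951):   # the two observed results
    print(candidate, (t - candidate) / np.spacing(t), "ULP")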

@Svalorzen
Author

Svalorzen commented Jun 18, 2024

In case it helps, the 1.1892090659917933 result is what I get on my local machine with C++ using Eigen, so at least for that value there's another platform that more or less agrees with it.

In addition, I've been told that xxxyyy = np.cdouble(x)*np.cdouble(y) might also go one way or the other. On my machine I get the ...933 result; on a friend's machine he gets the ...951 result (we both get ...933 with the "pure" Python product). Again, I don't know whether this information helps narrow anything down.
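
For reference, the three code paths being compared can be put side by side (a sketch; which result each path produces may differ per machine, depending on which loop gets dispatched):

import numpy as np

x = -0.7173752049805099 + 0.6966870282122177j
y = 25.441646575927734 - 27.90408706665039j

print(f"{(x * y).real:.17e}")                             # pure Python complex
print(f"{(np.cdouble(x) * np.cdouble(y)).real:.17e}")     # NumPy scalar path
print(f"{(np.array([x]) * np.array([y]))[0].real:.17e}")  # NumPy array path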

@seberg
Member

seberg commented Jun 18, 2024

The 4 ULP SVML thing is partially a thing of the past, unless I am confused: we did run into issues with sin/cos and eventually switched to the higher precision version since it wasn't that much slower anyway.
(I hope we can enable a fast-but-less-precise mode at some point to allow such optimizations more cleanly.)

In any case, the true number of ULPs depends on the system math libraries, the functions, etc., so some functions may well have less than 4 ULPs of error!
For complex multiplication we compose operations and have a subtraction in there, so a higher ULP error shouldn't be too surprising, I think.

FWIW, we do have SIMD implementations for complex multiplication in numpy/numpy/_core/src/umath/loops_arithm_fp.dispatch.c.src, but I think the only difference will be the use of fused multiply-add (or similar) instructions, which should increase precision if anything.

So, given that (as Warren said) the less precise result is the scalar one that has existed forever, I don't think there is anything to do here.
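
One rough way to see how a fused multiply-add could shift the real part is to emulate a single-rounding FMA with exact Fraction arithmetic; this is only a sketch, and whether the dispatch loop actually contracts the expression this way is an assumption:

from fractions import Fraction

def fma(a, b, c):
    # exact product and sum, then a single rounding back to float, like a hardware FMA
    return float(Fraction(a) * Fraction(b) + Fraction(c))

x = -0.7173752049805099 + 0.6966870282122177j
y = 25.441646575927734 - 27.90408706665039j

plain_real = x.real * y.real - x.imag * y.imag        # textbook formula, two roundings
fused_real = fma(x.real, y.real, -(x.imag * y.imag))  # one possible FMA contraction
print(f"{plain_real:.17e}")
print(f"{fused_real:.17e}")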

@seberg added the 33 - Question label and removed the 00 - Bug label on Jun 18, 2024
@seberg
Member

seberg commented Jun 18, 2024

I don't mind if we try to use the SIMD fused ops for scalars as well, but we won't get to a point where results between Python and NumPy, or across different platforms, are identical that way.

@mhvk
Contributor

mhvk commented Jun 19, 2024

Interesting that the SIMD one is a bit more precise. Agree that we cannot do much here - given the numbers involved, this is a 1 ULP difference.

@rgommers
Member

The 4 ULP SVML thing is partially a thing of the past, unless I am confused: we did run into issues with sin/cos and eventually switched to the higher precision version since it wasn't that much slower anyway.

Agreed, this is correct. We decided that 4 ULP is not acceptable by default, and switched back. Also, 4 ULP was only ever the extreme of a distribution of errors, not a regularly expected difference.

Looks like we can close this?

@rgommers changed the title from "BUG: Complex multiplication returns different results if wrapped in np array" to "Complex multiplication returns different results if wrapped in np array" on Jun 19, 2024
@ngoldbaum
Member

ngoldbaum commented Jun 19, 2024

Thanks for clarifying the math and apologies for my misremembering our policy.
