np.isclose is slow #16160

Closed
Hoiy opened this issue May 5, 2020 · 13 comments

Hoiy commented May 5, 2020

np.isclose is very slow compared with math.isclose

Reproducing code example:

import numpy as np
import timeit
print(timeit.timeit('np.isclose(0.5, 0, atol=1e-4)', 'import numpy as np', number=10000))
# 0.487607091999962
print(timeit.timeit('math.isclose(0.5, 0, abs_tol=1e-4)', 'import math', number=10000))
# 0.007888869999987946

NumPy/Python version information:

NumPy 1.17.4, Python 3.7.5 (default, Dec 9 2019, 22:42:26) [Clang 11.0.0 (clang-1100.0.33.12)]

rossbar (Contributor) commented May 5, 2020

NumPy functions are intended to be run on NumPy arrays - in this case, you are passing in scalar values, which are converted to arrays under the hood by np.isclose. This conversion (along with other setup) incurs overhead, which is likely what makes np.isclose slower than math.isclose for scalar inputs.

Note, however, that np.isclose is very efficient when running over an array of values; the timings below are nearly identical for a scalar and for 1000 elements, showing that the fixed per-call overhead dominates:

>>> rg = np.random.default_rng()
>>> a, b = rg.random(1000), rg.random(1000)
>>> %timeit np.isclose(0.5, 0, atol=1e-4)  # Original example
29.4 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit np.isclose(a, b, atol=1e-4)     # Same operation, 1000 elements
29.8 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Hoiy commented May 6, 2020

@rossbar Thanks for pointing out that np.isclose is optimized for running on np.array; that makes sense.

However, for the simple use case with atol only, I find it problematic that it is still much slower than a naive implementation with np.abs:

>>> rg = np.random.default_rng()
>>> a, b = rg.random(1000), rg.random(1000)
>>> %timeit np.isclose(a,b, atol=1e-4)
38.7 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit np.abs(a-b) <= 1e-4
3.36 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

rossbar (Contributor) commented May 6, 2020

Good point - if you take a look at how np.isclose is implemented (e.g. np.isclose?? from an IPython terminal), there are several calls to isfinite and all, which handle the case where an array contains non-finite values.

It may be worthwhile to see whether things could be reorganized so that those checks move further down into a conditional branch, reducing the overhead for the simple case.
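
For context, isclose in numpy/core/numeric.py at the time was structured roughly like this (paraphrased and abridged from the NumPy 1.x source; names and details vary by version):

def isclose(a, b, rtol=1.e-5, atol=1.e-8, equal_nan=False):
    def within_tol(x, y, atol, rtol):
        with errstate(invalid='ignore'):          # discussed further below
            return less_equal(abs(x - y), atol + rtol * abs(y))

    x, y = asanyarray(a), asanyarray(b)
    xfin, yfin = isfinite(x), isfinite(y)
    if all(xfin) and all(yfin):                   # the two all() calls profiled below
        return within_tol(x, y, atol, rtol)
    else:
        # Mask out non-finite entries, compare only the finite ones, then
        # handle matching infinities and (optionally) NaNs separately.
        finite = xfin & yfin
        cond = zeros_like(finite, subok=True)
        x = x * ones_like(cond)                   # broadcast for boolean indexing
        y = y * ones_like(cond)
        cond[finite] = within_tol(x[finite], y[finite], atol, rtol)
        cond[~finite] = (x[~finite] == y[~finite])
        if equal_nan:
            both_nan = isnan(x) & isnan(y)
            cond[both_nan] = both_nan[both_nan]
        return cond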

Hoiy commented May 9, 2020

After some basic profiling, I found that np.all is quite slow, consuming about 25% of the total runtime; the calls come from this line.

rg = np.random.default_rng()
a, b = rg.random(1000), rg.random(1000)
%%prun -s cumulative
for _ in range(100000):
    np.isclose(a, b, atol=1e-4)
...
   100000    1.761    0.000   12.429    0.000 numeric.py:2179(isclose)
   100000    2.931    0.000    6.228    0.000 numeric.py:2256(within_tol)
   200000    0.236    0.000    3.418    0.000 <__array_function__ internals>:2(all)
   200000    0.376    0.000    2.893    0.000 fromnumeric.py:2277(all)
   200000    0.764    0.000    2.517    0.000 fromnumeric.py:73(_wrapreduction)
   200000    0.815    0.000    2.039    0.000 _ufunc_config.py:39(seterr)
   200000    1.488    0.000    1.488    0.000 {method 'reduce' of 'numpy.ufunc' objects}
...

We could probably replace the two calls np.all(x) and np.all(y) with a single np.all(x & y) to reduce the total runtime by about 12%:

fa, fb = np.isfinite(a), np.isfinite(b)
%timeit np.all(fa) and np.all(fb)
# 18.3 µs ± 6.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.all(fa & fb)
# 9.68 µs ± 498 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
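
Each np.all call pays a fixed cost for the __array_function__ dispatch and _wrapreduction machinery visible in the profile above, so halving the number of calls halves that fixed cost, while the elementwise & is comparatively cheap. In the paraphrased source above, the change would look roughly like:

# Before: two separate reductions, each with its own dispatch overhead
if all(xfin) and all(yfin):
    return within_tol(x, y, atol, rtol)

# After: one cheap elementwise & plus a single reduction; the combined
# mask is also exactly what the non-finite branch (finite = xfin & yfin)
# computes anyway.
if all(xfin & yfin):
    return within_tol(x, y, atol, rtol)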

Hoiy commented May 9, 2020

Another source of slowness probably comes from this line, with its np.errstate(invalid='ignore') call. I don't have a good idea of how to speed this up.

%%timeit 
with np.errstate(invalid='ignore'):
    np.less_equal(np.abs(a-b), 1e-4 + 1e-05 * np.abs(b))
# 33.9 µs ± 4.54 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.less_equal(np.abs(a-b), 1e-4 + 1e-05 * np.abs(b))
# 15.6 µs ± 4.03 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
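
The roughly 18 µs gap between those two timings suggests the context manager itself accounts for most of the difference; its enter/exit cost can be measured in isolation:

%%timeit
# An empty errstate block isolates the cost of the context manager itself
# (entering and exiting each call seterr in NumPy 1.x).
with np.errstate(invalid='ignore'):
    pass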

WarrenWeckesser (Member) commented

> Another source of slowness probably comes from this line, with its np.errstate(invalid='ignore') call.

The function within_tol defined inside isclose is only called with values of x and y that are known to be finite (i.e. not inf and not nan). The expression evaluated in that errstate context is less_equal(abs(x-y), atol + rtol * abs(y)). Are there finite values of x and y that could cause that expression to trigger an invalid warning or error? Hmmm, I guess the warning could be triggered with atol=np.inf and rtol=-np.inf, but that's a bit extreme. :)
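
For illustration, a minimal reproduction of that extreme case (an editorial sketch; it assumes finite x and y, as within_tol guarantees):

import numpy as np

x, y = np.array([1.0]), np.array([2.0])
atol, rtol = np.inf, -np.inf

# atol + rtol * abs(y) evaluates to inf + (-inf) = nan, which would emit
# "RuntimeWarning: invalid value encountered" without the errstate guard.
with np.errstate(invalid='ignore'):
    result = np.less_equal(np.abs(x - y), atol + rtol * np.abs(y))
print(result)  # [False] -- comparisons against nan are always False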

FWIW, I just ran the full test suite without the errstate context wrapper, and all the tests passed.

eric-wieser (Member) commented

> which comes from this line.

Are you sure that's np.all and not builtins.all? I would expect the latter to be a significant performance hit, if we're using it by accident.

Hoiy commented May 10, 2020

> > which comes from this line.
>
> Are you sure that's np.all and not builtins.all? I would expect the latter to be a significant performance hit, if we're using it by accident.

Yes, I have stepped into this line, and eventually it calls np.logical_and.reduce() under the hood.

Hoiy commented May 10, 2020

> > Another source of slowness probably comes from this line, with its np.errstate(invalid='ignore') call.
>
> The function within_tol defined inside isclose is only called with values of x and y that are known to be finite (i.e. not inf and not nan). The expression evaluated in that errstate context is less_equal(abs(x-y), atol + rtol * abs(y)). Are there finite values of x and y that could cause that expression to trigger an invalid warning or error? Hmm, I guess the warning could be triggered with atol=np.inf and rtol=-np.inf, but that's a bit extreme. :)
>
> FWIW, I just ran the full test suite without the errstate context wrapper, and all the tests passed.

That's a great finding. It seems we could speed this up by only entering the errstate context when atol or rtol is non-finite (at least for the path where x and y are all finite). This should make the common case run much faster.
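
A minimal sketch of that idea (hypothetical, not NumPy code; it assumes x and y are already known to be finite, as in the fast path):

def within_tol(x, y, atol, rtol):
    if np.isfinite(atol) and np.isfinite(rtol):
        # Finite x, y, atol, and rtol cannot produce an invalid-value
        # warning here, so skip the costly errstate context entirely.
        return np.less_equal(np.abs(x - y), atol + rtol * np.abs(y))
    with np.errstate(invalid='ignore'):
        return np.less_equal(np.abs(x - y), atol + rtol * np.abs(y))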

eric-wieser (Member) commented

Apologies, I missed the import * used at the bottom (!) of this file.

seberg (Member) commented May 10, 2020

Hehe, one of the least used and stranger features... But it's not exposed for reductions:

np.logical_and(arr1, arr2, extobj=[8192, 0, None])

It feels a bit silly that it includes the buffer size, although you could use np.geterrobj()[0], which probably incurs mostly existing overheads. (At this time, what makes the line above slower is probably mainly the fact that extobj is passed by keyword argument, and we do not use vectorcall.)
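
For reference, the same call with the buffer size pulled from geterrobj rather than hard-coded (NumPy 1.x only; extobj and geterrobj were removed in NumPy 2.0):

# errobj layout is [buffer_size, error_mask, callback]; an error mask of 0
# means "ignore all floating-point errors" for this one call.
bufsize = np.geterrobj()[0]
np.logical_and(arr1, arr2, extobj=[bufsize, 0, None])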

@Hoiy I think we will accept PRs for some of the things you found, but there is probably only so much to be done, considering that isclose calls many functions, all of which incur significant overhead since they are written to be fast for arrays rather than scalars (and in general, any function call has fairly significant overhead).

seberg (Member) commented Apr 10, 2024

Since my last comment was a long time ago and mentioned errstate: errstate is much faster in NumPy 2.0 (although it may be possible to optimize it even further now).

Hoiy commented Jun 22, 2024

> @rossbar Thanks for pointing out that np.isclose is optimized for running on np.array; that makes sense.
>
> However, for the simple use case with atol only, I find it problematic that it is still much slower than a naive implementation with np.abs:
>
> >>> rg = np.random.default_rng()
> >>> a, b = rg.random(1000), rg.random(1000)
> >>> %timeit np.isclose(a,b, atol=1e-4)
> 38.7 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
> >>> %timeit np.abs(a-b) <= 1e-4
> 3.36 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Retesting with NumPy 2.0 after a long time.

import numpy as np
rg = np.random.default_rng()
a, b = rg.random(1000), rg.random(1000)

%timeit np.isclose(a,b)
# 20.7 μs ± 1.44 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit np.abs(a-b) <= 1.e-8
# 4.07 μs ± 338 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%timeit np.abs(a-b) <= 1.e-8 + 1.e-5 * np.abs(b)
# 7.87 μs ± 175 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The time ratio between np.isclose() and np.abs() improved from 38.7 : 3.36 to 20.7 : 4.07 (about 5.08 times slower). I realized that I should be doing an apples-to-apples comparison that includes rtol, so the difference is 20.7 : 7.87 (about 2.63 times slower).

The slowness of np.isclose() now comes from the errstate context wrapper and this line:

x, y, atol, rtol = (
        a if isinstance(a, (int, float, complex)) else asanyarray(a)
        for a in (a, b, atol, rtol))

Removing these would speed np.isclose() up to about 12.4 μs (I know these lines exist for a reason; I'm just stating the observation).
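
For comparison, a minimal hand-rolled check for the all-finite case (a hypothetical helper, not a drop-in replacement: it skips np.isclose's inf/nan handling, dtype coercion, and equal_nan support). It is exactly the 7.87 μs expression benchmarked above:

import numpy as np

def isclose_finite(a, b, rtol=1.e-5, atol=1.e-8):
    # Same tolerance formula as np.isclose, valid only for finite float arrays.
    return np.abs(a - b) <= atol + rtol * np.abs(b)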

Hoiy closed this as completed Jun 22, 2024