np.isclose is slow #16160

Closed
Hoiy opened this issue May 5, 2020 · 13 comments

Hoiy commented May 5, 2020

np.isclose is very slow compared with math.isclose

Reproducing code example:

import numpy as np
import timeit
print(timeit.timeit('np.isclose(0.5, 0, atol=1e-4)', 'import numpy as np', number=10000))
# 0.487607091999962
print(timeit.timeit('math.isclose(0.5, 0, abs_tol=1e-4)', 'import math', number=10000))
# 0.007888869999987946

NumPy/Python version information:

NumPy 1.17.4, Python 3.7.5 (default, Dec 9 2019, 22:42:26) [Clang 11.0.0 (clang-1100.0.33.12)]

rossbar (Contributor) commented May 5, 2020

NumPy functions are intended to be run on NumPy arrays - in this case, you are passing in scalar values, which are converted to arrays under the hood by np.isclose. This conversion (along with other setup) incurs overhead, which is likely what makes np.isclose slower than math.isclose for scalar inputs.

Note, however, that np.isclose is very efficient when running over an array of values; the timings below are nearly identical for a scalar and for 1000 elements, showing that the fixed per-call overhead dominates:

>>> rg = np.random.default_rng()
>>> a, b = rg.random(1000), rg.random(1000)
>>> %timeit np.isclose(0.5, 0, atol=1e-4)  # Original example
29.4 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit np.isclose(a, b, atol=1e-4)     # Same operation, 1000 elements
29.8 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Hoiy commented May 6, 2020

@rossbar Thanks for pointing out that np.isclose is optimized for running on np.array; that makes sense.

However, for the simple use case with atol only, I find it problematic that it is still much slower than a naive implementation with np.abs:

>>> rg = np.random.default_rng()
>>> a, b = rg.random(1000), rg.random(1000)
>>> %timeit np.isclose(a,b, atol=1e-4)
38.7 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit np.abs(a-b) <= 1e-4
3.36 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

rossbar (Contributor) commented May 6, 2020

Good point - if you take a look at how np.isclose is implemented (e.g. np.isclose?? from an IPython terminal), there are several calls to isfinite and all, which handle the case where an array contains non-finite values.

It may be worthwhile to see whether things could be reorganized so that those checks move further down into a conditional branch, reducing the overhead for the simple case.
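
For context, isclose in numpy/core/numeric.py at the time was structured roughly like this (paraphrased and abridged from the NumPy 1.x source; names and details vary by version):

def isclose(a, b, rtol=1.e-5, atol=1.e-8, equal_nan=False):
    def within_tol(x, y, atol, rtol):
        with errstate(invalid='ignore'):          # discussed further below
            return less_equal(abs(x - y), atol + rtol * abs(y))

    x, y = asanyarray(a), asanyarray(b)
    xfin, yfin = isfinite(x), isfinite(y)
    if all(xfin) and all(yfin):                   # the two all() calls profiled below
        return within_tol(x, y, atol, rtol)
    else:
        # Mask out non-finite entries, compare only the finite ones, then
        # handle matching infinities and (optionally) NaNs separately.
        finite = xfin & yfin
        cond = zeros_like(finite, subok=True)
        x = x * ones_like(cond)                   # broadcast for boolean indexing
        y = y * ones_like(cond)
        cond[finite] = within_tol(x[finite], y[finite], atol, rtol)
        cond[~finite] = (x[~finite] == y[~finite])
        if equal_nan:
            both_nan = isnan(x) & isnan(y)
            cond[both_nan] = both_nan[both_nan]
        return cond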

Hoiy commented May 9, 2020

After some basic profiling, I found that np.all is quite slow, consuming about 25% of the total runtime; the calls come from this line.

rg = np.random.default_rng()
a, b = rg.random(1000), rg.random(1000)
%%prun -s cumulative
for _ in range(100000):
    np.isclose(a, b, atol=1e-4)
...
   100000    1.761    0.000   12.429    0.000 numeric.py:2179(isclose)
   100000    2.931    0.000    6.228    0.000 numeric.py:2256(within_tol)
   200000    0.236    0.000    3.418    0.000 <__array_function__ internals>:2(all)
   200000    0.376    0.000    2.893    0.000 fromnumeric.py:2277(all)
   200000    0.764    0.000    2.517    0.000 fromnumeric.py:73(_wrapreduction)
   200000    0.815    0.000    2.039    0.000 _ufunc_config.py:39(seterr)
   200000    1.488    0.000    1.488    0.000 {method 'reduce' of 'numpy.ufunc' objects}
...

We could probably replace the two calls np.all(x) and np.all(y) with a single np.all(x & y) to reduce the total runtime by about 12%:

fa, fb = np.isfinite(a), np.isfinite(b)
%timeit np.all(fa) and np.all(fb)
# 18.3 µs ± 6.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.all(fa & fb)
# 9.68 µs ± 498 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
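
Each np.all call pays a fixed cost for the __array_function__ dispatch and _wrapreduction machinery visible in the profile above, so halving the number of calls halves that fixed cost, while the elementwise & is comparatively cheap. In the paraphrased source above, the change would look roughly like:

# Before: two separate reductions, each with its own dispatch overhead
if all(xfin) and all(yfin):
    return within_tol(x, y, atol, rtol)

# After: one cheap elementwise & plus a single reduction; the combined
# mask is also exactly what the non-finite branch (finite = xfin & yfin)
# computes anyway.
if all(xfin & yfin):
    return within_tol(x, y, atol, rtol)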

Hoiy commented May 9, 2020

Another source of slowness probably comes from this line, with its np.errstate(invalid='ignore') call. I don't have a good idea of how to speed this up.

%%timeit 
with np.errstate(invalid='ignore'):
    np.less_equal(np.abs(a-b), 1e-4 + 1e-05 * np.abs(b))
# 33.9 µs ± 4.54 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.less_equal(np.abs(a-b), 1e-4 + 1e-05 * np.abs(b))
# 15.6 µs ± 4.03 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
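
The roughly 18 µs gap between those two timings suggests the context manager itself accounts for most of the difference; its enter/exit cost can be measured in isolation:

%%timeit
# An empty errstate block isolates the cost of the context manager itself
# (entering and exiting each call seterr in NumPy 1.x).
with np.errstate(invalid='ignore'):
    pass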

WarrenWeckesser (Member) commented

> Another source of slowness probably comes from this line, with its np.errstate(invalid='ignore') call.

The function within_tol defined inside isclose is only called with values of x and y that are known to be finite (i.e. not inf and not nan). The expression evaluated in that errstate context is less_equal(abs(x-y), atol + rtol * abs(y)). Are there finite values of x and y that could cause that expression to trigger an invalid warning or error? Hmmm, I guess the warning could be triggered with atol=np.inf and rtol=-np.inf, but that's a bit extreme. :)
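
For illustration, a minimal reproduction of that extreme case (an editorial sketch; it assumes finite x and y, as within_tol guarantees):

import numpy as np

x, y = np.array([1.0]), np.array([2.0])
atol, rtol = np.inf, -np.inf

# atol + rtol * abs(y) evaluates to inf + (-inf) = nan, which would emit
# "RuntimeWarning: invalid value encountered" without the errstate guard.
with np.errstate(invalid='ignore'):
    result = np.less_equal(np.abs(x - y), atol + rtol * np.abs(y))
print(result)  # [False] -- comparisons against nan are always False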

FWIW, I just ran the full test suite without the errstate context wrapper, and all the tests passed.

eric-wieser (Member) commented

> which comes from this line.

Are you sure that's np.all and not builtins.all? I would expect the latter to be a significant performance hit, if we're using it by accident.

Hoiy commented May 10, 2020

> > which comes from this line.
>
> Are you sure that's np.all and not builtins.all? I would expect the latter to be a significant performance hit, if we're using it by accident.

Yes, I have stepped into this line, and eventually it calls np.logical_and.reduce() under the hood.

Hoiy commented May 10, 2020

> > Another source of slowness probably comes from this line, with its np.errstate(invalid='ignore') call.
>
> The function within_tol defined inside isclose is only called with values of x and y that are known to be finite (i.e. not inf and not nan). The expression evaluated in that errstate context is less_equal(abs(x-y), atol + rtol * abs(y)). Are there finite values of x and y that could cause that expression to trigger an invalid warning or error? Hmm, I guess the warning could be triggered with atol=np.inf and rtol=-np.inf, but that's a bit extreme. :)
>
> FWIW, I just ran the full test suite without the errstate context wrapper, and all the tests passed.

That's a great finding. It seems we could speed this up by only entering the errstate context when atol or rtol is non-finite (at least for the path where x and y are all finite). This should make the common case run much faster.
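
A minimal sketch of that idea (hypothetical, not NumPy code; it assumes x and y are already known to be finite, as in the fast path):

def within_tol(x, y, atol, rtol):
    if np.isfinite(atol) and np.isfinite(rtol):
        # Finite x, y, atol, and rtol cannot produce an invalid-value
        # warning here, so skip the costly errstate context entirely.
        return np.less_equal(np.abs(x - y), atol + rtol * np.abs(y))
    with np.errstate(invalid='ignore'):
        return np.less_equal(np.abs(x - y), atol + rtol * np.abs(y))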

eric-wieser (Member) commented

Apologies, I missed the import * used at the bottom (!) of this file.

seberg (Member) commented May 10, 2020

Hehe, one of the least used and stranger features... But it's not exposed for reductions:

np.logical_and(arr1, arr2, extobj=[8192, 0, None])

It feels a bit silly that it includes the buffer size, although you could use np.geterrobj()[0], which probably incurs mostly existing overheads. (At this time, what makes the line above slower is probably mainly the fact that extobj is passed by keyword argument, and we do not use vectorcall.)
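
For reference, the same call with the buffer size pulled from geterrobj rather than hard-coded (NumPy 1.x only; extobj and geterrobj were removed in NumPy 2.0):

# errobj layout is [buffer_size, error_mask, callback]; an error mask of 0
# means "ignore all floating-point errors" for this one call.
bufsize = np.geterrobj()[0]
np.logical_and(arr1, arr2, extobj=[bufsize, 0, None])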

@Hoiy I think we will accept PRs for some of the things you found, but there is probably only so much to be done, considering that isclose calls many functions, all of which incur significant overhead since they are written to be fast for arrays rather than scalars (and in general, any function call has fairly significant overhead).

seberg (Member) commented Apr 10, 2024

Since my last comment was a long time ago and mentioned errstate: errstate is much faster in NumPy 2.0 (although it may be possible to optimize it even further now).

Hoiy commented Jun 22, 2024

> @rossbar Thanks for pointing out that np.isclose is optimized for running on np.array; that makes sense.
>
> However, for the simple use case with atol only, I find it problematic that it is still much slower than a naive implementation with np.abs:
>
> >>> rg = np.random.default_rng()
> >>> a, b = rg.random(1000), rg.random(1000)
> >>> %timeit np.isclose(a,b, atol=1e-4)
> 38.7 µs ± 2.98 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
> >>> %timeit np.abs(a-b) <= 1e-4
> 3.36 µs ± 52.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Retesting with NumPy 2.0 after a long time.

import numpy as np
rg = np.random.default_rng()
a, b = rg.random(1000), rg.random(1000)

%timeit np.isclose(a,b)
# 20.7 μs ± 1.44 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit np.abs(a-b) <= 1.e-8
# 4.07 μs ± 338 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%timeit np.abs(a-b) <= 1.e-8 + 1.e-5 * np.abs(b)
# 7.87 μs ± 175 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The time ratio between np.isclose() and np.abs() improved from 38.7 : 3.36 to 20.7 : 4.07 (about 5.08 times slower). I realized that I should be doing an apples-to-apples comparison that includes rtol, so the difference is 20.7 : 7.87 (about 2.63 times slower).

The slowness of np.isclose() now comes from the errstate context wrapper and this line:

x, y, atol, rtol = (
        a if isinstance(a, (int, float, complex)) else asanyarray(a)
        for a in (a, b, atol, rtol))

Removing these would speed np.isclose() up to about 12.4 μs (I know these lines exist for a reason; I'm just stating the observation).
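
For comparison, a minimal hand-rolled check for the all-finite case (a hypothetical helper, not a drop-in replacement: it skips np.isclose's inf/nan handling, dtype coercion, and equal_nan support). It is exactly the 7.87 μs expression benchmarked above:

import numpy as np

def isclose_finite(a, b, rtol=1.e-5, atol=1.e-8):
    # Same tolerance formula as np.isclose, valid only for finite float arrays.
    return np.abs(a - b) <= atol + rtol * np.abs(b)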

Hoiy closed this as completed Jun 22, 2024