Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: np.clip as currently implemented (numpy 2.0.0) is quite hard to use #26759

Open
seberg opened this issue Jun 19, 2024 · 3 comments
Open
Milestone

Comments

@seberg
Copy link
Member

seberg commented Jun 19, 2024

I just want to add a voice to this thread that np.clip as currently implemented (numpy 2.0.0) is quite hard to use:

>>> np.clip(np.array([0, 255], dtype=np.uint8), 0, 4550)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 2247, in clip
    return _wrapfunc(a, 'clip', a_min, a_max, out=out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/_methods.py", line 108, in _clip
    return um.clip(a, min, max, out=out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python integer 4550 out of bounds for uint8

I think most users in this scenario would expect to be able to use arrays of any type and Python ints and just have it work — there is nothing unsatisfiable about this expression, including keeping the input dtype.

When you start dealing with numpy scalars and so on, the story is indeed more complicated as noted in the above discussion, but the Python int scenario seems like an easy fix (as noted by @mhvk above) that would unlock a lot of uses already.

Sorry about the noise, I expect there will be lots in this repo this coming week. 😅 Thank you all! 🙏

Originally posted by @jni in #24976 (comment)

@seberg seberg added this to the 2.0.1 release milestone Jun 19, 2024
@seberg
Copy link
Member Author

seberg commented Jun 19, 2024

Branching off from the other issue. In principle, if clip was a ufunc, doing such an implementation would be a bit tedious. But from the not ridiculously purist side of things, it is actually very easy

clip is implemented in Python (dispatching to minimum, maximum, or _clip). So when the dtype is a NumPy integral, and the value is Python integer, we can check for np.iinfo(dtype).max > max and the same for minimum and just set it to not passed.

That is so easy, we could possibly even consider it a bug fix, but not sure it should be.

@seberg seberg modified the milestones: 2.0.1 release, 2.1.0 release Jun 19, 2024
@mhvk
Copy link
Contributor

mhvk commented Jun 20, 2024

I agree we could start by just doing this on the python side -- the C side would be good mostly as a way to generalize what we already do for comparisons, thus serving as an example on how to do things like this more generally. But really, no reason not to just do the simple thing...

@asmeurer
Copy link
Member

This is somewhat tied to the proposed change to remove type promotion #24976 (which is also necessary for array API compatibility, FYI), since not promoting min or max implies a downcast. The central issue is that downcasting via the usual "wraparound" logic leads to unexpected behavior. A more reasonable behavior for clip specifically would be to downcast by clipping. That would be surprising in most other contexts but for clip itself it should lead to expected behavior. Specifically, x = clip(a, min, max) should satisfy min <= x <= max, where the inequalities are based on pure numerical value (no wraparound from downcasting).

Python scalars erroring is basically the same issue, since ideally you'd want np.clip(np.array([0, 255], dtype=np.uint8), 0, 4550) to give the same result at np.clip(np.array([0, 255], dtype=np.uint8), np.array(0), np.array(4550)).

I think @seberg mentioned that NumPy already has similar logic somewhere for comparing int64 and uint64. For example, this gives the right answer, where a naive cast would not:

>>> np.asarray([-100], dtype=np.int64) < np.asarray([18446744073709551516 - 1], dtype=np.uint64)
array([ True])
>>> np.asarray([-100], dtype=np.int64).astype(np.uint64) < np.asarray([18446744073709551516 - 1], dtype=np.uint64)
array([False])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants