
BUG: type cast behaviour change in NumPy 2.0 #26761

Open
LevN0 opened this issue Jun 19, 2024 · 6 comments
LevN0 commented Jun 19, 2024

Describe the issue:

In NumPy < 2.0,

>>> arr = np.asarray([-10, -5], 'i1')
>>> arr += 128
>>> arr
array([118, 123], dtype=int8)

In NumPy 2.0,

>>> arr = np.asarray([-10, -5], 'i1')
>>> arr += 128
OverflowError: Python integer 128 out of bounds for int8

This may have been an intended change as part of https://numpy.org/devdocs/release/1.24.0-notes.html#conversion-of-out-of-bound-python-integers, but it's not clearly stated there or in #26635.

The 1.24.0 release notes say "Attempting a conversion from a Python integer to a NumPy value will now always check whether the result can be represented by NumPy". By that wording, the operation above should still work, since the result still fits in an int8; though it is perhaps ambiguous, in that it could be read as implying that 128 is converted to a NumPy value before the operation. The 2.0 release notes don't directly touch on this case, only mentioning "conversion of out of bounds python integers to integer arrays".

The more significant issue, IMO, is that the new behavior cannot be trivially worked around as far as I can tell; see the discussion below.

Reproduce the code example:

>>> arr = np.asarray([-10, -5], 'i1')
>>> arr += 128
OverflowError: Python integer 128 out of bounds for int8

Error message:

OverflowError: Python integer 128 out of bounds for int8

Python and NumPy Versions:

2.0.0
3.11.4 (tags/v3.11.4:d2340ef, Jun 7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]

Runtime Environment:

No response

Context for the issue:

My use-case was applying scaling and offset values to a numeric starting value while maintaining the minimum possible dtype. I had separate code to check that the result of the operation fits in the final dtype (otherwise the dtype was adjusted), so the operation was safe. Now NumPy refuses to perform the operation directly.

Recovering the old behavior is not trivial from what I can see. Although the documentation suggests arr += np.asarray(128) could be used instead, this doesn't always quite work. For example,

>>> arr = np.asarray([5, 10], 'uint16')
>>> arr += np.asarray(30000) # default dtype for this is int64
numpy._core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'add' output from dtype('int64') to dtype('uint16') with casting rule 'same_kind'

Another approach that also doesn't completely work is using e.g. np.add(..., casting='unsafe'). It solves the problem just above, but it behaves differently with MaskedArrays than the "+=" approach: it modifies masked values, whereas "+=" did not.
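
For the uint16 case above, a sketch of what I mean (the int64 intermediate is cast back to uint16 unsafely):

>>> arr = np.asarray([5, 10], 'uint16')
>>> np.add(arr, np.asarray(30000), out=arr, casting='unsafe')
array([30005, 30010], dtype=uint16)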

The fastest way I see to recover the old behavior in the general case (i.e. where the starting dtype can be anything), and where the operation is known a priori to produce a safe result, is a multi-step process (sketched in code after the list):

  1. Check if N in arr += np.asarray(N) is an integer value
  2. If so, determine and pass the minimum necessary dtype for the integer value, i.e. arr += np.asarray(N, dtype=min_dtype)
  3. Otherwise, use arr += np.asarray(N)
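
Something like this sketch implements it (the inplace_add helper and finding min_dtype via np.min_scalar_type are my own illustration, not an official recipe):

import numpy as np

def inplace_add(arr, n):
    # Illustrative helper: restore pre-2.0 `arr += n` for known-safe results.
    if isinstance(n, int):
        # Smallest dtype that can represent n, so promotion stays within the
        # integer kind and the same_kind cast back to arr.dtype is allowed.
        n = np.asarray(n, dtype=np.min_scalar_type(n))
    else:
        n = np.asarray(n)
    arr += n  # caller guarantees the result fits arr.dtype
    return arr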

This is quite a bit more complicated than arr += N. I would have expected the new behavior to throw an exception when the final result of the operation overflows, but not when N is technically out of bounds for arr's dtype while the operation still produces a non-overflowing result.

LevN0 added the 00 - Bug label Jun 19, 2024
ngoldbaum (Member) commented

> Although the documentation suggests arr += np.asarray(128) could be used instead, this doesn't always quite work. For example

It does work if you specify the dtype though:

>>> arr = np.asarray([5, 10], 'uint16')
>>> arr += np.asarray(30000, dtype='uint16')
>>> arr
array([30005, 30010], dtype=uint16)

Maybe the docs should suggest that?

The net effect of the NEP 50 changes is that people will have to be a lot more explicit about exactly what dtype they want to use. Before, things were cast to larger dtypes "behind the scenes", which in other cases was undesirable. We've decided to force people to be more explicit now. Unfortunately, that means all the places where people were unintentionally relying on a cast behind the scenes are now being exposed.

LevN0 (Author) commented Jun 20, 2024

That workaround does work if you find the right dtype. It's not as simple as just passing the dtype of the array you're modifying, though, because e.g.

>>> arr = np.asarray([-10, -5], 'i1')
>>> arr += 128

The value and the array need two different (minimal) dtypes here, which I guess is why this doesn't work anymore in the first place. So the workaround of finding the right dtype seems to be what I had listed as step 2,

> If so, determine and pass the minimum necessary dtype for the integer value, i.e. arr += np.asarray(N, dtype=min_dtype)

More broadly, NEP 50 mentions changes when the operation overflows, e.g. the comparison table has uint8(1) + 300 and uint8(100) + 200, but I did not see it mention, in the table or elsewhere, that non-overflowing operations are also affected, e.g. np.int8(-1) + 128. However, I may have missed it?
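
To illustrate, under 2.0 both the overflowing case from the NEP 50 table and the non-overflowing one fail at the conversion step (my own runs; messages may differ slightly by version):

>>> np.uint8(1) + 300
OverflowError: Python integer 300 out of bounds for uint8
>>> np.int8(-1) + 128
OverflowError: Python integer 128 out of bounds for int8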

seberg (Member) commented Jun 20, 2024

You are already relying on the fact that you might as well do the whole operation in int8, due to well-defined integer overflow behavior. So, while a bit roundabout because NumPy now makes that explicit lossy cast somewhat hard:

arr += np.array(128).astype(arr.dtype)
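
For the original example this reproduces the pre-2.0 result (128 wraps to -128 under the lossy cast, and the int8 addition wraps again):

>>> arr = np.asarray([-10, -5], 'i1')
>>> arr += np.array(128).astype(arr.dtype)
>>> arr
array([118, 123], dtype=int8)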

Now, it might be nice if that cast could be achieved more easily, but its not being easy does avoid the trap of doing an out-of-bounds cast by accident.

Of course, if you don't happen to care about the oddities of masked arrays regarding whether or not the masked values are mutated, then the np.add call with casting= and dtype=/signature= is simpler.

LevN0 (Author) commented Jun 21, 2024

That's a neat trick with the integer overflow. Does it work for any integer dtype and all common operations (addition, subtraction, multiplication, division), assuming the operation with the integer inside np.array(N).astype(arr.dtype) would have produced a valid value for arr.dtype had it been done without integer overflow? Is it completely portable? Would it still work on PyPy?

One thing that definitely makes this harder is that the trick limits the offset value (i.e. the 128 in arr += np.array(128).astype(arr.dtype)) to being an integer, whereas it could easily be a float. In my case I was already adjusting the array dtype prior to the operation to match the dtype the result needs to be, but in a general situation this is another thing that used to work trivially and no longer does.

> but its not being easy does avoid the trap of doing an out-of-bounds cast by accident.

NumPy could still protect you from that if it only raised the exception when an overflow actually occurs in this situation. E.g.,

>>> np.asarray([-10, -5], 'int8') + 128 
array([118, 123], dtype=int8)
>>> np.asarray([-10, -5], 'int8') + 300
OverflowError

I had thought from reading NEP 50 that this is what it does, as all examples for integers I noticed only talk about actual overflow situations. Maybe this was too complicated to implement or slow from a performance perspective?

seberg (Member) commented Jun 21, 2024

On two's-complement machines (i.e. all in practice), yes, it should always just work. For subtraction/addition it doesn't matter whether you cast first or later, and cast you do anyway.
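
A quick check of that cast-first vs. cast-later equivalence:

>>> a = np.asarray([-10, -5], 'i1')
>>> (a.astype('i2') + 128).astype('i1')   # cast later
array([118, 123], dtype=int8)
>>> a + np.array(128).astype('i1')        # cast first
array([118, 123], dtype=int8)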

NumPy never looked at the value of the result (except for some scalar operations). Is it possible? Sure, but it would be slower and need extra code paths, and I don't think anyone has ever seriously proposed it.

You have a very specific use-case, since you apparently start off with known negative numbers but want to add an unsigned value to them.
If that isn't the case, yeah, all examples that might overflow already overflow in the conversion step, which is what I think everything says, but dunno.

> to being an integer, whereas it could easily be a float.

It can't be a float if you have an integer result, since NumPy always rejected that. So if it is a float, and it works, then the dtype is also float. Do you want to do the operation in higher precision and then downcast the float?
(Slow, but not sure if it is more precise in rare cases.)
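
For instance, something along these lines (a sketch; the explicit downcast makes the lossy step visible):

>>> arr = np.asarray([1, 2], 'int8')
>>> arr[...] = (arr * 2.5).astype(arr.dtype)  # compute in float64, downcast explicitly
>>> arr
array([2, 5], dtype=int8)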

> but in a general situation this is another thing that used to work trivially and no longer does.

Are you sure? I am not sure which use-cases these would be. Previously, whether or not you got the correct upcast was practically random, because it depended on the input values and not the output ones; maybe there was a misconception that the output was always right, in which case I take that as an argument for the change: now you are forced to think about it, but if you had not been careful, this might have been a buggy code path!

laqieer added a commit to laqieer/FEHRR that referenced this issue Jun 22, 2024
Fix: OverflowError: Python integer 128 out of bounds for int8
Reference: numpy/numpy#26761
LevN0 (Author) commented Jun 27, 2024

> NumPy never looked at the value of the result (except for some scalar operations). Is it possible? Sure, but it would be slower and need extra code paths, and I don't think anyone has ever seriously proposed it.

This sort of knowledge isn't really directly available to the user. It seems what I am reporting here is intended behavior in NumPy 2.0 rather than a bug, but I recommend adding examples to NEP 50 or the migration guide that show this, as all the examples I can see, at least for integers, concern cases where an actual overflow occurs.

> You have a very specific use-case, since you apparently start off with known negative numbers but want to add an unsigned value to them.

That is a specific case where it doesn't work, but my operation doesn't start out with known negative integers to which I want to add an unsigned integer. It starts out with "I have an arbitrary NumPy integer array and I want to add an integer value to it" (i.e., np.array([...]) + N), where the only a priori knowledge is that the operation's result does not overflow.

> It can't be a float if you have an integer result, since NumPy always rejected that. So if it is a float, and it works, then the dtype is also float

I just meant that np.asarray([1, 2], 'int8') * 2.5 used to work. It is a good point that what precision this operation used in the general case is ambiguous without prior knowledge.
