BUG: type cast behaviour change in NumPy 2.0 #26761
Comments
It does work if you specify the dtype though:
Maybe the docs should suggest that? The net effect of the NEP 50 changes is that people will have to be a lot more explicit about exactly what dtype they want to use. Before, things were cast to larger dtypes "behind the scenes", which in other cases was undesirable. We've decided to change things to force people to be more explicit now. Unfortunately that means all the places people were unintentionally using a cast behind the scenes are now being exposed.
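For instance, a sketch of the explicit-dtype workaround (the choice of `int16` here is my assumption; any dtype wide enough for both operands works):

```python
import numpy as np

arr = np.asarray([-10, -5], dtype=np.int8)
# Spell out a dtype that can actually hold 128; the in-place cast of the
# int16 result back to int8 is a 'same_kind' cast, which += accepts.
arr += np.asarray(128, dtype=np.int16)
print(arr)  # [118 123]
```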
That workaround does work if you find the right dtype. It's not as simple as just giving the dtype of the array you're modifying though, because e.g.

>>> arr = np.asarray([-10, -5], 'i1')
>>> arr += 128

these are 2 different dtypes, which I guess is the reason it doesn't work anymore in the first place. So the workaround to find the right dtype seems to be what I had listed as step 2,
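A sketch of that "find the right dtype" step; the use of `np.min_scalar_type` here is my reading of what "step 2" refers to:

```python
import numpy as np

arr = np.asarray([-10, -5], dtype=np.int8)
# np.asarray(128, dtype=np.int8) would raise OverflowError in NumPy 2.0,
# so instead pick the smallest dtype that can represent the scalar.
dt = np.min_scalar_type(128)          # uint8
arr += np.asarray(128, dtype=dt)      # promotes to int16, cast back to int8
print(arr)  # [118 123]
```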
More broadly, NEP 50 mentions changes when the operation overflows; e.g., the comparison table has
You are already hardcoding int8, so you might as well do the whole operation in int8, relying on well-defined integer overflow behavior. So, while a bit round-about because NumPy now makes it somewhat hard to do that explicit lossy cast:
Now, it might be nice if that cast could be achieved more easily, but it not being easy does solve the trap of doing an out-of-bounds cast by accident. Of course, if you don't happen to care about the oddities of masked arrays regarding whether or not the masked values are mutated, then the
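One way to spell that explicit lossy cast (my reconstruction of the trick being discussed; the exact spelling via `astype` is an assumption):

```python
import numpy as np

arr = np.asarray([-10, -5], dtype=np.int8)
# Explicit unsafe cast: 128 wraps around to -128 as int8, and two's-complement
# wraparound makes the subsequent int8 addition come out right anyway.
offset = np.asarray(128).astype(np.int8)   # -128
arr += offset
print(arr)  # [118 123]
```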
That's a neat trick with the integer overflow. Does it work for any integer dtype and all common operations (adding, subtracting, multiplying, or dividing), assuming the operation is done with the integer inside? One thing that definitely makes this harder is that this trick limits the offset value (i.e. 128 in
NumPy could still protect you from that if it only raised the exception when an overflow actually occurs in this situation. E.g.,

>>> np.asarray([-10, -5], 'int8') + 128
array([118, 123], dtype=int8)
>>> np.asarray([-10, -5], 'int8') + 300
OverflowError

I had thought from reading NEP 50 that this is what it does, as all the examples for integers I noticed only talk about actual overflow situations. Maybe this was too complicated to implement, or too slow from a performance perspective?
On two's-complement computers (i.e. all of them in practice), yes, it should always just work. For subtraction/addition it doesn't matter whether you cast first or later, and you cast anyway. NumPy never looked at the value of the result (except for some scalar operations). Is it possible? Sure, but it is slower and needs extra paths, and I don't think anyone ever proposed it seriously. You have a very specific use-case, since you apparently start off with known negative numbers but want to add an unsigned value to them.
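A small check of that cast-first-vs-cast-later claim for addition (a sketch; the values are chosen arbitrarily):

```python
import numpy as np

a = np.asarray([-10, -5], dtype=np.int8)
# Cast late: add in a wide dtype, then (lossily) cast the result down.
late = (a.astype(np.int64) + 300).astype(np.int8)
# Cast early: wrap the scalar to int8 first, then add entirely in int8.
early = a + np.asarray(300).astype(np.int8)
print(np.array_equal(late, early))  # True
```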
It can't be a float if you have an integer result, since NumPy always rejected that. So if it is a float, and it works, then the dtype is also float. Do you care to do the operation in higher precision and then downcast the float?
Are you sure? I am not sure which use-cases these would be. Previously, whether or not you got the correct upcast was practically random, because it depended on the input values and not the output ones; maybe there was a misconception that the output was always right, in which case I take that as an argument for the change: now you are forced to think about it, but if you had not been careful, this might have been a buggy code path!
This sort of knowledge isn't really directly available to the user. It seems that what I am reporting here is intended behavior in NumPy 2.0 rather than a bug, but I recommend adding examples to NEP 50 or the migration guide that show this, as all the examples I can see, at least for integers, are concerned with cases where an actual overflow occurs.
That is a specific case where it doesn't work, but my operation doesn't start out with known negative integers to which I want to add an unsigned integer. It starts out with "I have an arbitrary NumPy integer array and I want to add an integer value to it" (i.e.,
I just meant that
Describe the issue:
In NumPy < 2.0, adding a Python integer that is out of bounds for the array's dtype worked in place, as long as the actual result fit in that dtype.
In NumPy 2.0, the same operation raises an OverflowError.
This may have been an intended change as part of https://numpy.org/devdocs/release/1.24.0-notes.html#conversion-of-out-of-bound-python-integers, but it's not clearly stated there or in #26635.
The release notes in 1.24.0 say "Attempting a conversion from a Python integer to a NumPy value will now always check whether the result can be represented by NumPy". By that wording, the above operation should still work, as the result is still an int8, although it is perhaps ambiguous in that it could be implied that there is a conversion of 128 to a NumPy value prior to the operation. The release notes for 2.0 don't directly touch on this case, only talking about "conversion of out of bounds python integers to integer arrays".

The more significant issue IMO is that the new behavior cannot be trivially worked around as far as I can tell; see discussion below.
Reproduce the code example:
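A reproducer matching the snippet quoted in the comments above (reconstructed; the original code block did not survive extraction):

```python
import numpy as np

arr = np.asarray([-10, -5], dtype=np.int8)
try:
    arr += 128
    print(arr)    # NumPy < 2.0: [118 123]
except OverflowError as exc:
    print(exc)    # NumPy 2.0: Python integer 128 out of bounds for int8
```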
Error message:
OverflowError: Python integer 128 out of bounds for int8
Python and NumPy Versions:
NumPy: 2.0.0
Python: 3.11.4 (tags/v3.11.4:d2340ef, Jun 7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]
Runtime Environment:
No response
Context for the issue:
My use-case was applying scaling and offset values to a numeric starting value while maintaining the minimum possible dtype. I had separate code to check that the result of the operation fits in the final dtype (otherwise the dtype was adjusted), so the operation was safe. Now NumPy refuses to do the operation directly.
Recovering the old behavior is not trivial from what I can see. Although the documentation suggests

arr += np.asarray(128)

could be used instead, this doesn't always quite work. Another approach that also doesn't completely work is using e.g. np.add(..., casting='unsafe'). It solves the problem just above, but it behaves differently with MaskedArrays than the "+=" approach: specifically, it modifies masked values, whereas the "+=" approach did not.

The fastest way I see to recover the old behavior in the general case (i.e. where the starting dtype can be anything) and where the operation is known a priori to produce a safe result is a multi-step process:
1. Check whether N in arr += np.asarray(N) is an integer value.
2. If it is, use arr += np.asarray(N, dtype=min_dtype), where min_dtype is the minimal dtype that can represent N.
3. Otherwise, use arr += np.asarray(N).
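The multi-step process could be sketched as a helper (hypothetical: the name `inplace_add` and the use of `np.min_scalar_type` to pick the minimal dtype are my assumptions):

```python
import numpy as np

def inplace_add(arr, n):
    """In-place add that recovers the pre-2.0 behavior for integer scalars."""
    if isinstance(n, int):
        # Wrap n in the minimal dtype that can represent it, so the
        # conversion itself can never raise OverflowError.
        arr += np.asarray(n, dtype=np.min_scalar_type(n))
    else:
        arr += np.asarray(n)
    return arr

a = np.asarray([-10, -5], dtype=np.int8)
inplace_add(a, 128)
print(a)  # [118 123]
```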
This is quite a bit more complicated than arr += N. I would have expected the new behavior to throw an exception when the final result of the operation overflows, but not when N is technically out of bounds of arr's dtype while the operation still produces a non-overflowing result.