Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Int64.parseInt doesn't check for overflow, nor do its docs explain what happens #39

Open
trinarytree opened this issue May 18, 2018 · 1 comment

Comments

@trinarytree
Copy link

Int64.parseInt('12343498230482340234820349823049823')

yields

-6445017949831143329

which isn't what I'd expect. I would expect this to throw.

Compare to Java:

Integer.parseInt('12343498230482340234820349823049823');

throws

java.lang.NumberFormatException: For input string: "12432904823048234092384023482309"

The class level docs for Int64 vaguely say,
"Arithmetic operations may overflow in order to maintain this range."
but I don't see how that should include parsing.
Overflow in ordinary arithmetic can be used for beneficial purposes, such as implementing an Int128 out of Int64, but what possible benefit is there to allowing overflow in parsing?

Some options:
(a) Lazy: Update the docs to at least say what happens since (imo) this isn't expected.
(b) More ambitious: Add an optional named param to parseInt like throwOnOverflow (default false).
It shouldn't be too expensive to detect; after all, Java does it. Maybe just test the carry at each step.
(c) Very ambitious: Forget about the optional param, just throw on overflow since it's hard to imagine a legit use case for allowing overflow. This is ambitious in the sense that it changes existing semantics.

@trinarytree
Copy link
Author

I believe this snippet added to _parseRadix would cheaply detect overflow:

if ((d2 & ~_MASK2 != 0) || (!negative && (d2 & _SIGN_BIT_MASK != 0))) {
  throw new FormatException('Overflow: $s');
}

Without such a thing, clients cannot as cheaply detect overflow. I only see 2 basic ways to do so:
(1) Canonicalize the input string to strip leading 0s and convert -0 to 0 and then
compare that canonicalized string to the toString'ed parsed Int64. e.g.

var i = Int64.parseInt(s);
if (i.toString() != canonicalizeIntString(s)) {
  throw FormatException('Overflow: $s');
}

This is a bit brittle because the canonicalizer could easily miss a few cases.
E.g. does +12 count as 12? What if Int64 adds new formats that it accepts, like accepting commas
or octal? This is also not very efficient since i.toString() involves many divisions et al complex operations.

(2) Reimplement most of _parseRadix, but add that snippet of code I proposed above.
This can't actually produce an Int64 because that would require access to private members, so you'll have to parse the string twice:

throwOnInt64ParsingOverflow(s);
return Int64.parseInt(s);

One could make this a little more efficient by converting the 3 vars d0, d1, d2 into bytes (or 32-bit ints) and calling fromBytes (or fromInts). But this is tantamount to forking the implementation. Clients who did this wouldn't benefit from updates to Int64.parseInt.

Would anyone be opposed to my adding the proposed snippet either using option (b) or (c)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant