Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add u8::parse_ascii_digit method #83447

Closed

Conversation

coolreader18
Copy link
Contributor

I'd appreciate some guidance from the libs team wrt naming, since I think there's a inconsistency in how u8 mirrors methods from char -- char and u8 currently have is_ascii_digit, but char also has is_digit(radix). So, if we were to add is_digit/to_digit to u8, I think those names make sense; however, at the moment, all of u8's char-like have an _ascii qualifier in their name. So, should we break that convention, or should we come up with a new one like {is,to}_ascii_digit_radix that no longer has the same name as the equivalent method on char?

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 24, 2021
@rust-log-analyzer

This comment has been minimized.

@egilburg
Copy link

egilburg commented Mar 24, 2021

I don't know if the ship has already sailed with, as you mentioned, "mirroring" between u8 and char, but this may be an example of questioning how much they should mirror each other?

char can be seen as a special case of u8 that gives it semantic meaning, just like u8 is a special case of a byte that gives it semantic meaning. Asking a char whether it represents a digit makes semantic sense; '1' is a char representing a digit, but 'a' is not. But asking the same question of u8 either makes less sense or, worse, can (semantically correctly) give a different answer to what it's char equivalent would. Because in a u8, everything is a sequence of one or more digits (whether in decimal or any other base).This doesn't correspond to ascii char, where '1' would be 49 (u8) and 'a' would be 97 (u8). It makes little semantic sense to say that in the domain of unsigned numbers, 49 is a digit and 97 is not.

What you're really asking from the unsigned integer is, "if this integer was visually represented as a char, would it then look like a digit? This is double indirection and seems to overload the general type (u8) with semantics that are only relevant for some of its more specialized representations (char).

to_ascii_digit removes the ambiguity somewhat compared to just is_digit, but IMO the base concern still applies.

library/core/tests/ascii.rs Outdated Show resolved Hide resolved
@leonardo-m
Copy link

leonardo-m commented Mar 24, 2021

I'm not so fond of panics that could be avoided. So I'd like an API where the radix is a const generic argument (and generates panics at compile time). And/or an API where a wrong radix leads to a None result instead of a panic. But the first requires a new function (like slice::array_windows), and the second is probably not a possible change now?

@coolreader18
Copy link
Contributor Author

I'm not sure, I just copied the existing char::to_digit implementation + docs and edited them, so while I do agree that panics aren't great in situations like this I think it's fine to keep it consistent.

@leonardo-m
Copy link

I agree it's better to keep the two consistent. But the question is if we can also change the other function, so both return None instead of raising a panic :)

/// ```
#[unstable(feature = "to_ascii_digit", issue = "83447")]
#[inline]
pub fn to_ascii_digit_radix(&self, radix: u32) -> Option<u32> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the return value a u32?

Copy link
Contributor Author

@coolreader18 coolreader18 Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was unsure about this as well; it seems weird that the char version also operates on u32 since the max that can be accepted/returned is 36/35, but I figured maybe there was a reason other than than the fact that char==u32 (like the '32-bit is a good default choice' that rust ascribes to or something)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we usually use u32 as a default for (unsigned) integers that don't get large. E.g. the exponent in pow or the return value of count_zeros.

@joshtriplett
Copy link
Member

I personally don't think this needs the _radix suffix.

(Bikeshedding acknowledged. The types involved seem like the more important question.)

@bors
Copy link
Contributor

bors commented Mar 30, 2021

☔ The latest upstream changes (presumably #83664) made this pull request unmergeable. Please resolve the merge conflicts.

@camelid camelid added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Apr 14, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Apr 21, 2021

Just by the name i was expecting this function to do the exact opposite: 5u8.to_ascii_digit(10) == '5' ("convert the value 5 to an ascii digit"), but instead it does b'5'.to_ascii_digit(10) == 5 ("convert the ascii char '5' to a value"). For char::to_digit the name works because char isn't an integer, so only one meaning makes sense. But for something like u8 it gets confusing which direction this conversion is going.

@coolreader18
Copy link
Contributor Author

Hmm, yeah, that would be confusing as well. Maybe something with parse to communicate that it's interpreting it as text? parse_ascii_digit? I guess given the ambiguity of ascii char -> u8 digit vs u8 digit to ascii char, any name that tries to mirror char's methods is probably off the table.

@coolreader18
Copy link
Contributor Author

Renamed to parse_ascii_digit(). I think bikeshedding on the name probably would/should be the main topic of discussion for this feature, so I'm not sure if that should take place here or in a separate tracking issue?

@bstrie
Copy link
Contributor

bstrie commented May 12, 2021

I'm not sure if that should take place here or in a separate tracking issue?

@coolreader18 If you think this will take a while to bikeshed then you may want to close this PR for now and open a tracking issue, if only to spare yourself the effort of rebasing. :)

In the meantime I'm going to recategorize this from "waiting on review" to "waiting on bikeshed".

@bstrie bstrie added S-waiting-on-bikeshed Status: Awaiting a decision on trivial things. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 12, 2021
@coolreader18
Copy link
Contributor Author

That makes sense, thanks. I don't mind rebasing/it can be put off until there's consensus, so hopefully it's fine if I keep this open.

I feel pretty good about u8::parse_ascii_digit, does anyone have thoughts on that? It might be odd to "parse" something that's fundamentally a single atom; I feel like parse implies interpreting multiple tokens, though it probably doesn't have to.

@crlf0710 crlf0710 added S-waiting-on-bikeshed Status: Awaiting a decision on trivial things. and removed S-waiting-on-bikeshed Status: Awaiting a decision on trivial things. labels Jun 5, 2021
@crlf0710 crlf0710 added S-waiting-on-bikeshed Status: Awaiting a decision on trivial things. and removed S-waiting-on-bikeshed Status: Awaiting a decision on trivial things. labels Jun 26, 2021
@crlf0710 crlf0710 added S-waiting-on-bikeshed Status: Awaiting a decision on trivial things. and removed S-waiting-on-bikeshed Status: Awaiting a decision on trivial things. labels Jul 17, 2021
@camsteffen
Copy link
Contributor

camsteffen commented Aug 2, 2021

What if we mirrored parse/FromStr with parse_ascii_byte/FromAsciiByte?

Edit: Meh, no, that abstraction wouldn't be useful beyond digits. I think parse_ascii_digit is good.

@camsteffen
Copy link
Contributor

I would expect no radix arg since u8::is_ascii_digit does not accept a radix. There should be parse_ascii_hexdigit as well since we have u8::is_ascii_hexdigit. It should return a Result similar to from_str_radix (but no features on the error are necessary).

@yaahc
Copy link
Member

yaahc commented Aug 4, 2021

What is the difference between b'1'.to_digit(10) and (b'1' as char).to_digit(10)?

@m-ou-se
Copy link
Member

m-ou-se commented Aug 4, 2021

Same as the difference between b'1'.is_ascii_digit() and (b'1' as char).is_ascii_digit() I suppose: Just a short-hand.

@m-ou-se m-ou-se changed the title Add u8::to_digit method Add u8::parse_ascii_digit method Aug 4, 2021
@joshtriplett
Copy link
Member

joshtriplett commented Aug 4, 2021

We discussed this in today's @rust-lang/libs-api meeting.

Even with the improved name, we still felt quite hesitant to have this as a method on u8. The more we talked about it, the more we're inclined to instead have methods parallel to from_str_radix: from_char_radix taking a char and a radix (to be used as a less-confusing alternative to char::to_digit), and something like from_ascii_digit_radix taking a u8 and a radix.

(Note that some people in the meeting didn't want to have from_ascii_digit_radix at all, preferring to require a .into() or similar on the u8 and a call to from_char_radix. Others, including myself, would like to have a method taking a u8 to avoid the need to cast.)

@leonardo-m
Copy link

It's better to avoid "as" casts whenever possible. But this desire should be balanced with other desires, like to keep the stdlib smaller, etc :)

@bors
Copy link
Contributor

bors commented Feb 8, 2022

☔ The latest upstream changes (presumably #93762) made this pull request unmergeable. Please resolve the merge conflicts.

@bors bors added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Feb 8, 2022
@Dylan-DPC
Copy link
Member

Closing this based on the meeting notes. Thanks for contributing.

@Dylan-DPC Dylan-DPC closed this Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. S-waiting-on-bikeshed Status: Awaiting a decision on trivial things. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.