To digit simplification #82094

Merged
merged 4 commits into from
Feb 17, 2021

Conversation

gilescope
Contributor

@gilescope gilescope commented Feb 14, 2021

I found out the other day that the low four bits of every ASCII digit hold its numeric value, just as one would hope (e.g. the char 2 ends in 0b0010). Two further bits mark the digit range (0b0011_0000). For a true digit, every bit above these two is 0, since ASCII occupies the lowest part of the Unicode u32 range. So XORing with 0b11_0000 yields either the number 0-9 or some larger number in the u32 space. Anything that's not 0-9 is then discarded, because it is greater than or equal to the radix.
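A minimal sketch of the XOR trick described above, for illustration only. The function name `to_digit_fast_path` is hypothetical, and restricting it to radices of at most 10 is an assumption made so that the "discarded by the radix check" argument holds; it is not presented as the standard library's code.

```rust
/// The two bits that mark the ASCII digit range 0x30..=0x39.
const ASCII_DIGIT_MASK: u32 = 0b11_0000;

/// Hypothetical helper: converts `c` to a digit for radices up to 10.
fn to_digit_fast_path(c: char, radix: u32) -> Option<u32> {
    // Assumption: only radices of at most 10 take this path.
    assert!(radix <= 10, "this sketch only covers radix <= 10");
    // For '0'..='9' the XOR clears bits 4 and 5, leaving 0..=9.
    // For every other char the result is at least 10
    // (for example ':' is 0x3A, and 0x3A ^ 0x30 == 10).
    let digit = (c as u32) ^ ASCII_DIGIT_MASK;
    // A non-digit can never be below a radix of at most 10, so it is rejected here.
    if digit < radix { Some(digit) } else { None }
}
```

Under these assumptions, `to_digit_fast_path('7', 10)` gives `Some(7)`, while `to_digit_fast_path('8', 8)` and `to_digit_fast_path(':', 10)` both give `None`.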

The code is so fast, though, that there's quite a lot of noise in the benchmarks, so it's not easy to prove conclusively that it's faster as well as being fewer instructions.

I was also toying with the non-fast path, wondering whether we could do the following, as we'd then have only one return and fewer instructions still:

            match self {
                'a'..='z' => self as u32 - 'a' as u32 + 10,
                'A'..='Z' => self as u32 - 'A' as u32 + 10,
                _ => { radix = 10; self as u32 ^ ASCII_DIGIT_MASK },
            }
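To see how the fragment above could sit inside a whole function, here is a rough sketch. The name `to_digit_sketch`, the `mut radix` parameter, and the split at `radix <= 10` are assumptions for illustration, not the code that was merged. Note that tightening `radix` to 10 in the fallback arm is only safe when the caller's radix is above 10, which is why the sketch keeps that arm on the non-fast path.

```rust
const ASCII_DIGIT_MASK: u32 = 0b11_0000;

/// Hypothetical single-return variant built around the `match` above.
fn to_digit_sketch(c: char, mut radix: u32) -> Option<u32> {
    assert!(radix <= 36, "radix is too high (maximum 36)");
    let digit = if radix <= 10 {
        // Assumed fast path: non-digits XOR to a value of at least 10.
        (c as u32) ^ ASCII_DIGIT_MASK
    } else {
        match c {
            'a'..='z' => c as u32 - 'a' as u32 + 10,
            'A'..='Z' => c as u32 - 'A' as u32 + 10,
            _ => {
                // Only decimal digits can still be valid here. Lowering the
                // radix to 10 rejects the values 10..=15 that ':'..='?' produce.
                radix = 10;
                (c as u32) ^ ASCII_DIGIT_MASK
            }
        }
    };
    // The single exit point: anything not below the radix is discarded.
    if digit < radix { Some(digit) } else { None }
}
```

For example, with a radix of 16, `'f'` maps to `Some(15)`, `'9'` to `Some(9)`, and `'?'` to `None`, since `'?'` XORs to 15 but is checked against the tightened radix of 10.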

Here's the godbolt.

(H/T to @byteshadow for pointing out that XOR was what I needed.)

@rust-highfive
Collaborator

r? @dtolnay

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 14, 2021
@m-ou-se m-ou-se assigned m-ou-se and unassigned dtolnay Feb 14, 2021
@m-ou-se m-ou-se added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Feb 14, 2021
Co-authored-by: Mara <m-ou.se@m-ou.se>
Remove unused const
@m-ou-se
Member

m-ou-se commented Feb 15, 2021

@bors r+

@bors
Contributor

bors commented Feb 15, 2021

📌 Commit d2ba68b has been approved by m-ou-se

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 15, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Feb 16, 2021
To digit simplification

Here's the [godbolt](https://godbolt.org/z/883c9n).

( H/T to `@byteshadow` for pointing out xor was what I needed)
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 17, 2021
…laumeGomez

Rollup of 11 pull requests

Successful merges:

 - rust-lang#79981 (Add 'consider using' message to overflowing_literals)
 - rust-lang#82094 (To digit simplification)
 - rust-lang#82105 (Don't fail to remove files if they are missing)
 - rust-lang#82136 (Fix ICE: Use delay_span_bug for mismatched subst/hir arg)
 - rust-lang#82169 (Document that `assert!` format arguments are evaluated lazily)
 - rust-lang#82174 (Replace File::create and write_all with fs::write)
 - rust-lang#82196 (Add caveat to Path::display() about lossiness)
 - rust-lang#82198 (Use internal iteration in Iterator::is_sorted_by)
 - rust-lang#82204 (Update books)
 - rust-lang#82207 (rustdoc: treat edition 2021 as unstable)
 - rust-lang#82231 (Add long explanation for E0543)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 253631d into rust-lang:master Feb 17, 2021
@rustbot rustbot added this to the 1.52.0 milestone Feb 17, 2021
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
7 participants