Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up parsing for short ipv4s in std::net::Ipv4Addr::from_str #97118

Closed
wants to merge 3 commits into from

Conversation

nkconnor
Copy link

This aims to improve parsing speed for short and empty strings.

There is a proof of concept based on Criterion here: https://github.com/nkconnor/rust-ipv4-optimization/blob/main/benches/ipv4_fromstr_benchmark.rs

On my machine it offers about a 15X speed-up in the happy case (len() < 7). The parser today takes a more similar duration for "1.1.1" as it does for "1.1.1.1".

Optimizing for this case is especially helpful for situations where input is a non-null string type. For example, a CSV library may offer non-null string type for an IP Address column. IP Address is a very common element in "big data" and this optimization is likely to be appreciated in such contexts.

Criterion Output:

Main Parse OK time: [21.145 ns 21.181 ns 21.222 ns]
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe

Main Parse <7 time: [15.268 ns 15.286 ns 15.304 ns]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

Patch Parse OK time: [21.309 ns 21.506 ns 21.843 ns]
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe

Patch Parse <7 time: [998.09 ps 999.02 ps 1.0001 ns]

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label May 17, 2022
@rust-highfive
Copy link
Collaborator

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with r? rust-lang/libs-api @rustbot label +T-libs-api -T-libs to request review from a libs-api team reviewer. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @kennytm (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 17, 2022
@nkconnor
Copy link
Author

i'd also like to suggest we move both of these length checks into fn read_ipv4_addr(&mut self) -> Option<Ipv4Addr>. That will short circuit for the IpAddr::from_str case as well.

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 20, 2022
@JohnCSimon
Copy link
Member

ping from triage:
@nkconnor - looks like this PR isn't ready for review

FYI: when a PR is ready for review, send a message containing
@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2022
@nkconnor
Copy link
Author

ping from triage:

@nkconnor - looks like this PR isn't ready for review

FYI: when a PR is ready for review, send a message containing

@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

@rustbot author

@JohnCSimon why isn't it ready for review?

@JohnCSimon
Copy link
Member

JohnCSimon commented Jul 24, 2022

From your comment it looked like you still had more to add to this PR - my mistake, then.
@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 24, 2022
@nkconnor
Copy link
Author

makes sense, I was hoping to get feedback on the comment prior to moving the checks. I didn't see a reason for this check to be in the place that it is.

@nkconnor
Copy link
Author

nkconnor commented Aug 9, 2022

r? libs

@Mark-Simulacrum
Copy link
Member

Can you say more about "the happy case" here? It looks like too short IPs previously went through a longer attempt to parse, but my (maybe wrong) assumption is that such IPs are fairly rare in any data set of IPs.

Are we expecting users to attempt parsing and then fallback on some alternative?

i'd also like to suggest we move both of these length checks into fn read_ipv4_addr(&mut self) -> Option. That will short circuit for the IpAddr::from_str case as well.

Presuming this is the same behavior-wise (probably the case, but would need to check call sites), that seems reasonable to me.

@nkconnor
Copy link
Author

nkconnor commented Aug 9, 2022

Can you say more about "the happy case" here? It looks like too short IPs previously went through a longer attempt to parse, but my (maybe wrong) assumption is that such IPs are fairly rare in any data set of IPs.

That's right, the happy case is where a too short IP is parsed and the parser exits early.

I think IPs with 0 < length < 7 are probably more rare than IPs with length == 0. I was motivated to look at this because we were processing a very large dataset and the question came up whether to do a length check before attempting parsing. Since the parser was slow for the too short IPs, we added the length check. Our dataset had cases of length==0 and 0 < length < 7.

If we kept the assumption that such IPs are fairly rare, could we say that the patch is still worthwhile? The criterion output shows a small but supposably measurable performance decline for parsing valid IPs.

Are we expecting users to attempt parsing and then fallback on some alternative?

I would expect this to happen sometimes. In my case, we were provided with a third-party string which may/may not be a valid IP and then writing it in a structured format to a storage archive.

Somewhat related, I took a peek at how url parses a Url. They don't use std's IP parsers. It's a little convoluted but they do attempt parsing and then fallback to a domain name: https://docs.rs/url/latest/src/url/host.rs.html#77-119.

i'd also like to suggest we move both of these length checks into fn read_ipv4_addr(&mut self) -> Option. That will short circuit for the IpAddr::from_str case as well.

Presuming this is the same behavior-wise (probably the case, but would need to check call sites), that seems reasonable to me.

I believe it is. This read_ipv4_addr is also used within parsing for IPv6s and SocketAddrs.

@Mark-Simulacrum
Copy link
Member

OK, this PR seems fine then -- let's update it to adjust read_ipv4_addr then.

@rustbot
Copy link
Collaborator

rustbot commented Aug 9, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-log-analyzer

This comment has been minimized.

@nkconnor
Copy link
Author

nkconnor commented Aug 9, 2022

some tests appear to be failing, looking into it. Also, tested the case of an empty string and the stats are different than above (which tested "1.1.1" .. 0<length<7)

current       time:   [4.6269 ns 4.6319 ns 4.6373 ns]
patch         time:   [3.4966 ns 3.4984 ns 3.5004 ns]

@nkconnor
Copy link
Author

nkconnor commented Aug 9, 2022

I think tests fail with SocketAddrs because the Parser state includes the port info, and the max length check (<=15) isn't valid when looking at that too. I would have to propagate AddrKind down to the parsing function to determine the max length, or leave the max length check where it is.

@nkconnor
Copy link
Author

nkconnor commented Aug 9, 2022

this could/should fail too (ipv6s can be shorter, I think), but it's one example of how we could be consistent in doing length checks across the net::parser::AddrKinds. Not sure which direction to go.. I will come back to this unless you have suggestion.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 9, 2022
@rust-log-analyzer
Copy link
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
configure: rust.debug-assertions := True
configure: rust.overflow-checks := True
configure: llvm.assertions      := True
configure: dist.missing-tools   := True
configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
configure: writing `config.toml` in current directory
configure: 
configure: run `python /checkout/x.py --help`
Attempting with retry: make prepare
---
thread '<unnamed>' panicked at 'static string', library/std/src/thread/tests.rs:190:9
.....................................................
failures:

---- net::ip::tests::ip_properties stdout ----
thread 'net::ip::tests::ip_properties' panicked at 'called `Result::unwrap()` on an `Err` value: AddrParseError(Ip)', library/std/src/net/ip/tests.rs:315:5

failures:
    net::ip::tests::ip_properties


test result: FAILED. 931 passed; 1 failed; 1 ignored; 0 measured; 0 filtered out; finished in 10.32s

error: test failed, to rerun pass '-p std --lib'
Build completed unsuccessfully in 0:01:38

@bors
Copy link
Contributor

bors commented Aug 29, 2022

☔ The latest upstream changes (presumably #101143) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 2, 2022
@JohnCSimon
Copy link
Member

@nkconnor
Ping from triage: I'm closing this due to inactivity, Please reopen when you are ready to continue with this.
Note: if you do please open the PR BEFORE you push to it, else you won't be able to reopen - this is a quirk of github.
Thanks for your contribution.

@rustbot label: +S-inactive

@JohnCSimon JohnCSimon closed this Nov 27, 2022
@rustbot rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Nov 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants