Optimize HashCode.AddBytes for inputs larger than 16 bytes. #70095

Merged (4 commits) on Jun 29, 2022

@teo-tsirpanis teo-tsirpanis commented Jun 1, 2022

The HashCode.AddBytes method reads four bytes at a time and feeds them to the private HashCode.Add(int) method, but that method is not well optimized for processing large inputs. Because the xxHash algorithm works in batches of 16 bytes, the implementation has to queue the integers in separate fields until four of them have accumulated, and only then update the hash's state. That whole logic involves a couple of branches.

This PR avoids this queueing logic, instead reading input and directly updating the hash's state in batches of 16 bytes, if the input is large enough.
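The difference between the two paths can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual code: the `LaneSketch` type, its member names, the fixed seed of 0, and the little-endian reads are all assumptions made for the example. Only the round function and the primes follow the published xxHash32 algorithm.

```csharp
using System;
using System.Buffers.Binary;
using System.Numerics;

// Hypothetical illustration type; NOT System.HashCode itself.
public struct LaneSketch
{
    private const uint Prime1 = 2654435761U; // xxHash32 prime 1
    private const uint Prime2 = 2246822519U; // xxHash32 prime 2

    public uint V1, V2, V3, V4;  // the four accumulator lanes
    private uint _q1, _q2, _q3;  // values queued until a group of four is complete
    private uint _count;         // number of 32-bit values added so far

    // Assumption: a fixed seed keeps the sketch deterministic.
    public static LaneSketch Create(uint seed = 0)
    {
        LaneSketch s = default;
        unchecked
        {
            s.V1 = seed + Prime1 + Prime2;
            s.V2 = seed + Prime2;
            s.V3 = seed;
            s.V4 = seed - Prime1;
        }
        return s;
    }

    private static uint Round(uint lane, uint input) =>
        unchecked(BitOperations.RotateLeft(lane + input * Prime2, 13) * Prime1);

    // Queued path: one value per call, branching on the position in the group.
    public void AddQueued(uint value)
    {
        uint position = _count++ % 4;
        if (position == 0) _q1 = value;
        else if (position == 1) _q2 = value;
        else if (position == 2) _q3 = value;
        else
        {
            V1 = Round(V1, _q1);
            V2 = Round(V2, _q2);
            V3 = Round(V3, _q3);
            V4 = Round(V4, value);
        }
    }

    // Bulk path: consumes whole 16-byte blocks without touching the queue.
    // Assumes the queue is empty on entry. Returns the number of bytes consumed;
    // the caller is responsible for the remaining tail.
    public int AddBlocks(ReadOnlySpan<byte> data)
    {
        int consumed = 0;
        while (data.Length - consumed >= 16)
        {
            ReadOnlySpan<byte> block = data.Slice(consumed, 16);
            V1 = Round(V1, BinaryPrimitives.ReadUInt32LittleEndian(block));
            V2 = Round(V2, BinaryPrimitives.ReadUInt32LittleEndian(block.Slice(4)));
            V3 = Round(V3, BinaryPrimitives.ReadUInt32LittleEndian(block.Slice(8)));
            V4 = Round(V4, BinaryPrimitives.ReadUInt32LittleEndian(block.Slice(12)));
            _count += 4;
            consumed += 16;
        }
        return consumed;
    }
}
```

Both paths leave the four lanes in the same state for the same input; the bulk path simply skips the per-value position check and the queue fields. In this sketch the bulk path is only valid when nothing is queued, so a caller would first have to drain any pending values.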

Benchmarks show significant improvements for larger inputs (at 1024 bytes the mean time drops to roughly a quarter of the baseline):

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1706 (21H2)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100-preview.5.22267.11
  [Host]     : .NET 7.0.0 (7.0.22.26611), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.26611), X64 RyuJIT
| Method | Size | Mean | Error | StdDev | Median | Ratio | RatioSD | Rank |
|---|---|---|---|---|---|---|---|---|
| HashCode | 1 | 7.030 ns | 0.1780 ns | 0.2496 ns | 6.974 ns | 1.00 | 0.00 | 2 |
| HashCodeNeo | 1 | 6.266 ns | 0.1628 ns | 0.3993 ns | 6.133 ns | 0.90 | 0.05 | 1 |
| HashCodeNeoUnaligned | 1 | 8.613 ns | 0.2078 ns | 0.3236 ns | 8.581 ns | 1.23 | 0.06 | 3 |
| HashCode | 4 | 6.838 ns | 0.1740 ns | 0.3891 ns | 6.750 ns | 1.00 | 0.00 | 2 |
| HashCodeNeo | 4 | 6.412 ns | 0.1057 ns | 0.0988 ns | 6.374 ns | 0.90 | 0.03 | 1 |
| HashCodeNeoUnaligned | 4 | 8.074 ns | 0.1010 ns | 0.0945 ns | 8.062 ns | 1.14 | 0.05 | 3 |
| HashCode | 15 | 19.204 ns | 0.3502 ns | 0.3276 ns | 19.162 ns | 1.00 | 0.00 | 1 |
| HashCodeNeo | 15 | 19.744 ns | 0.3940 ns | 0.3290 ns | 19.788 ns | 1.03 | 0.03 | 2 |
| HashCodeNeoUnaligned | 15 | 21.063 ns | 0.3948 ns | 0.3297 ns | 21.092 ns | 1.09 | 0.02 | 3 |
| HashCode | 16 | 15.327 ns | 0.3363 ns | 0.2626 ns | 15.349 ns | 1.00 | 0.00 | 2 |
| HashCodeNeo | 16 | 8.100 ns | 0.1721 ns | 0.1609 ns | 8.065 ns | 0.53 | 0.01 | 1 |
| HashCodeNeoUnaligned | 16 | 17.034 ns | 0.3768 ns | 0.5639 ns | 16.981 ns | 1.13 | 0.05 | 3 |
| HashCode | 127 | 105.052 ns | 2.1090 ns | 3.2834 ns | 104.696 ns | 1.00 | 0.00 | 3 |
| HashCodeNeo | 127 | 37.389 ns | 0.6605 ns | 0.5855 ns | 37.213 ns | 0.35 | 0.01 | 1 |
| HashCodeNeoUnaligned | 127 | 38.687 ns | 0.7336 ns | 0.6862 ns | 38.650 ns | 0.36 | 0.01 | 2 |
| HashCode | 128 | 106.665 ns | 2.4307 ns | 7.1669 ns | 106.236 ns | 1.00 | 0.00 | 3 |
| HashCodeNeo | 128 | 28.700 ns | 0.6030 ns | 0.6452 ns | 28.573 ns | 0.28 | 0.02 | 1 |
| HashCodeNeoUnaligned | 128 | 34.053 ns | 1.0040 ns | 2.8152 ns | 32.880 ns | 0.32 | 0.04 | 2 |
| HashCode | 1023 | 728.836 ns | 14.1223 ns | 16.2633 ns | 724.195 ns | 1.00 | 0.00 | 3 |
| HashCodeNeo | 1023 | 180.836 ns | 3.2205 ns | 4.6187 ns | 180.949 ns | 0.25 | 0.01 | 2 |
| HashCodeNeoUnaligned | 1023 | 176.570 ns | 3.0476 ns | 2.7017 ns | 175.528 ns | 0.24 | 0.01 | 1 |
| HashCode | 1024 | 708.242 ns | 14.1710 ns | 29.2656 ns | 693.824 ns | 1.00 | 0.00 | 2 |
| HashCodeNeo | 1024 | 174.507 ns | 2.6961 ns | 2.2513 ns | 174.225 ns | 0.24 | 0.01 | 1 |
| HashCodeNeoUnaligned | 1024 | 176.265 ns | 2.2253 ns | 1.8583 ns | 176.354 ns | 0.24 | 0.01 | 1 |

I ran them by copying the changed HashCode class into a file in my benchmark project and comparing it to .NET's HashCode. The Unaligned benchmarks first add an integer before calling AddBytes (to test the impact of the additional logic that empties the HashCode's cache).
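For readers unfamiliar with the two call patterns, they might look something like the sketch below. The benchmark source is not included in this PR description, so the class name, method names, and the value 42 are hypothetical; only `HashCode.Add`, `HashCode.AddBytes`, and `HashCode.ToHashCode` are the real public API.

```csharp
using System;

// Hypothetical helper; the author's actual benchmark code is not shown in the PR.
public static class AddBytesCallPatterns
{
    // "Aligned" pattern: AddBytes is the first call, so nothing is queued beforehand.
    public static int HashAligned(ReadOnlySpan<byte> data)
    {
        var hc = new HashCode();
        hc.AddBytes(data);
        return hc.ToHashCode();
    }

    // "Unaligned" pattern: a single int is added first, so one value is already
    // pending when AddBytes starts and has to be dealt with before bulk processing.
    public static int HashUnaligned(ReadOnlySpan<byte> data)
    {
        var hc = new HashCode();
        hc.Add(42); // arbitrary value, chosen only for illustration
        hc.AddBytes(data);
        return hc.ToHashCode();
    }
}
```

Because HashCode uses a per-process random seed, the returned values are stable only within a single run and should not be persisted or compared across processes.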

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jun 1, 2022
@ghost

ghost commented Jun 1, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Author: teo-tsirpanis
Assignees: -
Labels: area-System.Runtime, community-contribution

Milestone: -

@teo-tsirpanis
Contributor Author

@stephentoub or somebody else, can you please take a look?

Co-authored-by: Tanner Gooding <tagoo@outlook.com>
@teo-tsirpanis
Contributor Author

Thanks for the guidance on Discord @tannergooding, can you take another look?

@teo-tsirpanis
Contributor Author

I am investigating the test failure.

@teo-tsirpanis
Contributor Author

Test failures are unrelated.

Co-authored-by: Stephen Toub <stoub@microsoft.com>
Member

@stephentoub stephentoub left a comment

Thanks.

@stephentoub stephentoub merged commit c36e26d into dotnet:main Jun 29, 2022
@teo-tsirpanis teo-tsirpanis deleted the hashcode-addbytes-opt branch June 29, 2022 18:42
@ghost ghost locked as resolved and limited conversation to collaborators Jul 29, 2022
teo-tsirpanis referenced this pull request Aug 10, 2022
It's slightly cheaper to call AddBytes before Add(int).