Optimize `HashCode.AddBytes` for inputs larger than 16 bytes. #70095

teo-tsirpanis · 2022-06-01T17:16:19Z

The `HashCode.AddBytes` method reads four bytes at a time and feeds them to the private `HashCode.Add(int)` method, but that method is not well optimized for large inputs. Because the xxHash algorithm works in batches of 16 bytes, `Add(int)` has to queue the integers in separate fields until four of them have accumulated and only then update the hash's state, and that logic involves a couple of branches.

This PR avoids the queueing logic: if the input is large enough, it reads the input and updates the hash's state directly in batches of 16 bytes.
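The difference between the two paths can be illustrated with a simplified xxHash32-style accumulator. This is only a sketch under stated assumptions, not the code in `HashCode.cs`: the type `SketchHasher`, its field names, the zero-initialized state (no random seed), the little-endian reads, and the shortened `Finish` step are all hypothetical simplifications.

```csharp
using System;
using System.Buffers.Binary;
using System.Numerics;

// Hypothetical, simplified xxHash32-style accumulator. NOT the real System.HashCode source.
public struct SketchHasher
{
    private const uint Prime1 = 2654435761U, Prime2 = 2246822519U;

    private uint _v1, _v2, _v3, _v4;        // hash state, updated once per 16 bytes
    private uint _queue1, _queue2, _queue3; // values waiting for a fourth one to arrive
    private uint _length;                   // number of 32-bit values added so far

    private static uint Round(uint acc, uint input) =>
        BitOperations.RotateLeft(acc + input * Prime2, 13) * Prime1;

    // Queueing path: one value per call, branching on the queue position.
    public void Add(uint value)
    {
        uint position = _length++ & 3;
        if (position == 0) _queue1 = value;
        else if (position == 1) _queue2 = value;
        else if (position == 2) _queue3 = value;
        else
        {
            _v1 = Round(_v1, _queue1);
            _v2 = Round(_v2, _queue2);
            _v3 = Round(_v3, _queue3);
            _v4 = Round(_v4, value);
        }
    }

    public void AddBytes(ReadOnlySpan<byte> data)
    {
        // If earlier Add calls left values queued, go through Add until the queue is empty.
        while ((_length & 3) != 0 && data.Length >= 4)
        {
            Add(BinaryPrimitives.ReadUInt32LittleEndian(data));
            data = data.Slice(4);
        }

        // Batch path: update the state straight from each 16-byte block, no queueing.
        while (data.Length >= 16)
        {
            _v1 = Round(_v1, BinaryPrimitives.ReadUInt32LittleEndian(data));
            _v2 = Round(_v2, BinaryPrimitives.ReadUInt32LittleEndian(data.Slice(4)));
            _v3 = Round(_v3, BinaryPrimitives.ReadUInt32LittleEndian(data.Slice(8)));
            _v4 = Round(_v4, BinaryPrimitives.ReadUInt32LittleEndian(data.Slice(12)));
            _length += 4;
            data = data.Slice(16);
        }

        // Remainder: whole 32-bit values first, then single bytes, via the queueing path.
        while (data.Length >= 4)
        {
            Add(BinaryPrimitives.ReadUInt32LittleEndian(data));
            data = data.Slice(4);
        }
        foreach (byte b in data) Add(b);
    }

    // Shortened finalization (no avalanche step), just enough to compare both paths.
    public uint Finish()
    {
        uint hash = BitOperations.RotateLeft(_v1, 1) + BitOperations.RotateLeft(_v2, 7)
                  + BitOperations.RotateLeft(_v3, 12) + BitOperations.RotateLeft(_v4, 18)
                  + _length * 4;
        uint pending = _length & 3;
        if (pending > 0) hash = Round(hash, _queue1);
        if (pending > 1) hash = Round(hash, _queue2);
        if (pending > 2) hash = Round(hash, _queue3);
        return hash;
    }
}
```

In this sketch the batch loop produces the same state as four consecutive `Add` calls, but without storing to the queue fields or branching on the queue position for every value.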

Benchmarks show significant improvements for larger inputs (a ratio of about 0.24 at 1024 bytes):

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1706 (21H2)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100-preview.5.22267.11
  [Host]     : .NET 7.0.0 (7.0.22.26611), X64 RyuJIT
  DefaultJob : .NET 7.0.0 (7.0.22.26611), X64 RyuJIT

| Method | Size | Mean | Error | StdDev | Median | Ratio | RatioSD | Rank |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| HashCode | 1 | 7.030 ns | 0.1780 ns | 0.2496 ns | 6.974 ns | 1.00 | 0.00 | 2 |
| HashCodeNeo | 1 | 6.266 ns | 0.1628 ns | 0.3993 ns | 6.133 ns | 0.90 | 0.05 | 1 |
| HashCodeNeoUnaligned | 1 | 8.613 ns | 0.2078 ns | 0.3236 ns | 8.581 ns | 1.23 | 0.06 | 3 |
| HashCode | 4 | 6.838 ns | 0.1740 ns | 0.3891 ns | 6.750 ns | 1.00 | 0.00 | 2 |
| HashCodeNeo | 4 | 6.412 ns | 0.1057 ns | 0.0988 ns | 6.374 ns | 0.90 | 0.03 | 1 |
| HashCodeNeoUnaligned | 4 | 8.074 ns | 0.1010 ns | 0.0945 ns | 8.062 ns | 1.14 | 0.05 | 3 |
| HashCode | 15 | 19.204 ns | 0.3502 ns | 0.3276 ns | 19.162 ns | 1.00 | 0.00 | 1 |
| HashCodeNeo | 15 | 19.744 ns | 0.3940 ns | 0.3290 ns | 19.788 ns | 1.03 | 0.03 | 2 |
| HashCodeNeoUnaligned | 15 | 21.063 ns | 0.3948 ns | 0.3297 ns | 21.092 ns | 1.09 | 0.02 | 3 |
| HashCode | 16 | 15.327 ns | 0.3363 ns | 0.2626 ns | 15.349 ns | 1.00 | 0.00 | 2 |
| HashCodeNeo | 16 | 8.100 ns | 0.1721 ns | 0.1609 ns | 8.065 ns | 0.53 | 0.01 | 1 |
| HashCodeNeoUnaligned | 16 | 17.034 ns | 0.3768 ns | 0.5639 ns | 16.981 ns | 1.13 | 0.05 | 3 |
| HashCode | 127 | 105.052 ns | 2.1090 ns | 3.2834 ns | 104.696 ns | 1.00 | 0.00 | 3 |
| HashCodeNeo | 127 | 37.389 ns | 0.6605 ns | 0.5855 ns | 37.213 ns | 0.35 | 0.01 | 1 |
| HashCodeNeoUnaligned | 127 | 38.687 ns | 0.7336 ns | 0.6862 ns | 38.650 ns | 0.36 | 0.01 | 2 |
| HashCode | 128 | 106.665 ns | 2.4307 ns | 7.1669 ns | 106.236 ns | 1.00 | 0.00 | 3 |
| HashCodeNeo | 128 | 28.700 ns | 0.6030 ns | 0.6452 ns | 28.573 ns | 0.28 | 0.02 | 1 |
| HashCodeNeoUnaligned | 128 | 34.053 ns | 1.0040 ns | 2.8152 ns | 32.880 ns | 0.32 | 0.04 | 2 |
| HashCode | 1023 | 728.836 ns | 14.1223 ns | 16.2633 ns | 724.195 ns | 1.00 | 0.00 | 3 |
| HashCodeNeo | 1023 | 180.836 ns | 3.2205 ns | 4.6187 ns | 180.949 ns | 0.25 | 0.01 | 2 |
| HashCodeNeoUnaligned | 1023 | 176.570 ns | 3.0476 ns | 2.7017 ns | 175.528 ns | 0.24 | 0.01 | 1 |
| HashCode | 1024 | 708.242 ns | 14.1710 ns | 29.2656 ns | 693.824 ns | 1.00 | 0.00 | 2 |
| HashCodeNeo | 1024 | 174.507 ns | 2.6961 ns | 2.2513 ns | 174.225 ns | 0.24 | 0.01 | 1 |
| HashCodeNeoUnaligned | 1024 | 176.265 ns | 2.2253 ns | 1.8583 ns | 176.354 ns | 0.24 | 0.01 | 1 |

I ran them by copy-pasting the changed `HashCode` class into a file in my benchmark project and comparing it to .NET's `HashCode`. The Unaligned benchmarks first add an integer before calling `AddBytes`, to test the impact of the additional logic that empties the `HashCode`'s cache.
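The benchmark code itself is not included in the PR. As a rough illustration of the two scenarios described above, the sketch below uses the public `System.HashCode` API; the class name `AddBytesScenarios`, the method names, and the constant `42` are hypothetical.

```csharp
using System;

// Hypothetical helper showing the two benchmarked call patterns.
public static class AddBytesScenarios
{
    // "Aligned": AddBytes is the first call, so nothing is queued when it starts.
    public static int Aligned(ReadOnlySpan<byte> data)
    {
        var hc = new HashCode();
        hc.AddBytes(data);
        return hc.ToHashCode();
    }

    // "Unaligned": an int is added first, so AddBytes starts with one value
    // already queued and has to deal with it before it can process whole blocks.
    public static int Unaligned(ReadOnlySpan<byte> data)
    {
        var hc = new HashCode();
        hc.Add(42);
        hc.AddBytes(data);
        return hc.ToHashCode();
    }
}
```

`HashCode` results are only stable within a single process, so values returned by these methods should not be persisted or compared across runs.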

dotnet-issue-labeler · 2022-06-01T17:16:25Z

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2022-06-01T17:16:56Z

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	teo-tsirpanis
Assignees:	-
Labels:	`area-System.Runtime`, `community-contribution`
Milestone:	-

src/libraries/System.Private.CoreLib/src/System/HashCode.cs

teo-tsirpanis · 2022-06-21T12:06:59Z

@stephentoub or somebody else, can you please take a look?

src/libraries/System.Private.CoreLib/src/System/HashCode.cs

Co-authored-by: Tanner Gooding <tagoo@outlook.com>

teo-tsirpanis · 2022-06-21T22:21:41Z

Thanks for the guidance on Discord @tannergooding, can you take another look?

teo-tsirpanis · 2022-06-22T17:54:26Z

I am investigating the test failure.

src/libraries/System.Private.CoreLib/src/System/HashCode.cs

teo-tsirpanis · 2022-06-23T07:26:41Z

Test failures are unrelated.

src/libraries/System.Private.CoreLib/src/System/HashCode.cs

Co-authored-by: Stephen Toub <stoub@microsoft.com>

stephentoub

Thanks.

It's slightly cheaper to call AddBytes before Add(int).

Optimize HashCode.AddBytes for inputs larger than 16 bytes.

d940f0b

ghost added the community-contribution label Jun 1, 2022

teo-tsirpanis added the area-System.Runtime label Jun 1, 2022

teo-tsirpanis commented Jun 2, 2022

src/libraries/System.Private.CoreLib/src/System/HashCode.cs (outdated, resolved)

tannergooding reviewed Jun 21, 2022

src/libraries/System.Private.CoreLib/src/System/HashCode.cs (outdated, resolved)

tannergooding reviewed Jun 21, 2022

src/libraries/System.Private.CoreLib/src/System/HashCode.cs (outdated, resolved)

Address PR feedback, refactor control flow and inline UnsafeAddMany.

47a1ff6

Co-authored-by: Tanner Gooding <tagoo@outlook.com>

Add more asserts before memory reads and move one inside the main loop.

636564d

tannergooding reviewed Jun 22, 2022

src/libraries/System.Private.CoreLib/src/System/HashCode.cs (resolved)

runfoapp bot mentioned this pull request Jun 22, 2022

jit.1 work item failing on mono #67888

Closed

stephentoub reviewed Jun 23, 2022

src/libraries/System.Private.CoreLib/src/System/HashCode.cs (resolved)

stephentoub reviewed Jun 23, 2022

src/libraries/System.Private.CoreLib/src/System/HashCode.cs (outdated, resolved)

stephentoub reviewed Jun 23, 2022

src/libraries/System.Private.CoreLib/src/System/HashCode.cs (outdated, resolved)

stephentoub reviewed Jun 23, 2022

src/libraries/System.Private.CoreLib/src/System/HashCode.cs (resolved)

Address PR feedback.

dd835ad

Co-authored-by: Stephen Toub <stoub@microsoft.com>

stephentoub approved these changes Jun 29, 2022

stephentoub merged commit c36e26d into dotnet:main Jun 29, 2022

teo-tsirpanis deleted the hashcode-addbytes-opt branch June 29, 2022 18:42

ghost locked as resolved and limited conversation to collaborators Jul 29, 2022

teo-tsirpanis referenced this pull request Aug 10, 2022

Fix ordering of adding in Regex's BitVector.GetHashCode (#71913)

0767860

It's slightly cheaper to call AddBytes before Add(int).
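As an illustration of that ordering, a `GetHashCode` that combines a byte buffer with an integer can call `AddBytes` first, so that it starts with nothing queued. The type `LabeledBlob` and its fields are hypothetical and are not taken from the Regex change.

```csharp
using System;

// Hypothetical type combining a byte payload with an int tag.
public readonly struct LabeledBlob
{
    private readonly byte[] _payload;
    private readonly int _tag;

    public LabeledBlob(byte[] payload, int tag)
    {
        _payload = payload;
        _tag = tag;
    }

    public override int GetHashCode()
    {
        var hc = new HashCode();
        hc.AddBytes(_payload); // bytes first: AddBytes starts with no queued values
        hc.Add(_tag);          // the int goes last
        return hc.ToHashCode();
    }
}
```

Swapping the two calls changes the resulting hash value, so the order should be chosen once and used consistently wherever equal values must hash alike.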
