Optimize XxHash3 on ARM platform #77881

xoofx · 2022-11-04T07:11:25Z

Hey there,
This PR fixes ARM performance for XxHash3, as a follow-up to #77756.

It is roughly 2.5x faster than the previous version (61.49 us vs. 156.05 us) and about 14% faster than the C++ version (61.49 us vs. 70.01 us).

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.755)
Snapdragon Compute Platform, 1 CPU, 8 logical and 8 physical cores
.NET SDK=7.0.100-rc.2.22477.23
  [Host]     : .NET 7.0.0 (7.0.22.47203), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 7.0.0 (7.0.22.47203), Arm64 RyuJIT AdvSIMD


|           Method |          data |       Mean |    Error |   StdDev | Ratio |
|----------------- |-------------- |-----------:|---------:|---------:|------:|
|       XXH3Native | Byte[1048576] |  70.01 us  | 0.051 us | 0.045 us |  0.45 |
|             XXH3 | Byte[1048576] |  156.05 us | 0.089 us | 0.074 us |  1.0  |
| XXH3ARMOptimized | Byte[1048576] |  61.49 us  | 0.068 us | 0.064 us |  0.39 |
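The Ratio column is each method's Mean divided by the Mean of the XXH3 baseline, and the speedup figures come from the same numbers. Here is a small C# check using the means copied from the table. The variable and function names are illustrative only.

```csharp
using System;

// Mean times in microseconds, copied from the BenchmarkDotNet table; XXH3 is the baseline.
const double native = 70.01;           // XXH3Native (C++ implementation)
const double managedBaseline = 156.05; // XXH3 (previous managed version)
const double armOptimized = 61.49;     // XXH3ARMOptimized (this PR)

// BenchmarkDotNet's Ratio column: a method's Mean divided by the baseline's Mean.
double Ratio(double mean) => mean / managedBaseline;

// "N times faster" compares elapsed times: slower time / faster time.
double Speedup(double slowerMean, double fasterMean) => slowerMean / fasterMean;

Console.WriteLine(FormattableString.Invariant($"Ratio XXH3Native:       {Ratio(native):F2}"));
Console.WriteLine(FormattableString.Invariant($"Ratio XXH3ARMOptimized: {Ratio(armOptimized):F2}"));
Console.WriteLine(FormattableString.Invariant($"Speedup vs. XXH3:       {Speedup(managedBaseline, armOptimized):F2}x"));
Console.WriteLine(FormattableString.Invariant($"Speedup vs. native:     {Speedup(native, armOptimized):F2}x"));
```

This prints ratios of 0.45 and 0.39, which match the table, and speedups of 2.54x over the previous managed version and 1.14x over the native version.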

cc: @stephentoub, @EgorBo

ghost · 2022-11-04T07:11:36Z

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

EgorBo · 2022-11-04T12:28:40Z

Nice! 🙂

> 15% faster than the C++ version.

🚀

src/libraries/System.IO.Hashing/src/System/IO/Hashing/XxHash3.cs

stephentoub

Thanks.

Co-authored-by: Stephen Toub <stoub@microsoft.com>

stephentoub · 2022-11-05T15:05:18Z

Thanks

xtqqczze · 2022-11-05T20:29:45Z

src/libraries/System.IO.Hashing/src/System/IO/Hashing/XxHash3.cs

@@ -896,16 +897,31 @@ private static Vector128<ulong> Accumulate128(Vector128<ulong> accVec, byte* sou
            Vector128<uint> sourceKey = sourceVec ^ secret;

            // TODO: Figure out how to unwind this shuffle and just use Vector128.Multiply


Is this comment still relevant, or does it now refer to code in MultiplyWideningLower?
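This sketch gives context for the TODO in the hunk above. In the reference XXH3 scalar code, the accumulate step works on each 64-bit lane in two ways. It adds the raw input to the neighbouring lane (`lane ^ 1`), and it adds the 32x32 to 64-bit product of the keyed input's low and high halves to its own lane. Vectorized versions therefore need shuffles, both to swap neighbouring lanes and to line up the 32-bit halves for the multiply. Below is a minimal scalar model in C#. It assumes one 64-byte stripe and a secret slice that is already at the correct offset. `Accumulate512Scalar` is a hypothetical name and is not the .NET implementation.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical scalar model of one XXH3 accumulate stripe (not the .NET code):
// 8 lanes of 64-bit accumulators, 64 bytes of input, 64 bytes of secret.
static void Accumulate512Scalar(Span<ulong> acc, ReadOnlySpan<byte> input, ReadOnlySpan<byte> secret)
{
    for (int lane = 0; lane < 8; lane++)
    {
        ulong dataVal = BinaryPrimitives.ReadUInt64LittleEndian(input.Slice(lane * 8));
        ulong dataKey = dataVal ^ BinaryPrimitives.ReadUInt64LittleEndian(secret.Slice(lane * 8));

        // Raw input goes into the neighbouring lane; in SIMD code this is a lane swap (shuffle).
        acc[lane ^ 1] += dataVal;

        // Low 32 bits times high 32 bits, widened to 64 bits (cannot overflow a ulong).
        acc[lane] += (dataKey & 0xFFFFFFFF) * (dataKey >> 32);
    }
}
```

With a zero secret and only lane 0 set to `0x0000_0002_0000_0003`, `acc[1]` receives the raw value and `acc[0]` receives `3 * 2 = 6`.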

Optimize XxHash3 on ARM platform

0f60e44

dotnet-issue-labeler bot added the area-System.IO label Nov 4, 2022

ghost added the community-contribution Indicates that the PR has been added by a community member label Nov 4, 2022

EgorBo reviewed Nov 4, 2022

src/libraries/System.IO.Hashing/src/System/IO/Hashing/XxHash3.cs (outdated, resolved)

EgorBo reviewed Nov 4, 2022

src/libraries/System.IO.Hashing/src/System/IO/Hashing/XxHash3.cs (outdated, resolved)

EgorBo reviewed Nov 4, 2022

src/libraries/System.IO.Hashing/src/System/IO/Hashing/XxHash3.cs (outdated, resolved)

Extract code to MultiplyWideningLower

6b07c09
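This commit moves the widening multiply into a helper. The following is a hypothetical sketch of how such a helper can map to AdvSIMD, not the code merged in this PR. The reference xxHash NEON path narrows the keyed input to its low 32-bit halves (`XTN`) and its high halves (`SHRN #32`), then performs a widening 32x32 to 64-bit multiply. In .NET those steps correspond to `AdvSimd.ExtractNarrowingLower`, `AdvSimd.ShiftRightLogicalNarrowingLower` and `AdvSimd.MultiplyWideningLower`. The fallback path uses the cross-platform `Vector128` operators. `MultiplyHalvesSketch` is an illustrative name, and the sketch assumes .NET 7 or later.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical helper (not the PR's MultiplyWideningLower):
// for each 64-bit lane x, returns (x & 0xFFFFFFFF) * (x >> 32).
static Vector128<ulong> MultiplyHalvesSketch(Vector128<ulong> value)
{
    if (AdvSimd.IsSupported)
    {
        // XTN: keep the low 32 bits of each 64-bit lane.
        Vector64<uint> lows = AdvSimd.ExtractNarrowingLower(value);
        // SHRN #32: keep the high 32 bits of each 64-bit lane.
        Vector64<uint> highs = AdvSimd.ShiftRightLogicalNarrowingLower(value, 32);
        // UMULL: widening multiply of the two 32-bit pairs into 64-bit lanes.
        return AdvSimd.MultiplyWideningLower(lows, highs);
    }

    // Portable fallback using the cross-platform Vector128 APIs.
    Vector128<ulong> low = value & Vector128.Create(0xFFFF_FFFFUL);
    Vector128<ulong> high = Vector128.ShiftRightLogical(value, 32);
    return low * high;
}
```

Both paths produce the same result. The AdvSIMD path avoids the separate shuffle and 64-bit multiply, which is the part that matters on ARM.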

stephentoub reviewed Nov 4, 2022

src/libraries/System.IO.Hashing/src/System/IO/Hashing/XxHash3.cs (outdated, resolved)

stephentoub approved these changes Nov 4, 2022

Update src/libraries/System.IO.Hashing/src/System/IO/Hashing/XxHash3.cs

1a310cc

Co-authored-by: Stephen Toub <stoub@microsoft.com>

xoofx mentioned this pull request Nov 5, 2022

Add XxHash128 #77944

Merged

stephentoub merged commit ec26662 into dotnet:main Nov 5, 2022

xtqqczze reviewed Nov 5, 2022

adamsitnik added this to the 8.0.0 milestone Nov 7, 2022

adamsitnik added the tenet-performance Performance related issue label Nov 7, 2022

ghost locked as resolved and limited conversation to collaborators Dec 7, 2022
