Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize XxHash3 on ARM platform #77881

Merged
merged 3 commits into from
Nov 5, 2022
Merged

Conversation

xoofx
Copy link
Member

@xoofx xoofx commented Nov 4, 2022

Hey there,
This PR Fixes ARM performance for XxHash3, followup of #77756

It is roughly 2.5x times faster than the previous version and 15% faster than the C++ version.

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.755)
Snapdragon Compute Platform, 1 CPU, 8 logical and 8 physical cores
.NET SDK=7.0.100-rc.2.22477.23
  [Host]     : .NET 7.0.0 (7.0.22.47203), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 7.0.0 (7.0.22.47203), Arm64 RyuJIT AdvSIMD


|           Method |          data |       Mean |    Error |   StdDev | Ratio |
|----------------- |-------------- |-----------:|---------:|---------:|------:|
|       XXH3Native | Byte[1048576] |  70.01 us  | 0.051 us | 0.045 us |  0.45 |
|             XXH3 | Byte[1048576] |  156.05 us | 0.089 us | 0.074 us |  1.0  |
| XXH3ARMOptimized | Byte[1048576] |  61.49 us  | 0.068 us | 0.064 us |  0.39 |

cc: @stephentoub, @EgorBo

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Nov 4, 2022
@ghost
Copy link

ghost commented Nov 4, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Hey there,
This PR Fixes ARM performance for XxHash3, followup of #77756

It is roughly 2.5x times faster than the previous version and 15% faster than the C++ version.

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.755)
Snapdragon Compute Platform, 1 CPU, 8 logical and 8 physical cores
.NET SDK=7.0.100-rc.2.22477.23
  [Host]     : .NET 7.0.0 (7.0.22.47203), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 7.0.0 (7.0.22.47203), Arm64 RyuJIT AdvSIMD


|           Method |          data |       Mean |    Error |   StdDev | Ratio |
|----------------- |-------------- |-----------:|---------:|---------:|------:|
|       XXH3Native | Byte[1048576] |  70.01 us  | 0.051 us | 0.045 us |  0.45 |
|             XXH3 | Byte[1048576] |  156.05 us | 0.089 us | 0.074 us |  1.0  |
| XXH3ARMOptimized | Byte[1048576] |  61.49 us  | 0.068 us | 0.064 us |  0.39 |

cc: @stephentoub, @EgorBo

Author: xoofx
Assignees: -
Labels:

area-System.IO, community-contribution

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Nov 4, 2022

Nice! 🙂

15% faster than the C++ version.

🚀

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Co-authored-by: Stephen Toub <stoub@microsoft.com>
@xoofx xoofx mentioned this pull request Nov 5, 2022
@stephentoub stephentoub merged commit ec26662 into dotnet:main Nov 5, 2022
@stephentoub
Copy link
Member

Thanks

@@ -896,16 +897,31 @@ private static Vector128<ulong> Accumulate128(Vector128<ulong> accVec, byte* sou
Vector128<uint> sourceKey = sourceVec ^ secret;

// TODO: Figure out how to unwind this shuffle and just use Vector128.Multiply
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still relevant, or does it now refer to code in MultiplyWideningLower?

@adamsitnik adamsitnik added this to the 8.0.0 milestone Nov 7, 2022
@adamsitnik adamsitnik added the tenet-performance Performance related issue label Nov 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO community-contribution Indicates that the PR has been added by a community member tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants