[mono] Add Vector128 Sum intrinsic for amd64 #75142
I'm nitpicking here. For
The
The resulting code is longer, but has a lower total latency and puts less pressure on Intel's port 5. Still, horizontal add probably won't be executed in an inner loop, so saving 1-2 clocks of latency is not significant. And this would probably have to be measured, too.
I expect the longer code will have an overall net-negative impact in loops, since it takes up 2x the space, produces a 3-instruction dependency chain, and takes up additional micro-ops in the decoder. We also have to be careful, because the result can be non-deterministic otherwise. For floating-point,
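The floating-point remark above is cut off in this thread. Assuming it concerns summation order, here is a minimal sketch in portable C of why the choice of instruction sequence can change a float result. The helper names are hypothetical and this is not the code Mono emits. Adjacent-pair order is what a `haddps`-style horizontal add produces, while folding the upper half onto the lower half is the usual shuffle-and-add order.

```c
#include <stddef.h>

/* Hypothetical helpers (not Mono code): three orders for summing four float lanes.
 * The volatile temporaries force every partial sum to be rounded to float. */

/* Scalar left-to-right order: ((v0 + v1) + v2) + v3. */
static float sum_sequential (const float v [4])
{
	volatile float acc = v [0];
	for (size_t i = 1; i < 4; i++)
		acc = acc + v [i];
	return acc;
}

/* Adjacent-pair order: (v0 + v1) + (v2 + v3). */
static float sum_adjacent_pairs (const float v [4])
{
	volatile float lo = v [0] + v [1];
	volatile float hi = v [2] + v [3];
	return lo + hi;
}

/* Fold-halves order: (v0 + v2) + (v1 + v3). */
static float sum_fold_halves (const float v [4])
{
	volatile float lo = v [0] + v [2];
	volatile float hi = v [1] + v [3];
	return lo + hi;
}
```

With the input `{ 1e8f, 1.0f, -1e8f, 1.0f }`, the three orders give 1, 0 and 2 respectively under IEEE 754 single precision, because `1e8f + 1.0f` rounds back to `1e8f`. Integer lanes do not have this problem, since wrapping integer addition is associative.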
/azp run runtime-extra-platforms
Azure Pipelines successfully started running 1 pipeline(s).
src/mono/mono/mini/simd-intrinsics.c
@@ -545,6 +551,92 @@ emit_sum_vector (MonoCompile *cfg, MonoType *vector_type, MonoTypeEnum element_t
}
#endif

#ifdef TARGET_AMD64
static int type_to_extract_op (MonoTypeEnum type);
static const int fast_log2 [] = { 1, 0, 1, -1, 2, -1, -1, -1, 3 };
This is not simply calculating log2. It seems that you've assigned -1 to the positions that you consider illegal element counts. If that's the case, element counts 0 and 1 should be -1 as well.
You are right, -1 should mark the illegal inputs, and 0 and 1 are illegal in this case as well.
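A minimal sketch of what the agreed change could look like, assuming the table stays indexed by element count and keeps its original length. The names `element_count_log2_table` and `element_count_log2` are hypothetical, and the bounds-checked accessor is an addition for illustration; the thread does not show the final code.

```c
#include <stddef.h>

/* Hypothetical corrected table: index is the element count, value is its log2.
 * Every count that is not a supported power of two, including 0 and 1, maps to -1. */
static const int element_count_log2_table [] = { -1, -1, 1, -1, 2, -1, -1, -1, 3 };

/* Hypothetical accessor: returns log2 (element_count) for counts 2, 4 and 8,
 * and -1 for everything else, including indices outside the table. */
static int element_count_log2 (int element_count)
{
	const int table_len = (int) (sizeof (element_count_log2_table) / sizeof (element_count_log2_table [0]));

	if (element_count < 0 || element_count >= table_len)
		return -1;
	return element_count_log2_table [element_count];
}
```

A caller can then treat a negative return value as "unsupported element count" and fall back to a non-intrinsic path instead of using it as a shuffle-step count.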
/azp run runtime-extra-platforms
Azure Pipelines successfully started running 1 pipeline(s).
Add support for the following Vector128 APIs: