-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only widen short local stores #74685
Only widen short local stores #74685
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsAs the commit message points out, widening byte stores is not profitable most of the time. This change is meant to minimize the amount of regressions for the upcoming enablement of folding primitives in local morph.
|
dc631a6
to
2931bc5
Compare
// Widening byte -> int: // // Zero : 3 -> 4/2 // Non-Zero : 3 -> 6 // // Notable: only the "reused zero" case can become better. // // Widening short -> int: // // Zero : 5 -> 4/2 // Non-Zero : 5 -> 6 (no 0x66 prefix) // // Notable: the zero (common) case is always better, while // for the non-zero case we only regress by one byte and // get rid of the prefix.
2931bc5
to
89f7931
Compare
e39ecb3
to
b94f830
Compare
@dotnet/jit-contrib |
@jakobbotsch can you review? |
/azp run runtime-coreclr superpmi-diffs |
Azure Pipelines successfully started running 1 pipeline(s). |
src/coreclr/jit/lowerxarch.cpp
Outdated
// Most small locals (the exception is dependently promoted fields) have 4 byte wide stack slots, so | ||
// we can widen the store, if profitable. This optimization is not relevant for register candidates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARM has the same "optimization" (which does not make much sense on ARM I think). Should it be adjusted? It does not have the lvDoNotEnregister
heuristic check, but it probably makes sense to remove it entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARM
I've investigated this question a while back.
- For ARM64 this widening is useless. I'll
#ifdef
it out and delete the LA64 copy. - For ARM, it can be useful because 4-byte-wide stores can be encoded in 2 bytes in more cases than the small ones. I'll make the code look more like this new x86/x64 version.
It does not have the lvDoNotEnregister heuristic check, but it probably makes sense to remove it entirely.
Yes... Not sure why I put that in; it's true that the "optimization" is only relevant for in-memory locals, but even candidates can end up spilled. I'll remove it.
On LA64, all instructions have the same size of 4 bytes, so there is no point to widening.
No point.
No point in widening on ARM64.
As the commit message points out, widening byte stores is not profitable most of the time.
This change is meant to minimize the amount of regressions for the upcoming enablement of folding primitives in local morph.
Diffs.