Only widen short local stores #74685

SingleAccretion · 2022-08-26T22:56:38Z

As the commit message points out, widening byte stores is not profitable most of the time.

This change is meant to minimize the number of regressions from the upcoming enablement of folding primitives in local morph.

Diffs.

ghost · 2022-08-26T22:56:50Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	SingleAccretion
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

// Widening byte -> int:
//
//  Zero     : 3 -> 4/2
//  Non-Zero : 3 -> 6
//
// Notable: only the "reused zero" case can become better.
//
// Widening short -> int:
//
//  Zero     : 5 -> 4/2
//  Non-Zero : 5 -> 6 (no 0x66 prefix)
//
// Notable: the zero (common) case is always better, while
// for the non-zero case we only regress by one byte and
// get rid of the prefix.
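One way to read the byte counts in the comment above is as x86 encoding sizes with the addressing bytes (SIB, displacement) left out, since those are the same for every form being compared. The sketch below reproduces the numbers under that assumption. `ImmStoreSize` and `ZeroStoreViaRegSize` are hypothetical helpers written for illustration; they are not functions from the JIT.

```cpp
// Hypothetical helpers, not JIT code. Sizes are in bytes and exclude any
// SIB/displacement bytes, which are assumed equal across the compared forms.

// Size of `mov [mem], imm` for a 1-, 2- or 4-byte operand.
constexpr int ImmStoreSize(int operandSize)
{
    const int prefix = (operandSize == 2) ? 1 : 0; // 0x66 operand-size prefix
    const int opcode = 1;                          // C6 (byte) or C7 (word/dword)
    const int modrm  = 1;
    const int imm    = operandSize;                // imm8 / imm16 / imm32
    return prefix + opcode + modrm + imm;
}

// Size of storing zero through a register: `xor reg, reg` (2 bytes) followed
// by a 4-byte-wide `mov [mem], reg` (2 bytes). The xor costs nothing if a
// zeroed register is already available (the "reused zero" case).
constexpr int ZeroStoreViaRegSize(bool zeroRegAvailable)
{
    const int xorSize = zeroRegAvailable ? 0 : 2;
    const int movSize = 2; // opcode + ModRM, no REX prefix assumed
    return xorSize + movSize;
}

static_assert(ImmStoreSize(1) == 3, "byte immediate store");
static_assert(ImmStoreSize(2) == 5, "short immediate store (with 0x66)");
static_assert(ImmStoreSize(4) == 6, "int immediate store");
```

Under these assumptions a widened zero store costs `ZeroStoreViaRegSize(false)` or `ZeroStoreViaRegSize(true)` bytes, which corresponds to the "4/2" entries, and a widened non-zero store costs `ImmStoreSize(4)` bytes.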

SingleAccretion · 2022-08-27T22:08:07Z

@dotnet/jit-contrib

AndyAyersMS · 2022-09-12T19:29:27Z

@jakobbotsch can you review?

jakobbotsch · 2022-09-13T08:59:07Z

/azp run runtime-coreclr superpmi-diffs

azure-pipelines · 2022-09-13T08:59:20Z

Azure Pipelines successfully started running 1 pipeline(s).

jakobbotsch · 2022-09-13T09:27:04Z

src/coreclr/jit/lowerxarch.cpp

+    // Most small locals (the exception is dependently promoted fields) have 4 byte wide stack slots, so
+    // we can widen the store, if profitable. This optimization is not relevant for register candidates.


ARM has the same "optimization" (which does not make much sense on ARM I think). Should it be adjusted? It does not have the lvDoNotEnregister heuristic check, but it probably makes sense to remove it entirely.

> ARM

I've investigated this question a while back.

For ARM64 this widening is useless. I'll #ifdef it out and delete the LA64 copy.

For ARM, it can be useful because 4-byte-wide stores can be encoded in 2 bytes in more cases than the small ones. I'll make the code look more like this new x86/x64 version.

> It does not have the lvDoNotEnregister heuristic check, but it probably makes sense to remove it entirely.

Yes... Not sure why I put that in; it's true that the "optimization" is only relevant for in-memory locals, but even candidates can end up spilled. I'll remove it.
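The per-target outcome of this discussion can be summarized as a small policy function. This is only a sketch of the reasoning in the thread, not the code that was merged: `Target`, `ShouldWidenLocalStore` and the `hasFourByteSlot` parameter are hypothetical names, and the ARM32 branch in particular is an assumption, since the thread says only that widening "can be useful" there.

```cpp
// Hypothetical summary of the discussion; not the JIT's actual lowering code.
enum class Target { XArch, Arm32, Arm64, LoongArch64 };

// storeSize is the size in bytes of the local being stored to.
// hasFourByteSlot is false for locals without a 4-byte-wide stack slot
// (e.g. dependently promoted fields), where widening would be incorrect.
constexpr bool ShouldWidenLocalStore(Target target, int storeSize, bool hasFourByteSlot)
{
    if (!hasFourByteSlot || (storeSize >= 4))
    {
        return false;
    }

    switch (target)
    {
        case Target::XArch:
            // Only short stores: widening drops the 0x66 prefix, while
            // widening byte stores is rarely profitable.
            return storeSize == 2;
        case Target::Arm32:
            // ASSUMPTION: widen all small stores, because 4-byte stores
            // have 2-byte encodings in more cases than small stores do.
            return true;
        case Target::Arm64:
        case Target::LoongArch64:
            // Fixed-size instructions: widening cannot save anything.
            return false;
    }
    return false;
}
```

Note that the function takes no "do not enregister" input: as discussed above, even register candidates can end up spilled, so that check was dropped.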

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Aug 26, 2022

ghost added the community-contribution label (indicates that the PR has been added by a community member) Aug 26, 2022

SingleAccretion force-pushed the Small-Store-Widening branch from dc631a6 to 2931bc5 Compare August 26, 2022 23:17

SingleAccretion force-pushed the Small-Store-Widening branch from 2931bc5 to 89f7931 Compare August 26, 2022 23:45

SingleAccretion changed the title ~~Only widen 2 byte wide local stores~~ Only widen short local stores Aug 26, 2022

SingleAccretion closed this Aug 27, 2022

SingleAccretion reopened this Aug 27, 2022

Perhaps this...?

b94f830

SingleAccretion force-pushed the Small-Store-Widening branch from e39ecb3 to b94f830 Compare August 27, 2022 18:08

SingleAccretion marked this pull request as ready for review August 27, 2022 22:07

JulieLeeMSFT added this to the 8.0.0 milestone Aug 29, 2022

AndyAyersMS requested a review from jakobbotsch September 12, 2022 19:30

jakobbotsch reviewed Sep 13, 2022

SingleAccretion added 3 commits September 13, 2022 19:36

Delete the widening from LA64

dd8a33f

On LA64, all instructions have the same size of 4 bytes, so there is no point to widening.

Remove the DNER check

faa925a

No point.

ARM/64 lowering

4a57b89

No point in widening on ARM64.

jakobbotsch approved these changes Sep 14, 2022

jakobbotsch merged commit 522f808 into dotnet:main Sep 14, 2022

SingleAccretion deleted the Small-Store-Widening branch September 17, 2022 16:35

ghost locked as resolved and limited conversation to collaborators Oct 17, 2022
