ARM64 - Always morph GT_MOD #68885

TIHan · 2022-05-05T03:11:35Z

Should resolve these regressions:

ARM64 - Minor regression from optimizing 'a % b' #67983 - more info about it here: ARM64 - Optimizing a % b operations part 2 #66407 (comment)
~~Assertion failed '!"Shouldn't see an integer typed GT_MOD node in ARM64"' during 'Linear scan register alloc' #68470~~ Seems to not reproduce in main anymore.
Assertion failed 'divMod->OperGet() != GT_UMOD' during 'Lowering nodeinfo' #68136

Description

We only want to do the specific ARM64 mod optimization if the Mod that was morphed to SubMulDiv did not take advantage of CSE. The only phase I know of that performs any kind of transformation after CSE is 'lowering'.

The idea here is to find the SubMulDiv in lowering, turn it back into a Mod, and then call LowerModPow2 to do the specific optimization.

But it gets more complicated. SubMulDiv itself gets optimized in lowering as well, so the shape of SubMulDiv is lost. Therefore, we need to find other possible shapes that get lowered from SubMulDiv.
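
For context, a signed remainder by a power of two can be computed without a division. The sketch below is an assumption about the general technique behind the ARM64-specific path (the csneg-based sequence mentioned later in this thread). It is not taken from LowerModPow2, and the function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper, not JIT code: computes x % pow2 for a positive
// power-of-two divisor without dividing, in the spirit of an
// and / negs / and / csneg style sequence (assumed, not confirmed here).
constexpr int32_t mod_pow2_branchless(int32_t x, uint32_t pow2)
{
    // Unsigned arithmetic keeps the negation of INT32_MIN well defined.
    // For x >= 0 the result is x & mask; otherwise it is -((-x) & mask),
    // which preserves the sign of the dividend as truncating % does.
    return x >= 0
        ? static_cast<int32_t>(static_cast<uint32_t>(x) & (pow2 - 1u))
        : -static_cast<int32_t>((0u - static_cast<uint32_t>(x)) & (pow2 - 1u));
}
```

The result keeps the sign of the dividend, matching truncating integer remainder.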

Acceptance Criteria

Add Fuzzlyn regression tests
Merge Creating comma temps differently for SubMulDiv morph #69770

ghost · 2022-05-05T03:11:42Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

TIHan · 2022-05-06T01:00:14Z

@kunalspathak - I think this is handling those regressions we saw earlier with the use of csnegs.

Looking at the asmdiffs now, they do show a lot of regressions, but these are caused by the additional divbyzero and overflow checks:

G_M60555_IG04:        ; gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
            bl      CORINFO_HELP_OVERFLOW
						;; size=4 bbWeight=0    PerfScore 0.00
G_M60555_IG05:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
            bl      CORINFO_HELP_THROWDIVZERO
            brk_windows #0
						;; size=8 bbWeight=0    PerfScore 0.00

These were here before we merged csnegs so it's just putting them back. I would say these are not regressions.
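
To make the two throw blocks concrete: integer division and remainder fault when the divisor is zero, and overflow when the dividend is the minimum value and the divisor is -1. A minimal sketch of which checks a constant divisor can rule out follows. The type and function names are hypothetical, not JIT APIs.

```cpp
#include <cstdint>

// Hypothetical summary of the runtime checks a division or remainder needs.
struct DivModChecks
{
    bool needsDivByZeroCheck; // divisor might be 0
    bool needsOverflowCheck;  // divisor might be -1 (INT32_MIN / -1 overflows)
};

// For a divisor known at compile time, each check is needed only if the
// constant is exactly the problematic value.
constexpr DivModChecks checks_for_constant_divisor(int32_t divisor)
{
    return DivModChecks{divisor == 0, divisor == -1};
}
```

With a divisor such as 16, neither check is required. A non-constant divisor would need both.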

jakobbotsch · 2022-05-06T20:36:32Z

I think recognizing patterns this large in lowering is too fragile and hard to get right.
How do we end up with these patterns? E.g. I see `(x >> k) << k`: why is it not `x & ~((1 << k) - 1)` at this point, and if there is a good reason, can we add this transformation separately instead? The same goes for the other patterns we see here: is it possible to optimize them on their own instead, in the hope that we end up with something simpler to recognize and optimize?
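
The equivalence in question can be checked directly. The sketch below uses unsigned 32-bit values so that both shifts are well defined in C++. The function names are hypothetical, and whether the JIT sees a logical or an arithmetic right shift at this point is not stated in the thread.

```cpp
#include <cstdint>

// Clears the low k bits by shifting right, then left (requires k < 32).
constexpr uint32_t clear_low_bits_shift(uint32_t x, unsigned k)
{
    return (x >> k) << k;
}

// Clears the low k bits with a single AND against ~((1 << k) - 1).
constexpr uint32_t clear_low_bits_mask(uint32_t x, unsigned k)
{
    return x & ~((uint32_t{1} << k) - 1u);
}
```

Both forms produce the same value for every `k` below 32; the mask form needs one operation instead of two.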

jakobbotsch · 2022-05-06T20:51:16Z

It might also be reasonable to introduce additional transformations as part of rationalization (or separately, with higher TP cost). We also have #68103 where it would be good to have. Thoughts @dotnet/jit-contrib?

BruceForstall · 2022-05-08T00:14:26Z

> I think recognizing patterns this large in lowering is too fragile and hard to get right.

I definitely agree with this. I don't know if there is a better option (why can't it be done in morph?), though.

TIHan · 2022-05-11T23:30:24Z

> I think recognizing patterns this large in lowering is too fragile and hard to get right.

I agree as well. Originally, I only wanted to find the SubMulDiv pattern. But by that point in lowering, SubMulDiv has already been turned into these larger patterns.

This specific optimization shouldn't be done in morph, even if we had the constructs to do so. The reason is that morph transforms a % b into a - (a / b) * b, and we want a / b to be available for CSE. There is code out there with this pattern, typically used when you want to mark bits:

let x = index / 16
let y = index % 16

which would turn into:

let x = index / 16
let y = index - (index / 16) * 16

then:

let cse = index / 16
let y = index - cse * 16

At the moment, we are not taking advantage of CSE because we merged in the csneg optimization.
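
The identity behind the morph, and why sharing the quotient matters, can be sketched in C++ (truncating division, as for the integer types discussed here). The struct and function names are hypothetical.

```cpp
#include <cstdint>

struct DivRem
{
    int32_t quotient;
    int32_t remainder;
};

// a % b rewritten as a - (a / b) * b.
// Assumes b != 0 and that (a, b) is not (INT32_MIN, -1).
constexpr int32_t mod_via_sub_mul_div(int32_t a, int32_t b)
{
    return a - (a / b) * b;
}

// Builds both results from one already computed quotient.
constexpr DivRem from_quotient(int32_t index, int32_t divisor, int32_t q)
{
    return DivRem{q, index - q * divisor};
}

// The division happens once and feeds both results, which is the effect
// CSE has on `index / 16` followed by `index % 16`.
constexpr DivRem div_rem_shared(int32_t index, int32_t divisor)
{
    return from_quotient(index, divisor, index / divisor);
}
```

When the remainder is instead lowered through its own specialized sequence, the quotient is no longer shared in this way.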

What I really want to do is look for SubMulDiv after CSE is done and before SubMulDiv gets lowered to the much larger pattern. Where is the best place to do that? Do we need to introduce a new phase to achieve that?

jakobbotsch · 2022-05-13T18:58:23Z

> Where is the best place to do that? Do we need to introduce a new phase to achieve that?

I would be interested to see a prototype of the suggestions I had above. I.e. try it as part of rationalization (in pre-order there), and try a new late phase (e.g. after range check). We can see how costly the full IR walk will be and evaluate whether having the optimization and future opportunity to do other things in that pass outweighs that cost.
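
As a rough illustration of what recognizing SubMulDiv during a tree walk involves, here is a toy matcher over a hypothetical, heavily simplified IR. None of these types or names exist in the JIT (the real node type is GenTree), and a real check would also have to account for side effects and evaluation order.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified IR; not the JIT's GenTree.
enum class Oper { Local, Const, Sub, Mul, Div };

struct Node
{
    Oper        oper;
    int32_t     value; // local number or constant value (leaves only)
    const Node* op1;
    const Node* op2;
};

// Two leaves match when both name the same local or hold the same constant.
constexpr bool same_leaf(const Node* a, const Node* b)
{
    return a != nullptr && b != nullptr && a->oper == b->oper &&
           (a->oper == Oper::Local || a->oper == Oper::Const) &&
           a->value == b->value;
}

// True when n has the exact shape SUB(a, MUL(DIV(a, b), b)).
constexpr bool is_sub_mul_div(const Node* n)
{
    return n != nullptr && n->oper == Oper::Sub && n->op2 != nullptr &&
           n->op2->oper == Oper::Mul && n->op2->op1 != nullptr &&
           n->op2->op1->oper == Oper::Div &&
           same_leaf(n->op1, n->op2->op1->op1) &&
           same_leaf(n->op2->op2, n->op2->op1->op2);
}

// index - (index / 16) * 16: the quotient is still inline, so it matches.
constexpr Node kIndex{Oper::Local, 0, nullptr, nullptr};
constexpr Node kSixteen{Oper::Const, 16, nullptr, nullptr};
constexpr Node kDiv{Oper::Div, 0, &kIndex, &kSixteen};
constexpr Node kMul{Oper::Mul, 0, &kDiv, &kSixteen};
constexpr Node kSub{Oper::Sub, 0, &kIndex, &kMul};

// index - cse * 16: CSE replaced the quotient with a temp, so no match.
constexpr Node kCseTemp{Oper::Local, 1, nullptr, nullptr};
constexpr Node kMulCse{Oper::Mul, 0, &kCseTemp, &kSixteen};
constexpr Node kSubCse{Oper::Sub, 0, &kIndex, &kMulCse};
```

Here `is_sub_mul_div(&kSub)` holds while `is_sub_mul_div(&kSubCse)` does not, which corresponds to applying the ARM64-specific path only when CSE did not reuse the quotient.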

SingleAccretion · 2022-05-13T19:12:12Z

Note that we already have a "simple lowering" pass in our phase order that is a (mostly "empty") full IR walk.

jakobbotsch · 2022-05-13T19:22:48Z

> Note that we already have a "simple lowering" pass in our phase order that is a (mostly "empty") full IR walk.

It's after rationalization however, so folding transformations still require interference checking here (or otherwise coupling it to rationalization).

SingleAccretion · 2022-05-13T19:31:15Z

Yep, the idea would be to move it to before rationalization. Tree walks are still costlier than linear traversals, but not "a new full IR walk" costlier at least.

TIHan · 2022-06-09T19:25:10Z

src/coreclr/jit/lowerarmarch.cpp

@@ -104,6 +104,7 @@ bool Lowering::IsContainableImmed(GenTree* parentNode, GenTree* childNode) const
            case GT_LE:
            case GT_GE:
            case GT_GT:
+            case GT_CMP:


Needed to add this so that GT_CMP's second operand can be properly contained.
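
For background, the ARM64 `cmp` (immediate) form encodes a 12-bit unsigned immediate, optionally shifted left by 12 bits. A minimal sketch of that encodability test follows. The function name is hypothetical and this is not the JIT's own helper.

```cpp
#include <cstdint>

// Hypothetical check: true when value fits the 12-bit immediate field,
// either unshifted or shifted left by 12.
constexpr bool fits_cmp_immediate(uint64_t value)
{
    return (value & ~uint64_t{0xFFF}) == 0 ||
           (value & ~(uint64_t{0xFFF} << 12)) == 0;
}
```

A constant that passes this kind of test can be folded into the compare instruction instead of being materialized in a register first.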

TIHan · 2022-06-10T15:17:19Z

@dotnet/jit-contrib this is ready again, CI is green now

TIHan · 2022-06-10T19:03:05Z

Diffs

jakobbotsch · 2022-06-10T20:34:07Z

I'd suggest splitting the new optimization into a separate PR so we can review it and track its impact separately.

TIHan · 2022-06-10T22:43:20Z

@jakobbotsch I split them out

jakobbotsch · 2022-06-11T05:41:38Z

src/coreclr/jit/lowerarmarch.cpp


-    return cc->gtNext;
+    ContainCheckNode(mod);


jakobbotsch: Are these changes necessary or should they be done in the new PR too?

TIHan: Sort of? I got rid of the use. The new PR extends this path anyway with the new instruction.

jakobbotsch: But this is unrelated to the fix, right?
I am fine with leaving it to avoid more churn on this PR.

TIHan: It is unrelated; I've been trying different ways to shape this function. The next PR has the shape and implementation I'm happy with.

Always morph GT_MOD for ARM64. Added lowering optimization for a fully lowered GT_MOD.

f1e7232

ghost assigned TIHan May 5, 2022

dotnet-issue-labeler bot added the `area-CodeGen-coreclr` label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) May 5, 2022

TIHan marked this pull request as ready for review May 5, 2022 03:32

TIHan added 7 commits May 4, 2022 20:36

Remove space

8205969

Match cases with mod 2 and long types

98a4350

Minor tweak

dd8556f

Minor tweak

b3c967e

Minor tweak

9b35e79

Do not add overflow/throwdivzero to the block if the second operand is constant non-zero value

2517c60

Lets not modify whether or not we should have overflow/throwdivzero exceptions

c0b8b1a

TIHan mentioned this pull request May 12, 2022

ARM64 - Do not emit possible DIV_BY_ZERO/OVERFLOW exception blocks for non-zero/non-(negative one) constants #68945

Merged

1 task

Merge branch 'main' into mod-opt-fix

e19661c

TIHan added 6 commits May 16, 2022 13:17

Perform the transformation in pre-order rationalization

d989b10

Merge branch 'mod-opt-fix' of github.com:TIHan/runtime into mod-opt-fix

44065c0

Fix cns equality check

7c43a37

Fixing build

9c03bc6

Fixing build

6b3b7e5

Fixing build

574c17d

runfoapp bot mentioned this pull request May 17, 2022

RunContinueWithStressTestsNoState timing out in CI #2271

Closed

jakobbotsch reviewed Jun 6, 2022

src/coreclr/jit/rationalize.cpp (outdated, resolved)

JulieLeeMSFT mentioned this pull request Jun 6, 2022

ARM64: Optimize a % b operation #34937

Closed

Fixing build

6ccee6f

jakobbotsch mentioned this pull request Jun 7, 2022

JIT: Invalid results/assertion errors with modulo ops #70333

Closed

TIHan added 5 commits June 7, 2022 11:03

Fixing build

fcf9517

Merge remote-tracking branch 'upstream/main' into mod-opt-fix

68d8df9

Merge remote-tracking branch 'upstream/main' into mod-opt-fix

e9d332f

Added GT_CNEG_LT for ARM64 LIR to handle mod 2

0452912

Added GT_CMP case for checking valid imm

80635b0

TIHan mentioned this pull request Jun 9, 2022

[Perf] ARM64 regression in System.Diagnostics.Perf_Activity.ActivityAllocations #68624

Closed

TIHan changed the title ~~Always morph GT_MOD for ARM64~~ ARM64 - Always morph GT_MOD, and added optimization for i % 2 Jun 9, 2022

TIHan commented Jun 9, 2022

Formatting

edeb5b1

jakobbotsch reviewed Jun 10, 2022

src/coreclr/jit/rationalize.cpp (outdated, resolved)

TIHan added 3 commits June 10, 2022 15:36

Revert "Added GT_CMP case for checking valid imm"

61204af

This reverts commit 80635b0.

Reverting

61c3f81

Do not bail on reverse

6038298

TIHan mentioned this pull request Jun 10, 2022

ARM64 - Optimize i % 2 #70599

Merged

1 task

TIHan changed the title ~~ARM64 - Always morph GT_MOD, and added optimization for i % 2~~ ARM64 - Always morph GT_MOD Jun 10, 2022

jakobbotsch reviewed Jun 11, 2022

src/coreclr/jit/morph.cpp (resolved)

jakobbotsch approved these changes Jun 11, 2022

TIHan merged commit bfa0ac0 into dotnet:main Jun 11, 2022

TIHan deleted the mod-opt-fix branch June 11, 2022 19:59

ghost locked as resolved and limited conversation to collaborators Jul 12, 2022
