Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64 - Always morph GT_MOD #68885

Merged
merged 46 commits into from
Jun 11, 2022
Merged

ARM64 - Always morph GT_MOD #68885

merged 46 commits into from
Jun 11, 2022

Conversation

TIHan
Copy link
Contributor

@TIHan TIHan commented May 5, 2022

Should resolve these regressions:

Description

We only want to do the specific ARM64 mod optimization if the morphed Mod to SubMulDiv did not take advantage of CSE. The only phase, that I know of, to do any kind of transformation that occurs after CSE is 'lowering'.

The idea here is to find the SubMulDiv in lowering, turn it back into a Mod, and then call LowerModPow2 to do the specific optimization.

But it gets more complicated. SubMulDiv itself gets optimized in lowering as well, so the shape of SubMulDiv is lost. Therefore, we need to find other possible shapes that get lowered from SubMulDiv.

Acceptance Criteria

@ghost ghost assigned TIHan May 5, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 5, 2022
@ghost
Copy link

ghost commented May 5, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Should resolve these regressions:

Description

Currently a draft.

We only want to do the specific ARM64 mod optimization if the morphed Mod to SubMulDiv did not take advantage of CSE. The only phase, that I know of, to do any kind of transformation that occurs after CSE is 'lowering'.

The idea here is to find the SubMulDiv in lowering, turn it back into a Mod, and then call LowerModPow2 to do the specific optimization.

But it gets more complicated. SubMulDiv itself gets optimized in lowering as well, so the shape of SubMulDiv is lost. Therefore, we need to find other possible shapes that get lowered from SubMulDiv.

Acceptance Criteria

  • Add Fuzzlyn regression tests
Author: TIHan
Assignees: TIHan
Labels:

area-CodeGen-coreclr

Milestone: -

@TIHan TIHan marked this pull request as ready for review May 5, 2022 03:32
@TIHan
Copy link
Contributor Author

TIHan commented May 6, 2022

@kunalspathak - I think this is handling those regressions we saw earlier with the use of csnegs.

Looking at the asmdiffs now, it does show a lot of regressions, but they are caused by the additional divbyzero and overflow check:

G_M60555_IG04:        ; gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
            bl      CORINFO_HELP_OVERFLOW
						;; size=4 bbWeight=0    PerfScore 0.00
G_M60555_IG05:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
            bl      CORINFO_HELP_THROWDIVZERO
            brk_windows #0
						;; size=8 bbWeight=0    PerfScore 0.00

These were here before we merged csnegs so it's just putting them back. I would say these are not regressions.

@jakobbotsch
Copy link
Member

I think recognizing patterns this large in lowering is too fragile and hard to get right.
How do we end up with these patterns? E.g. I see (x >> k) << k -- why is it not x & ~((1 << k) - 1) at this point, and if there is a good reason, can we add this transformation separately instead? Similarly to the other patterns we see here, is it possible to optimize them on their own instead, with the idea that hopefully we end up with something simpler to recognize/optimized on its own?

@jakobbotsch
Copy link
Member

It might also be reasonable to introduce additional transformations as part of rationalization (or separately, with higher TP cost). We also have #68103 where it would be good to have. Thoughts @dotnet/jit-contrib?

@BruceForstall
Copy link
Member

I think recognizing patterns this large in lowering is too fragile and hard to get right.

I definitely agree with this. I don't know if there is a better option (why can't it be done in morph?), though.

@TIHan
Copy link
Contributor Author

TIHan commented May 11, 2022

I think recognizing patterns this large in lowering is too fragile and hard to get right.

I agree as well. Originally, I only wanted to find the SubMulDiv pattern. But at that point in lowering, that is what SubMulDiv turns into.

This specific optimization shouldn't be done in morph, even if we had the constructs to do so. The reason why is because the transformation of a % b to a - (a / b) * b - we want to take advantage of a / b for CSE. There is code out there with this pattern, which is used when you want to mark bits:

let x = index / 16
let y = index % 16

which would turn into:

let x = index / 16
let y = index - (index / 16) * 16

then:

let cse = index / 16
let y = index - cse * 16

At the moment, we are not taking advantage of CSE because we merged in the csneg optimization.

What I really want to do is look for SubMulDiv after CSE is done and before SubMulDiv gets lowered to the much larger pattern. Where is the best place to do that? Do we need to introduce a new phase to achieve that?

@jakobbotsch
Copy link
Member

Where is the best place to do that? Do we need to introduce a new phase to achieve that?

I would be interested to see a prototype of the suggestions I had above. I.e. try it as part of rationalization (in pre-order there), and try a new late phase (e.g. after range check). We can see how costly the full IR walk will be and evaluate whether having the optimization and future opportunity to do other things in that pass outweighs that cost.

@SingleAccretion
Copy link
Contributor

SingleAccretion commented May 13, 2022

Note that we already have an "simple lowering" pass in our phase order that is a (mostly "empty") full IR walk.

@jakobbotsch
Copy link
Member

Note that we already have an "simple lowering" pass in our phase order that is a (mostly "empty") full IR walk.

It's after rationalization however, so folding transformations still require interference checking here (or otherwise coupling it to rationalization).

@SingleAccretion
Copy link
Contributor

Yep, the idea would be to move it to before rationalization. Tree walks are still costlier than linear traversals, but not "a new full IR walk" costlier at least.

@TIHan TIHan changed the title Always morph GT_MOD for ARM64 ARM64 - Always morph GT_MOD, and added optimization for i % 2 Jun 9, 2022
@@ -104,6 +104,7 @@ bool Lowering::IsContainableImmed(GenTree* parentNode, GenTree* childNode) const
case GT_LE:
case GT_GE:
case GT_GT:
case GT_CMP:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to add this so that GT_CMP's second operand can be properly contained.

@TIHan
Copy link
Contributor Author

TIHan commented Jun 10, 2022

@dotnet/jit-contrib this is ready again, CI is green now

@TIHan
Copy link
Contributor Author

TIHan commented Jun 10, 2022

Diffs

@jakobbotsch
Copy link
Member

I'd suggest to split the new optimization into a separate PR so we can review it and track its impact separately.

@TIHan TIHan mentioned this pull request Jun 10, 2022
1 task
@TIHan TIHan changed the title ARM64 - Always morph GT_MOD, and added optimization for i % 2 ARM64 - Always morph GT_MOD Jun 10, 2022
@TIHan
Copy link
Contributor Author

TIHan commented Jun 10, 2022

@jakobbotsch I split them out


return cc->gtNext;
ContainCheckNode(mod);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes necessary or should they be done in the new PR too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of? I got rid of the use. New PR extends this path anyway with the new instruction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is unrelated to the fix, right?
I am fine with leaving it to avoid more churn on this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unrelated, been trying different ways to shape this function. The next PR is the shape and impl I'm happy with.

@TIHan TIHan merged commit bfa0ac0 into dotnet:main Jun 11, 2022
@TIHan TIHan deleted the mod-opt-fix branch June 11, 2022 19:59
@ghost ghost locked as resolved and limited conversation to collaborators Jul 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants