Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating comma temps differently for SubMulDiv morph #69770

Merged
merged 8 commits into from
Jun 1, 2022

Conversation

TIHan
Copy link
Contributor

@TIHan TIHan commented May 25, 2022

Description

This changes how comma temps are created for SubMulDiv morph.

Example code:

(multiplier * seed2 + increment) % Modulus;

In morph, this would turn into:

(let tmp1 = multiplier * seed2 + increment) in tmp1) - (tmp1 / Modulus) * Modulus

This PR morphs it into:

let tmp1 = multiplier * seed2 + increment in
tmp1 - (tmp1 / Modulus) * Modulus

Basically, we are moving the comma temps to be the root of the expression instead of leafs.

Acceptance Criteria

  • CI passes
  • Investigate a possible generic way of lifting GT_COMMAs up to be root nodes instead of leaf nodes.
    • This might be a bit harder to do in a generic way. For now, let's not worry about it.

Diffs

@ghost ghost assigned TIHan May 25, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 25, 2022
@ghost
Copy link

ghost commented May 25, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

At the moment, this is an experiment.

This changes how comma temps are created for SubMulDiv morph.

Example code:

(multiplier * seed2 + increment) % Modulus;

In morph, this would turn into:

(let tmp1 = multiplier * seed2 + increment) in tmp1) - (tmp1 / Modulus) * Modulus

This PR morphs it into:

let tmp1 = multiplier * seed2 + increment in
tmp1 - (tmp1 / Modulus) * Modulus
Author: TIHan
Assignees: TIHan
Labels:

area-CodeGen-coreclr

Milestone: -

@TIHan TIHan marked this pull request as ready for review May 25, 2022 21:44
@TIHan
Copy link
Contributor Author

TIHan commented May 26, 2022

@dotnet/jit-contrib This is ready. The diffs will show some regressions, but they are not detrimental. The improvements on both x64 and ARM64 look good, especially ARM64.

@TIHan TIHan mentioned this pull request May 26, 2022
2 tasks
@TIHan
Copy link
Contributor Author

TIHan commented May 27, 2022

@kunalspathak @AndyAyersMS - I feel good about this change and it's ready. It should help optimizing 'mod' since I'll be able to do the transformation before rationalization.

1 similar comment
@TIHan
Copy link
Contributor Author

TIHan commented May 27, 2022

@kunalspathak @AndyAyersMS - I feel good about this change and it's ready. It should help optimizing 'mod' since I'll be able to do the transformation before rationalization.

src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
src/coreclr/jit/morph.cpp Show resolved Hide resolved
Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

(but: you have formatting to fix).

@JulieLeeMSFT
Copy link
Member

The diffs will show some regressions, but they are not detrimental. The improvements on both x64 and ARM64 look good, especially ARM64.

@TIHan can you show the perf results in this PR?

@TIHan
Copy link
Contributor Author

TIHan commented May 31, 2022

Here is the link to the diffs: Diffs

@kunalspathak
Copy link
Member

Seems there is significant regression introduced on arm64:

        1380 (0.88 % of base) : 4937.dasm - u8rem:TestEntryPoint():int
        1340 (0.80 % of base) : 4917.dasm - i8rem:TestEntryPoint():int
         976 (0.59 % of base) : 264436.dasm - u4rem:TestEntryPoint():int
         128 (0.24 % of base) : 16098.dasm - lclfldrem:Main():int
          16 (4.12 % of base) : 173626.dasm - Test_10w5d.testout1:Func_0_1_1_6_2():double
          16 (4.12 % of base) : 176286.dasm - Test_10w5d.testout1:Func_0_1_1_6_2():double
          12 (0.81 % of base) : 58712.dasm - ILGEN_0x13230206:Method_0xce57b468(long,double,byte,long):int
          12 (1.58 % of base) : 76667.dasm - ILGEN_0x3aa9c940:main():int
          12 (0.36 % of base) : 45954.dasm - ILGEN_0x977f9ed2:Method_0xf6b7353b():float
           8 (4.35 % of base) : 78111.dasm - DevDiv_590771:ILGEN_METHOD(long,ushort,long,int,ushort,long):byte
           4 (1.39 % of base) : 77616.dasm - ILGEN_0x152f1077:Method_0x2763af56(long):int

@kunalspathak
Copy link
Member

Seems there is significant regression introduced on arm64:

just noticed your message about the regression in #69770 (comment)

@kunalspathak
Copy link
Member

Seems like we are not CSEing the % operation for this:

    public Packet256Tracer(int width, int height)
    {
        if ((width % VectorPacket256.Packet256Size) != 0)
        {
            width += VectorPacket256.Packet256Size - (width % VectorPacket256.Packet256Size);
        }
        Width = width;
        Height = height;
    }

image

@@ -13833,6 +13877,28 @@ GenTree* Compiler::fgMorphMultiOp(GenTreeMultiOp* multiOp)
// division will be used, in that case this transform allows CSE to
// eliminate the redundant div from code like "x = a / 3; y = a % 3;".
//
// Before:
// * RETURN int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something doesn't feel right in Before: tree. Did it mean to have (V00 * V00) % V01. Same in After:.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(V00 * V00) % V01 is intentional.

In the after tree, you see that it takes (V00 * V00) and hoists it out:

//           +--*  ASG       int
//           |  +--*  LCL_VAR   int    V03 tmp1
//           |  \--*  MUL       int
//           |     +--*  LCL_VAR   int    V00 arg0
//           |     \--*  LCL_VAR   int    V00 arg0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As spoke offline, change MUL to BINOP and add a note that it is applicable for any BINOP that is safe to clone.

@TIHan
Copy link
Contributor Author

TIHan commented Jun 1, 2022

Seems there is significant regression introduced on arm64:

It does look significant, but when I looked at the top regressions, the actual code is really wild with a lot of casts and checked contexts - something you really would not see in user code.

@TIHan
Copy link
Contributor Author

TIHan commented Jun 1, 2022

Seems like we are not CSEing the % operation for this:

That is a good catch. I'm surprised by this one - I'm even more surprised how it was able to CSE it before because one of the ops would have a GT_COMMA in it.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kunalspathak
Copy link
Member

Seems like we are not CSEing

My before and after were flipped. I think it looks good.

@TIHan
Copy link
Contributor Author

TIHan commented Jun 1, 2022

As a note from @kunalspathak 's comment regarding the % CSE in this image:
https://user-images.githubusercontent.com/12488060/171420114-f42a8c64-54ca-476f-a9c8-43b13c6775fe.png
It is actually reversed; the left side is the diff whereas the right side was the base. This means that the change actually makes CSE work in this example :)

@TIHan TIHan merged commit 5772d54 into dotnet:main Jun 1, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants