Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JIT] Fix - Do not remove CAST nodes on store if the LCL_VAR is address exposed #85734

Merged
merged 8 commits into from
Jul 20, 2023

Conversation

TIHan
Copy link
Contributor

@TIHan TIHan commented May 3, 2023

Description

Fixes:

On both ARM and XARCH, mov instructions get elided for GT_STORE_LCL_VAR when the destination and source registers are the same. This poses a problem when GT_STORE_LCL_VAR is for a NormalizedOnLoad parameter and struct field variable and its op had its CAST node removed - which means sign/zero-extended instructions never get emitted when they are supposed to.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 3, 2023
@ghost ghost assigned TIHan May 3, 2023
@ghost
Copy link

ghost commented May 3, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Fixes:

On both ARM and XARCH, mov instructions get elided for GT_STORE_LCL_VAR when the destination and source registers are the same. This poses a problem when GT_STORE_LCL_VAR is for a NormalizedOnLoad parameter variable and its op had its CAST node removed - which means sign/zero-extended instructions never get emitted when they are supposed to.

I've done many experiments, and at the moment, I think the less risky change is to simply not remove the CAST node from the ASG's op2 if it's trying to assign to a parameter.

Author: TIHan
Assignees: TIHan
Labels:

area-CodeGen-coreclr

Milestone: -

@TIHan TIHan changed the title [JIT] Do not remove CAST nodes on assignment if the LCL_VAR is a parameter. [JIT] Fix - Do not remove CAST nodes on assignment if the LCL_VAR is a parameter. May 3, 2023
{
LclVarDsc* varDsc = lvaGetDesc(effectiveOp1->AsLclVarCommon()->GetLclNum());

// It is not safe to remove the cast for non-NormalizeOnLoad variables or parameters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should mention the "why" - i. e. that this is a quirk/workaround for VN's handling of NOL and that we know this is in fact not a complete fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't that these issues are with VN's handling of NOLs; I could be wrong - it's very hard to determine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this issue isn't with VN's handling, then what is it with? We should understand the problem before we work around it.

Copy link
Contributor

@SingleAccretion SingleAccretion May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well - we don't know about other bugs related to this, but if there are, surely we would want to find out what they are about as well.

FWIW, example from #85382 looks very much like the VN issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should understand the problem before we work around it.

I tried to describe the issue in the description regarding the mov elision.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems likely that a peephole in the emitter would be able to get a large portion of the benefit of that subrange assertion check without having to complicate the NOL semantics with how stores are handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did experiment with adding a peephole to look at redundant movzx instructions that would help certain cases.

@jakobbotsch so that opt, I disabled it and #85382 did pass, but not #85081.

Here is what I am using to test:

DOTNET_TieredCompilation=0
DOTNET_JitStressRegs=8
using System.Runtime.CompilerServices;

public class Program
{
    public static long s_15;
    public static sbyte s_17;
    public static ushort s_21 = 36659;
    public static void Main()
    {
        s_15 = ~1;
        var x = (ushort)0;
        bool vr1 = M40(x);
    }

    public static bool M40(ushort arg0)
    {
        for (int var0 = 0; var0 < 2; var0++)
        {
            arg0 = 65535;
            arg0 &= (ushort)(s_15++ >> s_17);
            arg0 %= s_21;
        }

        System.Console.WriteLine(arg0);

        Test.MainTest();

        return false;
    }
}


public class Test
{
    public class Program
    {
        public static short s_2;

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static void Consume(int x) { }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int M8(byte arg0)
        {
            s_2 = 1;
            arg0 = (byte)(-s_2);
            var vr1 = arg0 & arg0;
            Consume(vr1);
            return vr1;
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int MainTest()
    {
        var x = (byte)1;
        var result = Test.Program.M8(x);
        if (result != 255)
        {
            return 0;
        }
        Console.WriteLine("Success");
        return 100;
    }
}

A successful run will print:

28876
Success

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I too took a look at the dump for #85382 and the problem (in my interpretation) is that remorphing invalidates previous assumption as made by and acted on in assertion prop (@jakobbotsch - did you mean this too?):

  1. We have NOL = CAST<short>(...).
  2. Local assertion prop takes a dependency on this, does not add cast for a downstream use.
  3. Remorph invalidates the assumption by removing the cast (or, rather making it into a CAST<int> one).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@jakobbotsch - did you mean this too?):

Yes, except I think the optimization made by local assertion prop makes the model of NOL locals much more complex than it should be (i.e. it means your explanation of NOL locals above is really not the fully story).
Local assertion prop here does not only rely on what it proves by seeing the subrange assertion. It also relies on normalization done by genUnspillRegIfNeeded, GT_LCL_VAR codegen and args homing to make this work.
In other words: the assertion only meaningfully describes the upper bits if it just so happens that the local ended up being in a 4-byte home for the time between the assignment that created it and the use.

It's very subtle and hard to reason about this in the face of interacting optimizations like this one (another issue I dug up with an interacting optimization: #66624). So if we could get most of the benefits of this optimization through an emitter peephole, and simplify morph to always normalize, then I think that would be much preferable to the current state of things.

Copy link
Member

@jakobbotsch jakobbotsch May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, restricting the fix to parameters is not enough, it also needs to pessimize struct fields. The following example fails in the same way:

public static bool M40()
{
    S s = default;
    for (int var0 = 0; var0 < 2; var0++)
    {
        s.U = 65535;
        s.U &= (ushort)(s_15++ >> s_17);
        s.U %= s_21;
    }

    System.Console.WriteLine(s.U);
    return false;
}

public struct S { public ushort U; }

…f why we cannot remove CAST nodes from parameters.
@TIHan
Copy link
Contributor Author

TIHan commented May 4, 2023

/azp run runtime-coreclr jitstress2-jitstressregs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@TIHan
Copy link
Contributor Author

TIHan commented May 4, 2023

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@TIHan
Copy link
Contributor Author

TIHan commented May 5, 2023

@dotnet/jit-contrib this is ready. PTAL @jakobbotsch (even though you already have a little bit already :) )

I removed the NormalizeOnLoad optimization in fgMorphLclVar and I kept being pessimistic about parameters for removing casts on assignment. I also updated the test to include the use of struct fields.

On x64, regressions are:
+3,686 bytes, 534 regressions

When combined with #85780, the regressions are (not including wins):
+1,729 bytes, 297 regressions

We don't have the same optimizations like #85780 for ARM64 though.

@TIHan
Copy link
Contributor Author

TIHan commented May 5, 2023

/azp run runtime-coreclr jitstress2-jitstressregs

@TIHan
Copy link
Contributor Author

TIHan commented May 5, 2023

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@TIHan TIHan marked this pull request as draft May 5, 2023 21:35
@TIHan
Copy link
Contributor Author

TIHan commented May 5, 2023

@jakobbotsch - actually, let's wait on this. I think we need to have more discussions on what we are going to do here.

I saw this diff:
image

And it makes me want to be careful how we go about this, on the other hand, it's not good that we have a correctness problem.

@ghost ghost closed this Jun 4, 2023
@ghost
Copy link

ghost commented Jun 4, 2023

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@ghost ghost locked as resolved and limited conversation to collaborators Jul 5, 2023
@TIHan TIHan reopened this Jul 17, 2023
@TIHan TIHan changed the title [JIT] Fix - Do not remove CAST nodes on assignment if the LCL_VAR is a parameter. [JIT] Fix - Do not remove CAST nodes on store if the LCL_VAR is a parameter or struct field Jul 17, 2023
@TIHan
Copy link
Contributor Author

TIHan commented Jul 17, 2023

/azp run runtime-coreclr jitstress2-jitstressregs

@TIHan
Copy link
Contributor Author

TIHan commented Jul 18, 2023

@dotnet/jit-contrib @jakobbotsch this is ready. Failures are unrelated. Diffs - There are regressions, which I don't think can be avoided safely since this is a correctness fix. However, I think we could expand peephole optimizations to cover some of these regressions in the future.

@TIHan TIHan marked this pull request as ready for review July 18, 2023 18:52
@markples
Copy link
Member

@TIHan Was this an existing issue that was just found or fallout from recent CAST optimizations? If the latter, then is the regression here (substantially?) less than the original win. I assume so - just a check on the scope of the changes. Thanks.

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@TIHan
Copy link
Contributor Author

TIHan commented Jul 18, 2023

@markples It's not directly caused by any recent CAST optimizations, but those CAST optimizations may have allowed this particular issue around removing CAST around NOLs on STOREs(used to be ASG) to manifest itself.

@TIHan
Copy link
Contributor Author

TIHan commented Jul 19, 2023

/azp run runtime-coreclr jitstress2-jitstressregs

@TIHan
Copy link
Contributor Author

TIHan commented Jul 20, 2023

@dotnet/jit-contrib @jakobbotsch This is ready again, less regressions than before and CI seems to be passing :)

Comment on lines +10122 to +10128
// We can make this transformation only under the assumption that NOL locals are always normalized before they
// are used,
// however this is not always the case: the JIT will utilize subrange assertions for NOL locals to make
// normalization
// assumptions -- see fgMorphLeafLocal. Thus we can only do this for cases where we know for sure that
// subsequent uses
// will normalize, which we can only guarantee when the local is address exposed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting here is a bit strange, maybe reformat the comment locally?

Comment on lines +1097 to +1103
// NormalizeOnLoad Rules:
// 1. All small locals are actually TYP_INT locals.
// 2. NOL locals are such that not all definitions can be controlled by the compiler and so the upper bits can
// be undefined.For parameters this is the case because of ABI.For struct fields - because of padding.For
// address - exposed locals - because not all stores are direct.
// 3. Hence, all NOL uses(unless proven otherwise) are assumed in morph to have undefined upper bits and
// explicit casts have be inserted to "normalize" them back to conform to IL semantics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this comment is formatted strangely, missing some spaces and it should be "address-exposed", not "address - exposed".

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Feel free to address the comment suggestions as part of a follow-up PR to avoid rerunning CI.

@TIHan TIHan merged commit 42322f5 into dotnet:main Jul 20, 2023
161 checks passed
@TIHan TIHan deleted the 85382-fix-2 branch July 20, 2023 16:22
@TIHan TIHan changed the title [JIT] Fix - Do not remove CAST nodes on store if the LCL_VAR is a parameter or struct field [JIT] Fix - Do not remove CAST nodes on store if the LCL_VAR is address exposed Jul 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants