Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: handle case where we are cloning adjacent sibling loops #70874

Merged
merged 1 commit into from
Jun 21, 2022

Conversation

AndyAyersMS
Copy link
Member

We can sometimes see adjacent sibling loops (where L1.bottom == L2.head)
and if so, cloning L1 will repurpose L1.bottom and so leave L2 in
an inconsistent state.

Detect this case and give L2 a temporary preheader before cloning L1.
Revoke preheader status of L2 if we then go on to clone L2, as cloning
does not preserve preheaders currently.

Fixes #70569.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 17, 2022
@ghost ghost assigned AndyAyersMS Jun 17, 2022
@ghost
Copy link

ghost commented Jun 17, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

We can sometimes see adjacent sibling loops (where L1.bottom == L2.head)
and if so, cloning L1 will repurpose L1.bottom and so leave L2 in
an inconsistent state.

Detect this case and give L2 a temporary preheader before cloning L1.
Revoke preheader status of L2 if we then go on to clone L2, as cloning
does not preserve preheaders currently.

Fixes #70569.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS
Copy link
Member Author

@BruceForstall PTAL
cc @dotnet/jit-contrib

Not the most elegant solution, I'll admit, but the "temporary" preheader may get reinstated later on so the block may end up being useful.

No diffs. The original issue was from random PGO stress in libraries-pgo.

Final local flow graph for the example over in #70596 below, showing both siblings successfully cloned. Note RBO gets rid of the in-loop type test in the two fast loops (but not in the two slow ones, as on the slow path the in-loop type test is not dominated by cloning condition type test; control can also reach the slow loop with a null ref).

image

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr superpmi-diffs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

Installer test hit some kind of timing/permissions issue. Will retry.

  Tests succeeded: D:\a\_work\1\s\artifacts\bin\Microsoft.NET.HostModel.AppHost.Tests\Release\net7.0\Microsoft.NET.HostModel.AppHost.Tests.dll [net7.0|x64]
D:\a\_work\1\s\.packages\microsoft.dotnet.arcade.sdk\7.0.0-beta.22310.1\tools\VSTest.targets(55,5): error MSB3491: Could not write lines to file "D:\a\_work\1\s\artifacts\log\Release\Microsoft.NET.HostModel.ComHost.Tests_net7.0_x64.log". The process cannot access the file 'D:\a\_work\1\s\artifacts\log\Release\Microsoft.NET.HostModel.ComHost.Tests_net7.0_x64.log' because it is being used by another process. [D:\a\_work\1\s\src\installer\tests\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.ComHost.Tests\Microsoft.NET.HostModel.ComHost.Tests.csproj]
##[error].packages\microsoft.dotnet.arcade.sdk\7.0.0-beta.22310.1\tools\VSTest.targets(55,5): error MSB3491: (NETCORE_ENGINEERING_TELEMETRY=Build) Could not write lines to file "D:\a\_work\1\s\artifacts\log\Release\Microsoft.NET.HostModel.ComHost.Tests_net7.0_x64.log". The process cannot access the file 'D:\a\_work\1\s\artifacts\log\Release\Microsoft.NET.HostModel.ComHost.Tests_net7.0_x64.log' because it is being used by another process.
  Running tests: D:\a\_work\1\s\artifacts\bin\Microsoft.NET.HostModel.Bundle.Tests\Release\net7.0\Microsoft.NET.HostModel.Bundle.Tests.dll [net7.0|x64]

@BruceForstall BruceForstall self-requested a review June 17, 2022 21:55
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a hack. Ideally, I'd like to get rid of the 'head' concept; I think it's pretty broken/limiting.

Would it be cleaner to do some kind of pre-pass, say right after loop recognition, to create a new 'head' block (not a pre-header as defined) whenever a 'head' and 'bottom' coincide?

}

assert(sibling <= optLoopCount);
if (optLoopTable[sibling].lpTop != x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not check optLoopTable[sibling].lpHead != b directly?

// If B is the header of a sibling loop, give that loop a preheader
// that will serve as X, the join point above the sibling.
//
if ((b->bbJumpKind == BBJ_COND) && (ambientLoop != BasicBlock::NOT_IN_LOOP))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why check ambientLoop here? I guess it's so you can walk the sibling list. But can't the same situation exist for top-level adjacent loops, in which case you need to walk the entire loop table looking for loops without a parent.

//
if ((loop.lpFlags & LPFLG_HAS_PREHEAD) != 0)
{
JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd);
JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd, h->bbNum);

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 17, 2022
@AndyAyersMS
Copy link
Member Author

Yeah, I can look into doing a prepass instead. With a prepass we may alter the flow graph even if we're not going to clone anything, so it might lead to some diffs.

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 17, 2022
@AndyAyersMS
Copy link
Member Author

Seems like maybe this fits into optCanonicalizeLoopNest so will look to add something there.

We can sometimes see adjacent sibling loops (where L1.bottom == L2.head)
and if so, cloning L1 will repurpose L1.bottom and so leave L2 in
an inconsistent state.

Detect this case during optCanonicalizeLoop, and add an intermediary
block to serve as L2's head.

Fixes dotnet#70569.
@AndyAyersMS AndyAyersMS force-pushed the FixLoopCloningAdjacentSibling branch from 251b176 to cae78ae Compare June 18, 2022 03:21
@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jun 18, 2022

Reworked this to run as part of optCanonicalizeLoop. Let me know if this seems more appealing.

I saw 4 methods with minor diffs in local SPMI.

@AndyAyersMS
Copy link
Member Author

@BruceForstall I changed my approach here. See what you think.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach, as it happens during "normal" canonicalization, and establishes a new invariant for the loop table.

Can you add appropriate asserts to Compiler::fgDebugCheckLoopTable() to ensure this invariant?

I think we have an ordering requirement on the loop table, but I don't see that Compiler::fgDebugCheckLoopTable() enforces one. I.e., for two top-level loops l1 and l2, they must be disjoint. If the blocks of l1 precede l2, is l1 guaranteed to appear before (at a lower index) than l2 in the loop table? If so, there's no reason to loop over the entire table; for loop index loopInd, look at loopInd - 1 (or the first preceding top-level loop), and you're done. Something similar should apply for siblings, but I guess we don't have a way to walk backward in the sibling list.

One thing I'm a little worried about is it looks like we allow a child loop to share "top" with its parent:

        static bool lpContains(const unsigned*   blockNumMap,
                               const LoopDsc*    loop,
                               const BasicBlock* top,
                               const BasicBlock* bottom)
        {
            return (blockNumMap[loop->lpTop->bbNum] <= blockNumMap[top->bbNum]) &&
                   (blockNumMap[bottom->bbNum] < blockNumMap[loop->lpBottom->bbNum]);
        }

(<= instead of <).

Do you know if we canonicalize to prevent this?

@AndyAyersMS
Copy link
Member Author

If the blocks of l1 precede l2, is l1 guaranteed to appear before (at a lower index) than l2 in the loop table?

I think this will be true right after loop recognition, but I don't think this remains true forever, we may reorder entire ranges of blocks later on based on profile data.

it looks like we allow a child loop to share "top" with its parent:

This is a thing canonicalization fixes: it unshares tops. Until recently, this was the only thing canonicalization did. So, it should only be true for a short time (between registering the loop and canonicalizing the loop).

I've now added a second kind of canonicalization (#70809, unshare top with non-loop backedges) and this PR adds a third kind (unshare bottom with head), and we really should add a fourth kind (unshare entry with non-loop backedges -- which instead I've worked around for now, see #70959). And possibly more: it seems like we could have a mid-entry loop whose entry is also the top of a nested loop. so we might want to unshare top and entry.

An alternative to explore is to do backedge canonicalization so that each loop has one actual backedge (from inside the loop to entry) and this might collapse some of the things we'd currently call nested do-while loops to be a single do-while with multiple exits. And we might want to "describe" the non-loop like things we bail on today, because pretending they're not loops seems to be causing some problems.

I'm a fan of the loop recognition advocated by Havlak though it might be there are more recent approaches that work out better.

All of this comes about because we are now cloning more general things that we still recognize as loops; previously we'd restrict ourselves to cloning only iterable loops and those are more restricted in shape.

@BruceForstall
Copy link
Member

This is a thing canonicalization fixes: it unshares tops. Until recently, this was the only thing canonicalization did. So, it should only be true for a short time (between registering the loop and canonicalizing the loop).

Seems like we should change fgDebugCheckLoopTable to assert there are no shared tops, since it only gets called once canonicalization has finished.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still applies:

Can you add appropriate asserts to Compiler::fgDebugCheckLoopTable() to ensure this invariant?

but I'll go ahead and approve. We can add this later (if we remember).

@AndyAyersMS
Copy link
Member Author

This comment still applies:

Can you add appropriate asserts to Compiler::fgDebugCheckLoopTable() to ensure this invariant?

but I'll go ahead and approve. We can add this later (if we remember).

Added to 7.0 so should be hard to forget: #71084

@AndyAyersMS AndyAyersMS merged commit 3684aa8 into dotnet:main Jun 21, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
2 participants