JIT: handle case where we are cloning adjacent sibling loops #70874

AndyAyersMS · 2022-06-17T06:07:41Z

We can sometimes see adjacent sibling loops (where L1.bottom == L2.head)
and if so, cloning L1 will repurpose L1.bottom and so leave L2 in
an inconsistent state.

Detect this case and give L2 a temporary preheader before cloning L1.
Revoke preheader status of L2 if we then go on to clone L2, as cloning
does not preserve preheaders currently.

Fixes #70569.

ghost · 2022-06-17T06:07:55Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

We can sometimes see adjacent sibling loops (where L1.bottom == L2.head)
and if so, cloning L1 will repurpose L1.bottom and so leave L2 in
an inconsistent state.

Detect this case and give L2 a temporary preheader before cloning L1.
Revoke preheader status of L2 if we then go on to clone L2, as cloning
does not preserve preheaders currently.

Fixes #70569.

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2022-06-17T06:24:55Z

@BruceForstall PTAL
cc @dotnet/jit-contrib

Not the most elegant solution, I'll admit, but the "temporary" preheader may get reinstated later on so the block may end up being useful.

No diffs. The original issue was from random PGO stress in libraries-pgo.

Final local flow graph for the example over in #70596 below, showing both siblings successfully cloned. Note RBO gets rid of the in-loop type test in the two fast loops (but not in the two slow ones, as on the slow path the in-loop type test is not dominated by cloning condition type test; control can also reach the slow loop with a null ref).

AndyAyersMS · 2022-06-17T15:07:31Z

/azp run runtime-coreclr superpmi-diffs

azure-pipelines · 2022-06-17T15:07:44Z

Azure Pipelines successfully started running 1 pipeline(s).

AndyAyersMS · 2022-06-17T15:09:31Z

Installer test hit some kind of timing/permissions issue. Will retry.

  Tests succeeded: D:\a\_work\1\s\artifacts\bin\Microsoft.NET.HostModel.AppHost.Tests\Release\net7.0\Microsoft.NET.HostModel.AppHost.Tests.dll [net7.0|x64]
D:\a\_work\1\s\.packages\microsoft.dotnet.arcade.sdk\7.0.0-beta.22310.1\tools\VSTest.targets(55,5): error MSB3491: Could not write lines to file "D:\a\_work\1\s\artifacts\log\Release\Microsoft.NET.HostModel.ComHost.Tests_net7.0_x64.log". The process cannot access the file 'D:\a\_work\1\s\artifacts\log\Release\Microsoft.NET.HostModel.ComHost.Tests_net7.0_x64.log' because it is being used by another process. [D:\a\_work\1\s\src\installer\tests\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.ComHost.Tests\Microsoft.NET.HostModel.ComHost.Tests.csproj]
##[error].packages\microsoft.dotnet.arcade.sdk\7.0.0-beta.22310.1\tools\VSTest.targets(55,5): error MSB3491: (NETCORE_ENGINEERING_TELEMETRY=Build) Could not write lines to file "D:\a\_work\1\s\artifacts\log\Release\Microsoft.NET.HostModel.ComHost.Tests_net7.0_x64.log". The process cannot access the file 'D:\a\_work\1\s\artifacts\log\Release\Microsoft.NET.HostModel.ComHost.Tests_net7.0_x64.log' because it is being used by another process.
  Running tests: D:\a\_work\1\s\artifacts\bin\Microsoft.NET.HostModel.Bundle.Tests\Release\net7.0\Microsoft.NET.HostModel.Bundle.Tests.dll [net7.0|x64]

BruceForstall

This feels like a hack. Ideally, I'd like to get rid of the 'head' concept; I think it's pretty broken/limiting.

Would it be cleaner to do some kind of pre-pass, say right after loop recognition, to create a new 'head' block (not a pre-header as defined) whenever a 'head' and 'bottom' coincide?

BruceForstall · 2022-06-17T22:09:43Z

src/coreclr/jit/loopcloning.cpp

+            }
+
+            assert(sibling <= optLoopCount);
+            if (optLoopTable[sibling].lpTop != x)


Why not check optLoopTable[sibling].lpHead != b directly?

BruceForstall · 2022-06-17T22:11:16Z

src/coreclr/jit/loopcloning.cpp

+    // If B is the header of a sibling loop, give that loop a preheader
+    // that will serve as X, the join point above the sibling.
+    //
+    if ((b->bbJumpKind == BBJ_COND) && (ambientLoop != BasicBlock::NOT_IN_LOOP))


Why check ambientLoop here? I guess it's so you can walk the sibling list. But can't the same situation exist for top-level adjacent loops, in which case you need to walk the entire loop table looking for loops without a parent.

BruceForstall · 2022-06-17T22:15:50Z

src/coreclr/jit/loopcloning.cpp

+    //
+    if ((loop.lpFlags & LPFLG_HAS_PREHEAD) != 0)
+    {
+        JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd);


Suggested change

JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd);

JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd, h->bbNum);

AndyAyersMS · 2022-06-17T22:34:19Z

Yeah, I can look into doing a prepass instead. With a prepass we may alter the flow graph even if we're not going to clone anything, so it might lead to some diffs.

AndyAyersMS · 2022-06-17T22:37:30Z

Seems like maybe this fits into optCanonicalizeLoopNest so will look to add something there.

We can sometimes see adjacent sibling loops (where L1.bottom == L2.head) and if so, cloning L1 will repurpose L1.bottom and so leave L2 in an inconsistent state. Detect this case during optCanonicalizeLoop, and add an intermediary block to serve as L2's head. Fixes dotnet#70569.

AndyAyersMS · 2022-06-18T03:23:30Z

Reworked this to run as part of optCanonicalizeLoop. Let me know if this seems more appealing.

I saw 4 methods with minor diffs in local SPMI.

AndyAyersMS · 2022-06-19T18:29:33Z

@BruceForstall I changed my approach here. See what you think.

BruceForstall

I like this approach, as it happens during "normal" canonicalization, and establishes a new invariant for the loop table.

Can you add appropriate asserts to Compiler::fgDebugCheckLoopTable() to ensure this invariant?

I think we have an ordering requirement on the loop table, but I don't see that Compiler::fgDebugCheckLoopTable() enforces one. I.e., for two top-level loops l1 and l2, they must be disjoint. If the blocks of l1 precede l2, is l1 guaranteed to appear before (at a lower index) than l2 in the loop table? If so, there's no reason to loop over the entire table; for loop index loopInd, look at loopInd - 1 (or the first preceding top-level loop), and you're done. Something similar should apply for siblings, but I guess we don't have a way to walk backward in the sibling list.

One thing I'm a little worried about is it looks like we allow a child loop to share "top" with its parent:

        static bool lpContains(const unsigned*   blockNumMap,
                               const LoopDsc*    loop,
                               const BasicBlock* top,
                               const BasicBlock* bottom)
        {
            return (blockNumMap[loop->lpTop->bbNum] <= blockNumMap[top->bbNum]) &&
                   (blockNumMap[bottom->bbNum] < blockNumMap[loop->lpBottom->bbNum]);
        }

(<= instead of <).

Do you know if we canonicalize to prevent this?

AndyAyersMS · 2022-06-20T06:00:55Z

If the blocks of l1 precede l2, is l1 guaranteed to appear before (at a lower index) than l2 in the loop table?

I think this will be true right after loop recognition, but I don't think this remains true forever, we may reorder entire ranges of blocks later on based on profile data.

it looks like we allow a child loop to share "top" with its parent:

This is a thing canonicalization fixes: it unshares tops. Until recently, this was the only thing canonicalization did. So, it should only be true for a short time (between registering the loop and canonicalizing the loop).

I've now added a second kind of canonicalization (#70809, unshare top with non-loop backedges) and this PR adds a third kind (unshare bottom with head), and we really should add a fourth kind (unshare entry with non-loop backedges -- which instead I've worked around for now, see #70959). And possibly more: it seems like we could have a mid-entry loop whose entry is also the top of a nested loop. so we might want to unshare top and entry.

An alternative to explore is to do backedge canonicalization so that each loop has one actual backedge (from inside the loop to entry) and this might collapse some of the things we'd currently call nested do-while loops to be a single do-while with multiple exits. And we might want to "describe" the non-loop like things we bail on today, because pretending they're not loops seems to be causing some problems.

I'm a fan of the loop recognition advocated by Havlak though it might be there are more recent approaches that work out better.

All of this comes about because we are now cloning more general things that we still recognize as loops; previously we'd restrict ourselves to cloning only iterable loops and those are more restricted in shape.

BruceForstall · 2022-06-21T18:36:06Z

This is a thing canonicalization fixes: it unshares tops. Until recently, this was the only thing canonicalization did. So, it should only be true for a short time (between registering the loop and canonicalizing the loop).

Seems like we should change fgDebugCheckLoopTable to assert there are no shared tops, since it only gets called once canonicalization has finished.

BruceForstall

This comment still applies:

Can you add appropriate asserts to Compiler::fgDebugCheckLoopTable() to ensure this invariant?

but I'll go ahead and approve. We can add this later (if we remember).

AndyAyersMS · 2022-06-21T19:37:18Z

This comment still applies:

Can you add appropriate asserts to Compiler::fgDebugCheckLoopTable() to ensure this invariant?

but I'll go ahead and approve. We can add this later (if we remember).

Added to 7.0 so should be hard to forget: #71084

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 17, 2022

ghost assigned AndyAyersMS Jun 17, 2022

BruceForstall self-requested a review June 17, 2022 21:55

BruceForstall requested changes Jun 17, 2022

View reviewed changes

ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 17, 2022

ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jun 17, 2022

AndyAyersMS force-pushed the FixLoopCloningAdjacentSibling branch from 251b176 to cae78ae Compare June 18, 2022 03:21

BruceForstall reviewed Jun 20, 2022

View reviewed changes

BruceForstall approved these changes Jun 21, 2022

View reviewed changes

AndyAyersMS merged commit 3684aa8 into dotnet:main Jun 21, 2022

AndyAyersMS mentioned this pull request Jun 27, 2022

Loop cloning driven by type tests #65206

Open

ghost locked as resolved and limited conversation to collaborators Jul 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: handle case where we are cloning adjacent sibling loops #70874

JIT: handle case where we are cloning adjacent sibling loops #70874

AndyAyersMS commented Jun 17, 2022

ghost commented Jun 17, 2022

AndyAyersMS commented Jun 17, 2022

AndyAyersMS commented Jun 17, 2022

azure-pipelines bot commented Jun 17, 2022

AndyAyersMS commented Jun 17, 2022

BruceForstall left a comment

BruceForstall Jun 17, 2022

BruceForstall Jun 17, 2022

BruceForstall Jun 17, 2022

AndyAyersMS commented Jun 17, 2022

AndyAyersMS commented Jun 17, 2022

AndyAyersMS commented Jun 18, 2022 •

edited

Loading

AndyAyersMS commented Jun 19, 2022

BruceForstall left a comment

AndyAyersMS commented Jun 20, 2022

BruceForstall commented Jun 21, 2022

BruceForstall left a comment

AndyAyersMS commented Jun 21, 2022

	JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd);
	JITDUMP("Clearing preader flag from " FMT_LP " and " FMT_BB "\n", loopInd, h->bbNum);

JIT: handle case where we are cloning adjacent sibling loops #70874

JIT: handle case where we are cloning adjacent sibling loops #70874

Conversation

AndyAyersMS commented Jun 17, 2022

ghost commented Jun 17, 2022

AndyAyersMS commented Jun 17, 2022

AndyAyersMS commented Jun 17, 2022

azure-pipelines bot commented Jun 17, 2022

AndyAyersMS commented Jun 17, 2022

BruceForstall left a comment

Choose a reason for hiding this comment

BruceForstall Jun 17, 2022

Choose a reason for hiding this comment

BruceForstall Jun 17, 2022

Choose a reason for hiding this comment

BruceForstall Jun 17, 2022

Choose a reason for hiding this comment

AndyAyersMS commented Jun 17, 2022

AndyAyersMS commented Jun 17, 2022

AndyAyersMS commented Jun 18, 2022 • edited Loading

AndyAyersMS commented Jun 19, 2022

BruceForstall left a comment

Choose a reason for hiding this comment

AndyAyersMS commented Jun 20, 2022

BruceForstall commented Jun 21, 2022

BruceForstall left a comment

Choose a reason for hiding this comment

AndyAyersMS commented Jun 21, 2022

AndyAyersMS commented Jun 18, 2022 •

edited

Loading