JIT: improve scalability of optReachable #75990

AndyAyersMS · 2022-09-21T20:01:33Z

Use a bit vector to track the visited blocks. This scales much better than using the per-block visited flags. Fixes dotnet#44341.
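As a rough illustration of the idea (not the JIT's actual code), the sketch below answers a reachability query over a toy flow graph and records visited blocks in a bit vector indexed by block number. `ToyGraph` and `ReachabilityQuery` are hypothetical names introduced only for this example; the real change works on `BasicBlock`, `bbNum`, and `BitVecOps`. What it is meant to show is that resetting the visited state between queries becomes a tight loop over packed words rather than a pass that touches every block to clear a flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy flow graph: blocks are numbered 0..n-1 and
// succs[b] lists the successors of block b.
struct ToyGraph
{
    std::vector<std::vector<std::size_t>> succs;
};

// Hypothetical helper for illustration; this is not the JIT's optReachable.
class ReachabilityQuery
{
public:
    explicit ReachabilityQuery(const ToyGraph& graph)
        : m_graph(graph), m_visited((graph.succs.size() + 63) / 64, 0)
    {
    }

    // Returns true if `to` can be reached from `from` by following successor edges.
    bool isReachable(std::size_t from, std::size_t to)
    {
        assert(from < m_graph.succs.size() && to < m_graph.succs.size());
        if (from == to)
        {
            return true;
        }

        // Reset the visited set: one pass over packed 64-bit words.
        std::fill(m_visited.begin(), m_visited.end(), std::uint64_t{0});
        m_worklist.clear();

        setVisited(from);
        m_worklist.push_back(from);

        while (!m_worklist.empty())
        {
            const std::size_t cur = m_worklist.back();
            m_worklist.pop_back();

            for (std::size_t succ : m_graph.succs[cur])
            {
                if (succ == to)
                {
                    return true;
                }
                if (isVisited(succ))
                {
                    continue;
                }
                setVisited(succ);
                m_worklist.push_back(succ);
            }
        }
        return false;
    }

private:
    bool isVisited(std::size_t b) const
    {
        return ((m_visited[b / 64] >> (b % 64)) & 1) != 0;
    }

    void setVisited(std::size_t b)
    {
        m_visited[b / 64] |= (std::uint64_t{1} << (b % 64));
    }

    const ToyGraph&            m_graph;
    std::vector<std::uint64_t> m_visited;  // one bit per block
    std::vector<std::size_t>   m_worklist; // blocks still to expand
};
```

The visited set and worklist live in the query object so repeated queries reuse the same storage; the graph must outlive the query object because it is held by reference.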

ghost · 2022-09-21T20:01:51Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2022-09-21T20:08:48Z

For the HugeMethod from #44341, the checked JIT now finishes about 150x faster than before. Local SPMI shows nice TP wins in general, though I don't have a matching MSVC locally, so I will see what CI says.

There are some other things we might be able to do here to speed this up even more, e.g., leveraging the residual bit vector state to accelerate compares starting from the same block, or using the somewhat stale precomputed reachability to screen out certain cases (RBO should reduce reachability overall, never increase it). Will hold off on those for now.
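The second idea above, screening with stale precomputed reachability, can be sketched as follows. This is only an illustration under the stated assumption that reachability never grows after it was computed; `ReachMatrix`, `screenedReachable`, and the `exactWalk` callback are hypothetical names, and the PR itself does not implement this.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical: staleReach[a][b] is true if block b was reachable from block a
// at the time reachability was last computed.
using ReachMatrix = std::vector<std::vector<bool>>;

// Hypothetical screen. If the stale information says `to` was not reachable
// from `from`, and reachability can only have shrunk since then, the answer is
// still "no" and the graph walk can be skipped. Otherwise fall back to an
// exact walk supplied by the caller.
bool screenedReachable(const ReachMatrix&                                    staleReach,
                       std::size_t                                           from,
                       std::size_t                                           to,
                       const std::function<bool(std::size_t, std::size_t)>& exactWalk)
{
    assert(from < staleReach.size() && to < staleReach[from].size());

    if (!staleReach[from][to])
    {
        return false; // screened out without walking the graph
    }
    return exactWalk(from, to); // stale "yes" may be out of date, so verify
}
```

A stale "yes" cannot be trusted on its own, so only the "no" case short-circuits.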

@jakobbotsch PTAL
cc @dotnet/jit-contrib

AndyAyersMS · 2022-09-21T22:43:57Z

spmi diffs don't see any TP impact. This is quite surprising.

I'll try a matching set of local builds and see what that says.

AndyAyersMS · 2022-09-22T00:38:22Z

I can repro the NAOT OSX x64 failure, but it is intermittent, the failure symptoms vary, and it is very likely unrelated.

        ===== Running test BasicThreading.Run =====
        Expected: 100
        Actual: 134
        END EXECUTION - FAILED
        Test Harness Exitcode is : 1
        To run the test:
        > set CORE_ROOT=/Users/andy/repos/runtime0/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root
        > /Users/andy/repos/runtime0/artifacts/tests/coreclr/OSX.x64.Release/nativeaot/SmokeTests/UnitTests/UnitTests/UnitTests.sh
        Expected: True
        Actual:   False
        Stack Trace:
          /Users/andy/repos/runtime0/artifacts/tests/coreclr/OSX.x64.Release/TestWrappers/nativeaot.SmokeTests/nativeaot.SmokeTests.XUnitWrapper.cs(1337,0): at nativeaot_SmokeTests._UnitTests_UnitTests_UnitTests_._UnitTests_UnitTests_UnitTests_sh()
             at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
             at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
        Output:
          Process terminated. Runtime-generated FailFast: (1): Runtime internal error
          /Users/andy/repos/runtime0/src/tests/Common/scripts/nativeaottest.sh: line 14: 21983 Abort trap: 6           $_DebuggerFullPath $1/native/$exename "${@:3}"

@MichalStrehovsky is there a known issue here?

MichalStrehovsky · 2022-09-22T00:42:12Z

> @MichalStrehovsky is there a known issue here?

We didn't look at OSX quality for 7.0 since it's unsupported in 7.0. This is probably #73299. I've not seen it in UnitTests, but who knows. We only run the tests so that we don't severely regress mac since we'll likely add it in 8.0.

AndyAyersMS · 2022-09-22T01:27:05Z

> spmi diffs don't see any TP impact. This is quite surprising.
>
> I'll try a matching set of local builds and see what that says.

Local builds agree on minimal impact, so I guess the test case from #44341 is a real outlier.

kunalspathak · 2022-09-22T03:57:53Z

src/coreclr/jit/redundantbranchopts.cpp

@@ -1619,11 +1618,13 @@ bool Compiler::optReachable(BasicBlock* const fromBlock, BasicBlock* const toBlo
                return true;
            }

-            if ((succ->bbFlags & BBF_VISITED) != 0)
+            if (BitVecOps::IsMember(optReachableBitVecTraits, optReachableBitVec, succ->bbNum))


We have this exact code in fgRemoveDeadBlocks(). Can we unify them?

AndyAyersMS · 2022-09-22

That version uses `BlockSet`, which I'm avoiding here because it is epoch sensitive.

In general, it would be nice to have the ability to represent (sparse) block sets that are independent of bbNum and not fixed capacity so that they can persist for longer stretches or be reused as general scratch sets, but we don't have anything like that yet.

kunalspathak · 2022-09-22T04:00:33Z

> I guess the test case from #44341 is a real outlier.

So, are you saying that with this change this case still times out? I am not surprised. That test case has 62348 single-line `if (f) i++;` statements.

jakobbotsch

LGTM.
Did you consider rewalking the graph to be able to clear just the marked subgraph?
I suppose it might not really help since pathological cases will always have to walk the majority of the flow graph, so vectorizing the clear like this might be better in practice for the outlier cases.

Another alternative could be storing an integer tag on BasicBlock that would not need any clearing.
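The integer-tag alternative mentioned above can be sketched like this. It is a minimal illustration, not code from the JIT: `StampedBlock`, `visitStamp`, and `StampedReachability` are hypothetical names. Each query bumps a counter, and a block counts as visited only if its tag equals the current counter, so no per-query clearing is needed (apart from the rare counter wraparound).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block type carrying an integer visit tag.
struct StampedBlock
{
    std::vector<std::size_t> succs;          // successor block indices
    std::uint32_t            visitStamp = 0; // stamp of the last query that visited this block
};

// Hypothetical helper for illustration only.
class StampedReachability
{
public:
    explicit StampedReachability(std::vector<StampedBlock>& blocks) : m_blocks(blocks)
    {
    }

    bool isReachable(std::size_t from, std::size_t to)
    {
        assert(from < m_blocks.size() && to < m_blocks.size());
        if (from == to)
        {
            return true;
        }

        // A new stamp invalidates all marks left by earlier queries at once.
        if (++m_stamp == 0)
        {
            // The counter wrapped around: old tags could alias, so reset them.
            for (StampedBlock& b : m_blocks)
            {
                b.visitStamp = 0;
            }
            m_stamp = 1;
        }

        m_worklist.clear();
        m_blocks[from].visitStamp = m_stamp;
        m_worklist.push_back(from);

        while (!m_worklist.empty())
        {
            const std::size_t cur = m_worklist.back();
            m_worklist.pop_back();

            for (std::size_t succ : m_blocks[cur].succs)
            {
                if (succ == to)
                {
                    return true;
                }
                if (m_blocks[succ].visitStamp == m_stamp)
                {
                    continue; // already visited during this query
                }
                m_blocks[succ].visitStamp = m_stamp;
                m_worklist.push_back(succ);
            }
        }
        return false;
    }

private:
    std::vector<StampedBlock>& m_blocks;
    std::vector<std::size_t>   m_worklist;
    std::uint32_t              m_stamp = 0;
};
```

The trade-off relative to a bit vector is an extra integer per block in exchange for skipping the clear; which one wins in practice would need measurement.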

AndyAyersMS · 2022-09-22T14:54:23Z

> > I guess the test case from #44341 is a real outlier.
>
> So, are you saying that with this change this case still times out? I am not surprised. That test case has 62348 single-line `if (f) i++;` statements.

No, it should be ok now. I was just surprised nothing else saw any significant TP benefit from this.

AndyAyersMS · 2022-09-22T14:58:56Z

outerloop isn't clean, but only has a few failures. Let's make sure the reenabled test runs ok.

AndyAyersMS · 2022-09-22T14:59:06Z

/azp run runtime-coreclr outerloop

azure-pipelines · 2022-09-22T14:59:30Z

Azure Pipelines successfully started running 1 pipeline(s).

JIT: improve scalability of optReachable

7d4dc9f

dotnet-issue-labeler bot added the `area-CodeGen-coreclr` label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Sep 21, 2022

ghost assigned AndyAyersMS Sep 21, 2022

AndyAyersMS mentioned this pull request Sep 21, 2022

Redundant Branch Opts Enhancements #48115

Open

kunalspathak reviewed Sep 22, 2022

jakobbotsch approved these changes Sep 22, 2022

jakobbotsch mentioned this pull request Sep 22, 2022

Delete GTF_VAR_CAST #74461

Merged

AndyAyersMS merged commit b4ac094 into dotnet:main Sep 22, 2022

ghost locked as resolved and limited conversation to collaborators Oct 23, 2022
