JIT: improve scalability of optReachable #75990

Merged
merged 1 commit into from
Sep 22, 2022

AndyAyersMS
Member

Use a bit vector to track the visited blocks. This scales much better than using the per-block visited flags.

Fixes #44341.
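
As a rough illustration of the idea (not the JIT's actual code), the sketch below runs a worklist reachability walk that records visited blocks in a bit vector indexed by block number. Block, VisitedBits, and reachable are hypothetical stand-ins introduced here; they are not the real BasicBlock, BitVecOps, or optReachable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a flow graph block; only what the walk needs.
struct Block
{
    unsigned            num;   // dense block number (plays the role of bbNum)
    std::vector<Block*> succs; // successor edges
};

// Minimal fixed-capacity bit vector indexed by block number.
class VisitedBits
{
    std::vector<std::uint64_t> m_words;

public:
    explicit VisitedBits(unsigned capacity) : m_words((capacity + 63) / 64, 0) {}

    bool isMember(unsigned i) const { return ((m_words[i / 64] >> (i % 64)) & 1) != 0; }
    void add(unsigned i) { m_words[i / 64] |= std::uint64_t{1} << (i % 64); }

    // Resetting touches one word per 64 blocks and never walks the block list.
    void clear() { m_words.assign(m_words.size(), 0); }
};

// Returns true if toBlock can be reached from fromBlock along successor edges.
bool reachable(Block* fromBlock, Block* toBlock, VisitedBits& visited)
{
    if (fromBlock == toBlock)
    {
        return true;
    }

    visited.clear();
    visited.add(fromBlock->num);
    std::vector<Block*> worklist{fromBlock};

    while (!worklist.empty())
    {
        Block* const block = worklist.back();
        worklist.pop_back();

        for (Block* const succ : block->succs)
        {
            if (succ == toBlock)
            {
                return true;
            }
            if (visited.isMember(succ->num))
            {
                continue; // already queued or processed
            }
            visited.add(succ->num);
            worklist.push_back(succ);
        }
    }
    return false;
}
```

The point of the sketch is that the visited state lives outside the blocks, so starting a new query only resets a compact array and never has to touch every block to clear a flag.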

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 21, 2022
@ghost ghost assigned AndyAyersMS Sep 21, 2022
@AndyAyersMS
Member Author

For the HugeMethod from #44341, the checked JIT now finishes about 150x faster than before. Local SPMI runs show nice throughput (TP) wins in general, though I don't have a matching MSVC locally, so I'll see what CI says.

There are other things we might do here to speed this up further, e.g., leveraging the residual bit vector state to accelerate reachability checks that start from the same block, or using the somewhat stale precomputed reachability to screen out certain cases (RBO should only reduce reachability overall, never increase it). I'll hold off on those for now.
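
One simplified way to picture the "same starting block" idea: compute the full reachable set once, then answer later queries from that block with a membership test. This is only a sketch under assumptions, not what the PR does. Block and CachedReachability are hypothetical names, the walk deliberately has no early exit, and the cache is only valid while the flow graph is unchanged.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a flow graph block.
struct Block
{
    unsigned            num;   // dense block number
    std::vector<Block*> succs; // successor edges
};

// Remembers the full reachable set of the most recent starting block.
class CachedReachability
{
    std::vector<bool> m_reached;
    const Block*      m_cachedFrom = nullptr;
    unsigned          m_walks      = 0; // number of full walks, for illustration

    void walkFrom(Block* from)
    {
        m_reached.assign(m_reached.size(), false);
        m_reached[from->num] = true;
        std::vector<Block*> worklist{from};

        while (!worklist.empty())
        {
            Block* const block = worklist.back();
            worklist.pop_back();
            for (Block* const succ : block->succs)
            {
                if (!m_reached[succ->num])
                {
                    m_reached[succ->num] = true;
                    worklist.push_back(succ);
                }
            }
        }
        m_cachedFrom = from;
        ++m_walks;
    }

public:
    explicit CachedReachability(unsigned capacity) : m_reached(capacity, false) {}

    // Must be called whenever edges change (assumption of this sketch).
    void invalidate() { m_cachedFrom = nullptr; }

    unsigned walkCount() const { return m_walks; }

    bool reachable(Block* from, Block* to)
    {
        if (from != m_cachedFrom)
        {
            walkFrom(from); // cache miss: one full walk
        }
        return m_reached[to->num]; // cache hit: constant-time lookup
    }
};
```

The trade-off is that a full walk gives up the early exit of a single query, so this would only pay off when many queries share a starting block.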

@jakobbotsch PTAL
cc @dotnet/jit-contrib

@AndyAyersMS
Member Author

spmi diffs don't see any TP impact. This is quite surprising.

I'll try a matching set of local builds and see what that says.

@AndyAyersMS
Member Author

I can repro the NAOT OSX x64 failure, but it is intermittent, the failure symptoms vary, and it is very likely unrelated.

        ===== Running test BasicThreading.Run =====
        Expected: 100
        Actual: 134
        END EXECUTION - FAILED
        Test Harness Exitcode is : 1
        To run the test:
        > set CORE_ROOT=/Users/andy/repos/runtime0/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root
        > /Users/andy/repos/runtime0/artifacts/tests/coreclr/OSX.x64.Release/nativeaot/SmokeTests/UnitTests/UnitTests/UnitTests.sh
        Expected: True
        Actual:   False
        Stack Trace:
          /Users/andy/repos/runtime0/artifacts/tests/coreclr/OSX.x64.Release/TestWrappers/nativeaot.SmokeTests/nativeaot.SmokeTests.XUnitWrapper.cs(1337,0): at nativeaot_SmokeTests._UnitTests_UnitTests_UnitTests_._UnitTests_UnitTests_UnitTests_sh()
             at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
             at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
        Output:
          Process terminated. Runtime-generated FailFast: (1): Runtime internal error
          /Users/andy/repos/runtime0/src/tests/Common/scripts/nativeaottest.sh: line 14: 21983 Abort trap: 6           $_DebuggerFullPath $1/native/$exename "${@:3}"

@MichalStrehovsky is there a known issue here?

@MichalStrehovsky
Member

> @MichalStrehovsky is there a known issue here?

We didn't look at OSX quality for 7.0 since it's unsupported in 7.0. This is probably #73299. I've not seen it in UnitTests, but who knows. We only run the tests so that we don't severely regress mac since we'll likely add it in 8.0.

@AndyAyersMS
Member Author

> spmi diffs don't see any TP impact. This is quite surprising.
>
> I'll try a matching set of local builds and see what that says.

Local builds agree on minimal impact, so I guess the test case from #44341 is a real outlier.

@@ -1619,11 +1618,13 @@ bool Compiler::optReachable(BasicBlock* const fromBlock, BasicBlock* const toBlo
             return true;
         }

-        if ((succ->bbFlags & BBF_VISITED) != 0)
+        if (BitVecOps::IsMember(optReachableBitVecTraits, optReachableBitVec, succ->bbNum))
Member

We have this exact code in fgRemoveDeadBlocks(). Can we unify them?

Member Author

That version uses BlockSet, which I'm avoiding here because it is epoch sensitive.

In general, it would be nice to have the ability to represent (sparse) block sets that are independent of bbNum and not fixed capacity so that they can persist for longer stretches or be reused as general scratch sets, but we don't have anything like that yet.
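
To make the wish above concrete, here is a minimal sketch of a block set keyed by block identity rather than by block number, so it has no fixed capacity and is unaffected if blocks are renumbered. Nothing like this exists in the JIT according to the comment above; Block and SparseBlockSet are hypothetical, and a hash set is just one possible representation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for a flow graph block.
struct Block
{
    unsigned num; // block number; may change when blocks are renumbered
};

// Sparse set keyed by block address, so membership survives renumbering
// and the set grows as needed instead of being sized up front.
class SparseBlockSet
{
    std::unordered_set<const Block*> m_members;

public:
    // Returns true if the block was newly added.
    bool add(const Block* block) { return m_members.insert(block).second; }

    bool contains(const Block* block) const { return m_members.count(block) != 0; }

    void clear() { m_members.clear(); }

    std::size_t size() const { return m_members.size(); }
};
```

Hashing on pointers costs more per operation than a dense bit vector, so a set like this would suit long-lived or sparse scratch sets rather than hot inner loops.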

@kunalspathak
Member

> I guess the test case from #44341 is a real outlier.

So, are you saying that with this change this case still times out? I would not be surprised. That test case has 62348 single-line if (f) i++; statements.

Member

@jakobbotsch jakobbotsch left a comment

LGTM.
Did you consider rewalking the graph to be able to clear just the marked subgraph?
I suppose it might not really help since pathological cases will always have to walk the majority of the flow graph, so vectorizing the clear like this might be better in practice for the outlier cases.

Another alternative could be storing an integer tag on BasicBlock that would not need any clearing.
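
The integer tag alternative could look roughly like the sketch below: each query bumps a counter, and a block counts as visited only if its tag equals the current counter, so no clearing pass is needed. This is an illustration under assumptions, not code from the PR. TaggedBlock, visitTag, and TaggedWalker are hypothetical, and counter wraparound is not handled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical block with an extra per-block tag field.
struct TaggedBlock
{
    unsigned                  visitTag = 0; // tag of the last query that visited this block
    std::vector<TaggedBlock*> succs;        // successor edges
};

class TaggedWalker
{
    unsigned m_tag = 0; // current query number; wraparound is ignored in this sketch

public:
    bool reachable(TaggedBlock* from, TaggedBlock* to)
    {
        if (from == to)
        {
            return true;
        }

        ++m_tag; // every tag written by an earlier query is now stale
        from->visitTag = m_tag;
        std::vector<TaggedBlock*> worklist{from};

        while (!worklist.empty())
        {
            TaggedBlock* const block = worklist.back();
            worklist.pop_back();

            for (TaggedBlock* const succ : block->succs)
            {
                if (succ == to)
                {
                    return true;
                }
                if (succ->visitTag == m_tag)
                {
                    continue; // visited during this query
                }
                succ->visitTag = m_tag;
                worklist.push_back(succ);
            }
        }
        return false;
    }
};
```

The cost is an extra integer on every block, and all queries must share one counter, whereas the bit vector keeps the state out of the blocks.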

src/coreclr/jit/redundantbranchopts.cpp (review thread resolved)
@AndyAyersMS
Member Author

> > I guess the test case from #44341 is a real outlier.
>
> So, are you saying that with this change this case still times out? I would not be surprised. That test case has 62348 single-line if (f) i++; statements.

No, it should be ok now. I was just surprised nothing else saw any significant TP benefit from this.

@AndyAyersMS
Member Author

outerloop isn't clean, but only has a few failures. Let's make sure the reenabled test runs ok.

@AndyAyersMS
Member Author

/azp run runtime-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS AndyAyersMS merged commit b4ac094 into dotnet:main Sep 22, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 23, 2022
Successfully merging this pull request may close these issues.

Test failure: JIT/Regression/JitBlue/DevDiv_255294/DevDiv_255294/DevDiv_255294.sh
4 participants