
JIT: Visit blocks in RPO during LSRA #107927

Merged — 8 commits merged into dotnet:main on Sep 20, 2024

Conversation

@amanasifkhalid (Member) commented Sep 17, 2024

Part of #107749. LSRA currently does a lexical pass over the block list to build a visitation order. Since we intend to run block layout after LSRA with #107634, LSRA ideally shouldn't be sensitive to lexical ordering, and since the current logic tries to visit a block's predecessors before the block itself, it seems easier and faster to just use an RPO traversal.
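Purely as an illustration of the idea (this is not the JIT's actual code — the `Block` struct and the helper names below are made up for the sketch), a reverse post-order sequence over a flowgraph can be built with a post-order DFS followed by a reversal; RPO visits each block before its successors (ignoring back edges), which gives the "most predecessors before the block" property the old preds-first ordering was aiming for.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified flowgraph node; the real JIT uses BasicBlock/FlowEdge.
struct Block
{
    int                 id;
    std::vector<Block*> successors;
    bool                visited = false;
};

// Depth-first post-order walk starting from the entry block.
static void postOrderDFS(Block* block, std::vector<Block*>& postOrder)
{
    block->visited = true;
    for (Block* succ : block->successors)
    {
        if (!succ->visited)
        {
            postOrderDFS(succ, postOrder);
        }
    }
    postOrder.push_back(block);
}

// Build the block sequence: reverse post-order first, then any blocks the DFS
// never reached (e.g. throw helpers with no explicit flow edges into them).
static std::vector<Block*> buildRPOSequence(Block* entry, std::vector<Block*>& allBlocks)
{
    std::vector<Block*> sequence;
    postOrderDFS(entry, sequence);
    std::reverse(sequence.begin(), sequence.end());

    for (Block* block : allBlocks)
    {
        if (!block->visited)
        {
            sequence.push_back(block); // unreachable blocks go at the end
        }
    }
    return sequence;
}
```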

The dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Sep 17, 2024.
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid (Member Author)

/azp run runtime-coreclr outerloop, Fuzzlyn


Azure Pipelines successfully started running 2 pipeline(s).

@amanasifkhalid marked this pull request as ready for review September 18, 2024 01:45
@amanasifkhalid (Member Author)

cc @dotnet/jit-contrib, @AndyAyersMS @kunalspathak PTAL. Fuzzlyn failures are known or NaN false positives.

Note that LSRA previously had three ordering strategies: the default preds-first one, lexical order, and randomized order (which was never implemented). RPO looks like a viable replacement for the first one, and lexical order doesn't make much sense if we plan to move block layout later, so I removed the functionality for specifying LSRA's block order. Is it ok to remove this for now, and add it back in if/when we decide to implement a randomized order?

@amanasifkhalid (Member Author)

Diffs are large, though they show a net size improvement. Looking at the instructions retired per collection, the larger MinOpts TP regressions are concentrated in collections with relatively few MinOpts methods, so I think the TP impact isn't as bad as it looks.

@kunalspathak (Member)

Fuzzlyn linux/arm failures seem to expose some more issues caused by this change. We should address all of them before merging this PR.

@amanasifkhalid (Member Author)

Fuzzlyn linux/arm failures seem to expose some more issues caused by this change. We should address all of them before merging this PR.

The assertion varDsc->IsAlwaysAliveInMemory() || ((regSet.GetMaskVars() & regMask) == 0) looks like the one you just fixed, though it doesn't match the assert listed at the bottom, otherTargetInterval->registerType == TYP_DOUBLE -- the latter assert's message references a method that doesn't exist in the trimmed repro. I'm guessing this is a Fuzzlyn bug? I can't reproduce either assert with the provided repro; I'll try rerunning Fuzzlyn and see if we get anything actionable.

@amanasifkhalid (Member Author)

/azp run Fuzzlyn


Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid (Member Author)

@kunalspathak the Linux arm failure didn't repro in the last Fuzzlyn run, so if it is a bug, it doesn't readily repro. Aside from inner/outerloop tests, are there any other suites you'd like me to run? JitStress doesn't seem to do anything interesting for block ordering, though if it's still worthwhile to run LSRA stress modes, I can do that -- thanks!

@kunalspathak (Member)

@kunalspathak the Linux arm failure didn't repro in the last Fuzzlyn run, so if it is a bug, it doesn't readily repro. Aside from inner/outerloop tests, are there any other suites you'd like me to run? JitStress doesn't seem to do anything interesting for block ordering, though if it's still worthwhile to run LSRA stress modes, I can do that -- thanks!

Yes, since Fuzzlyn is randomized, it might not necessarily repro on every run, but we should take the failure that we saw in the previous run, see why it showed up with this change, and go from there. I would usually run the *jitstressregs* pipelines too.

@amanasifkhalid (Member Author)

/azp run runtime-coreclr jitstressregs, runtime-coreclr jitstressregs-x86, runtime-coreclr jitstress2-jitstressregs


Azure Pipelines successfully started running 3 pipeline(s).

@amanasifkhalid (Member Author)

Note that the diffs from the latest run look quite different because the collections got a bit messed up on x64: we're missing coreclr_tests and libraries_tests*, and diffs from smoke_tests.nativeaot are included multiple times.

@amanasifkhalid (Member Author)

I haven't been able to repro the Linux arm assert(s) using a crossjit, and based on the inconsistent error message in Fuzzlyn, I suspect the repro was wrong to begin with. jitstressregs pipelines are passing.

@kunalspathak (Member)

I haven't been able to repro the Linux arm assert(s) using a crossjit, and based on the inconsistent error message in Fuzzlyn

Did you try downloading the superpmi collection from the prior run and reproing on your pre-merge commit? If not, I will find some time to repro it with your changes.

@kunalspathak (Member)

You could also try a temporary change as part of your PR to increase the duration of Fuzzlyn to 4 hours and see if it catches anything.

@amanasifkhalid (Member Author)

Did you try downloading the superpmi collection from the prior run and reproing on your pre-merge commit?

Thanks for recommending this. I was able to hit both of the above asserts when replaying with a crossjit, though I can repro them with a mainline JIT, too. I've opened #108000 to track these asserts.

@amanasifkhalid (Member Author)

@kunalspathak were there any other pipelines you wanted me to run?

@kunalspathak (Member)

@kunalspathak were there any other pipelines you wanted me to run?

I think that should be it. If the failures are in main as well, then I think we should be good. I will review the PR and get back to you.

@kunalspathak (Member) left a comment

Compared to the current traversal, the only things the RPO-based traversal is not doing are:

  • Sequencing the blocks based on profile data
  • Relying on block weight / bbNum as a tie-breaker

Both of these can have a significant impact on the final code we produce, as we see in the diffs. While the TP improvement is great (around 3~5% in MinOpts for some platforms), the perfscore impact is mixed: I see about the same number of methods improved vs. regressed.


To get some confidence, I am tempted to look at the history of the benchmarks that get improved vs. regressed to make sure that this change is the way to go forward. @AndyAyersMS any thoughts?

Remove lexical and random order layout

I am ok with that.

// If the DFS didn't visit all blocks, add the unvisited ones to the end of blockSequence
if (dfsTree->GetPostOrderCount() < compiler->fgBBcount)
{
    // Unvisited blocks are more likely to be at the back of the list, so iterate backwards
Member

What are the examples where there will be left over unvisited blocks?

Member Author

Any blocks that don't have FlowEdges into them won't be visited by the RPO traversal. For every example I looked at, this happens when we have throw blocks without any explicit predecessors. For example:

BBnum BBid ref try hnd preds           weight   [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1    [000..00D)-> BB06(0.5),BB02(0.5)     ( cond )                     i LIR hascall idxlen
BB02 [0008]  1       BB01                  0.25 [00D..???)-> BB07(0.01),BB03(0.99)   ( cond )                     LIR internal idxlen
BB03 [0010]  2       BB02,BB05             3.96 [00D..01C)-> BB05(0.5),BB04(0.5)     ( cond )                     i LIR loophead idxlen bwd
BB04 [0006]  1       BB03                  0.99 [???..???)-> BB05(1)                 (always)                     i LIR idxlen
BB05 [0004]  2       BB03,BB04             3.96 [01C..026)-> BB03(0.5),BB06(0.5)     ( cond )                     i LIR hascall idxlen bwd
BB06 [0019]  3       BB01,BB05,BB09        1    [026..031)                           (return)                     i LIR gcsafe newobj
BB07 [0011]  2       BB02,BB09             0.04 [00D..01C)-> BB09(0.5),BB08(0.5)     ( cond )                     i LIR loophead idxlen bwd
BB08 [0013]  1       BB07                  0.01 [???..???)-> BB09(1)                 (always)                     i LIR idxlen
BB09 [0014]  2       BB07,BB08             0.04 [01C..026)-> BB07(0.5),BB06(0.5)     ( cond )                     i LIR hascall idxlen bwd
BB10 [0020]  0                             0    [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

BB10 would've been removed in previous phases if it weren't for the BBF_DONT_REMOVE flag set on it. StackLevelSetter, which runs right before LSRA, has some logic for removing throw blocks that are no longer needed, even if they're flagged with BBF_DONT_REMOVE. So if we still see these blocks by the time we get to LSRA, I guess they're needed?

Member

Edges to the throw helpers are added during codegen. So they will appear unreached but are needed.

Member

Since such blocks will be cold (I assume), I am wondering if we should add logic to LSRA to quickly assign registers in such blocks, similar to the way we do for MinOpts, to save some TP time. I am not sure whether that decision would add resolution moves to the hot blocks, or whether there is a way to add the resolution in the cold blocks instead.

@AndyAyersMS (Member)

To get some confidence, I am tempted to look at the history of the benchmarks that get improved vs. regressed to make sure that this change is the way to go forward. @AndyAyersMS any thoughts?

You mean setting up a lab experiment?

@kunalspathak (Member)

To get some confidence, I am tempted to look at the history of the benchmarks that get improved vs. regressed to make sure that this change is the way to go forward. @AndyAyersMS any thoughts?

You mean setting up a lab experiment?

Yes.

@amanasifkhalid (Member Author)

You mean setting up a lab experiment?

Since this change is contained to one PR, would it be easier to merge this, and then wait for the perf triage? Since JIT churn has been pretty low lately, it should be easy to spot regressions from this, and if we aren't happy with the results, we can just revert this PR.

@kunalspathak (Member) left a comment

Let's merge it as-is and monitor the perf issues.

@amanasifkhalid (Member Author)

@kunalspathak thanks for the review!

Let's merge it as-is and monitor the perf issues.

@AndyAyersMS is it alright if we go with this?

@AndyAyersMS (Member)

@kunalspathak thanks for the review!

Let's merge it as-is and monitor the perf issues.

@AndyAyersMS is it alright if we go with this?

Yes

@AndyAyersMS (Member) left a comment

I think this is another place where the loop-aware "RPO" would be useful.

@kunalspathak (Member) commented Sep 20, 2024

loop-aware "RPO" would be useful

What is loop-aware "RPO"?

@amanasifkhalid merged commit ddf8075 into dotnet:main Sep 20, 2024
157 of 160 checks passed
@amanasifkhalid deleted the lsra-rpo branch September 20, 2024 18:38
@amanasifkhalid (Member Author)

What is loop-aware "RPO"?

When visiting a block during an RPO traversal, we check if the block is a loop header, and if so, we visit the rest of the loop's body before visiting anything else -- value numbering currently does this, if you want to see what the implementation looks like. This visit ordering has the nice property of keeping loop bodies compact.
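This is not the JIT's implementation (value numbering's version lives in the compiler itself), but here is a hedged, self-contained C++ sketch of the idea: given a plain RPO and a hypothetical map from loop headers to their loops, emit the rest of a loop's body as soon as its header is reached, so loop bodies stay contiguous in the final sequence. The `Block`, `Loop`, and `loopAwareOrder` names are illustrative only.

```cpp
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical types for illustration; the JIT has its own loop representation.
struct Block;
struct Loop
{
    Block*                     header;
    std::unordered_set<Block*> members; // all blocks in the loop body, including the header
};

// Produce a "loop-aware" order from a plain RPO: when a loop header is reached,
// emit the remaining members of its loop (in RPO order) before any block outside it.
std::vector<Block*> loopAwareOrder(const std::vector<Block*>&               rpo,
                                   const std::unordered_map<Block*, Loop*>& headerToLoop)
{
    std::vector<Block*>        sequence;
    std::unordered_set<Block*> placed;

    for (Block* block : rpo)
    {
        if (placed.count(block) != 0)
        {
            continue; // already emitted as part of an earlier loop body
        }

        sequence.push_back(block);
        placed.insert(block);

        auto it = headerToLoop.find(block);
        if (it != headerToLoop.end())
        {
            // Keep the loop body compact: emit its remaining members now.
            // (A real implementation would also handle nested loops recursively.)
            for (Block* candidate : rpo)
            {
                if ((it->second->members.count(candidate) != 0) && (placed.count(candidate) == 0))
                {
                    sequence.push_back(candidate);
                    placed.insert(candidate);
                }
            }
        }
    }
    return sequence;
}
```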
