-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: fix case where RBO leads to an invalid CSE #88159
Conversation
If phi-based RBO bypasses a block with a memory PHI, it is possible for CSE to find invalid memory-based CSEs. An example of this is seen in the attached test case. Ideally perhaps CSE would kill availability of these CSEs at the point where memory can change, but that seems difficult to arrange. Instead, we mark the bypased block as one that will not propagate any incoming CSEs, as the failures we know of require CSEs to flow back through this block. Fixes dotnet#88091.
@dotnet/jit-contrib PTAL Consider this as an interim fix, I would like to get something plausible in now and perhaps find a better fix down the road. Not sure who to best tag for a review, so am going to arbitrarily pick @jakobbotsch, but happy for anone else to weigh in too. Should be fairly minimal diffs, either code size increases from lack of costly CSEs, or code size decreases from lack of cheap CSEs leading to less prolog/epilog code. Small size increase on arm64, smaller decrease on x64. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a reasonable surgical fix to me.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsIf phi-based RBO bypasses a block with a memory PHI, it is possible for CSE to find invalid memory-based CSEs. An example of this is seen in the attached test case. Ideally perhaps CSE would kill availability of these CSEs at the point where memory can change, but that seems difficult to arrange. Instead, we mark the bypased block as one that will not propagate any incoming CSEs, as the failures we know of require CSEs to flow back through this block. Fixes #88091.
|
static int Main() | ||
{ | ||
int result = 0; | ||
try | ||
{ | ||
Problem(data); | ||
result = 100; | ||
} | ||
catch (Exception e) | ||
{ | ||
Console.WriteLine($"Failed: {e.Message}"); | ||
result = -1; | ||
} | ||
return result; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this not Main
(convention has been TestEntryPoint
but it isn't semantically meaningful) and mark it with [Fact]
(using Xunit;
)
Optionally (not recommended here), instead make Problem
a [Theory]
and figure out ClassData
for the input.
Optionally (recommend), make the entire method body Problem(data)
. The infrastructure will handle exceptions and the 100 isn't required at that point (can just return void)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like so?
using XUnit;
class Runtime_88091
{
[Fact] static int Test() => Problem(data);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but (I missed this the first time) also make it (the method -and- class) public.
To verify type "88091" in the test tab of the azdo job for the PR and see if it finds the test. It won't if it isn't public, doesn't have [Fact], etc. There are a few checks in msbuild and the analyzers, but there are incomplete and I'm hoping to improve them so that you get direct feedback instead of the current situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The infrastructure will handle exceptions
As far as I've seen in #88006, it does mark the test as failed but it doesn't print the exception in the log in such case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markples azdo still doesn't show the test has been run. Anything obviously wrong to you? If not I will probably merge to get the fix in and we can sort it out later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if I can fix the exception reporting.
The test is actually running. There seems to be something strange with the azdo test filter where it doesn't show up for a while. I don't know if it is time-based or if it's based on earlier queries. (I also didn't see it but then after looking for "JIT", "Regression", etc., it started showing up. I also see it in the helix log.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1588,6 +1595,29 @@ bool Compiler::optJumpThreadCore(JumpThreadInfo& jti) | |||
} | |||
} | |||
|
|||
// If this is a phi-based threading, and the block we're bypassing has | |||
// a memory phi, and the new successors do not, mark the block with BBF_ALTERED_MEMORY_PHI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing BBF_ALTERED_MEMORY_PHI
was an earlier name of BBF_NO_CSE_IN
?
Which I suppose ties in to my other question: technically this only needs to kill memory CSEs, is that correct? Presumably this is just simpler for the interim fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah., thanks for spotting that.
Yes, it could just kill memory CSEs, but it is actually tricky to figure out if a VN is memory dependent. If we could do that then we could instead kill memory CSEs in blocks with "memory havoc" and fix the bug a bit more surgically.
If phi-based RBO bypasses a block with a memory PHI, it is possible for CSE to find invalid memory-based CSEs. An example of this is seen in the attached test case.
Ideally perhaps CSE would kill availability of these CSEs at the point where memory can change, but that seems difficult to arrange. Instead, we mark the bypased block as one that will not propagate any incoming CSEs, as the failures we know of require CSEs to flow back through this block.
Fixes #88091.