-
Notifications
You must be signed in to change notification settings - Fork 11.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SchedModel] Missing ReadAdvance for the source (non dest) operand of RMW instructions. #50664
Comments
I have a fix for this (and bug 51318) which I plan to send for review today. |
Presumably this should be picked into the release branch? |
CC'ing Tom. It would be really nice if we could merge this fix into the release branch. It fixes a common pattern in code. It could lead to better instruction schedules (so, possibly better perf); the improvement would not be limited to just mca. P.s.: Our numbers for those patterns are now more in line with what IACA generates. The "odd" mca throughput reported in https://uica.uops.info/ (on the default code snippet) is fixed by this patch. |
The fix does not apply cleanly, could someone backport this and push a branch to their local github fork? |
Merged: 20eced2 |
mentioned in issue #51489 |
Extended Description
This is similar to bug 51318. However, it doesn't only affect ADC/SBB but any RMW arithmetic opcode.
Example:
EAX is written in 1cy. However, the load from RSP can start immediately, and doesn't need to be delayed for 1cy.
This is the expected timeline:
The issue is with the definition of WriteALURMW. it is caused by the absence of a ReadAdvance for the register operand.
According to llvm-mc:
The first 5 MCOperands are for the memory reference 4(%rsp).
The read at index #5 is currently not marked as ReadAfterLd.
Same problem affects ADC/SBB, for which we use an almost identical pattern, except that WriteADCRMW is used instead of WriteALURMW.
For those instructions, we also need to mark with ReadAfterLd the implcit read of EFLAGS.
The text was updated successfully, but these errors were encountered: