Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SchedModel] Missing ReadAdvance for the source (non dest) operand of RMW instructions. #50664

Closed
adibiagio opened this issue Aug 3, 2021 · 8 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@adibiagio
Copy link
Collaborator

Bugzilla Link 51322
Resolution FIXED
Resolved on Oct 11, 2021 20:29
Version trunk
OS Windows NT
Blocks #51489
CC @topperc,@LebedevRI,@RKSimon,@phoebewang,@rotateright,@tstellar
Fixed by commit(s) f0658c7 7a1a35a d8667f1 20eced2

Extended Description

This is similar to bug 51318. However, it doesn't only affect ADC/SBB but any RMW arithmetic opcode.

Example:

add  %eax, %eax
add  %eax, 4(%rsp)

llvm-mca -mcpu=btver2 -iterations=1 -timeline

Timeline view:
Index     0123456789

[0,0]     DeER .   .   addl     %eax, %eax
[0,1]     D=eeeeeeER   addl     %eax, (%rsp)

EAX is written in 1cy. However, the load from RSP can start immediately, and doesn't need to be delayed for 1cy.

This is the expected timeline:

Timeline view:
Index     012345678

[0,0]     DeER .  .   addl      %eax, %eax
[0,1]     DeeeeeeER   addl      %eax, 4(%rsp)

The issue is with the definition of WriteALURMW. it is caused by the absence of a ReadAdvance for the register operand.

According to llvm-mc:

        addl    %eax, 4(%rsp)                   # <MCInst #&#8203;380 ADD32mr
                                        #  <MCOperand Reg:58>
                                        #  <MCOperand Imm:1>
                                        #  <MCOperand Reg:0>
                                        #  <MCOperand Imm:4>
                                        #  <MCOperand Reg:0>
                                        #  <MCOperand Reg:22>>

The first 5 MCOperands are for the memory reference 4(%rsp).
The read at index #​5 is currently not marked as ReadAfterLd.

Same problem affects ADC/SBB, for which we use an almost identical pattern, except that WriteADCRMW is used instead of WriteALURMW.

For those instructions, we also need to mark with ReadAfterLd the implcit read of EFLAGS.

@adibiagio
Copy link
Collaborator Author

I have a fix for this (and bug 51318) which I plan to send for review today.

@adibiagio
Copy link
Collaborator Author

@LebedevRI
Copy link
Member

Presumably this should be picked into the release branch?

@adibiagio
Copy link
Collaborator Author

Presumably this should be picked into the release branch?

CC'ing Tom.

It would be really nice if we could merge this fix into the release branch.

It fixes a common pattern in code. It could lead to better instruction schedules (so, possibly better perf); the improvement would not be limited to just mca.

P.s.: Our numbers for those patterns are now more in line with what IACA generates. The "odd" mca throughput reported in https://uica.uops.info/ (on the default code snippet) is fixed by this patch.

@tstellar
Copy link
Collaborator

tstellar commented Aug 6, 2021

The fix does not apply cleanly, could someone backport this and push a branch to their local github fork?

@adibiagio
Copy link
Collaborator Author

Hi Tom,

This commit depends on another commit which added and modified some tests.

Tests for this fix were added by commit f0658c7

So I think it should work if you:

git cherry-pick -x f0658c7

and then

git cherry-pick -x 7a1a35a

@tstellar
Copy link
Collaborator

Merged: 20eced2

@tstellar
Copy link
Collaborator

mentioned in issue #51489

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants