Can we optimize non-locking RMW atomic operations? #1729

Open
Sonicadvance1 opened this issue May 26, 2022 · 1 comment
Sonicadvance1 commented May 26, 2022

Currently we convert all LOCK-prefixed RMW ops to atomic operations with acquire-release semantics.

A couple of weird things to investigate here:

  1. Basic ALU ops without lock
    • Non-lock ops get turned into load + ALU + store
    • Can potentially be converted into an atomic memory operation without acquire-release semantics.
    • Should only be generated on ARMv8.1+ hardware that supports atomic memory ops
    • Might need hardware TSO support?
  2. RMW ops that don't imply LOCK (but really should) when used without a LOCK prefix
    • CMPXCHG, CMPXCHG8B, CMPXCHG16B, XADD
    • These instructions don't imply a LOCK prefix, but they are almost universally used with one
    • The Linux kernel has some optimization where it backpatches lock cmpxchg into nop cmpxchg on uniprocessors? Citation needed.
    • These might be convertible to operations with...release? semantics?
    • Needs investigation.
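
To make the candidate lowerings above concrete, here is a minimal C++ sketch that uses `std::atomic` as a stand-in for the generated code. All function names are hypothetical and do not come from the emulator's code base. It assumes a compiler targeting AArch64 with the ARMv8.1 atomic extension, where a relaxed `fetch_add` typically becomes a plain `LDADD`, an acquire-release one becomes `LDADDAL`, and a release-only compare-exchange becomes `CASL`. The exact instruction choice is up to the compiler and target flags. Note that a non-LOCK x86 RMW is not atomic at all, so the atomic forms are stronger than strictly required. Whether they also preserve x86 store ordering without hardware TSO is the open question from the list, and this sketch does not answer it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only.

// Shape of today's lowering for a non-LOCK `add [mem], reg`: three separate
// steps, so another thread can write between the load and the store.
// std::atomic is used here only so the sketch has no C++ data race.
inline uint32_t AddLoadAluStore(std::atomic<uint32_t>& mem, uint32_t value) {
  uint32_t old = mem.load(std::memory_order_relaxed);  // load
  uint32_t result = old + value;                       // ALU
  mem.store(result, std::memory_order_relaxed);        // store
  return result;
}

// Candidate lowering: a single atomic RMW with no ordering attached.
inline uint32_t AddRelaxedAtomic(std::atomic<uint32_t>& mem, uint32_t value) {
  return mem.fetch_add(value, std::memory_order_relaxed) + value;
}

// Shape of the current lowering for `lock add`: acquire-release RMW.
inline uint32_t AddLockedAtomic(std::atomic<uint32_t>& mem, uint32_t value) {
  return mem.fetch_add(value, std::memory_order_acq_rel) + value;
}

// Speculative lowering for a non-LOCK cmpxchg: release ordering on success,
// relaxed on failure. On failure, `expected` receives the current value,
// much like x86 cmpxchg writing the memory operand back into the accumulator.
inline bool CmpXchgRelease(std::atomic<uint32_t>& mem, uint32_t& expected,
                           uint32_t desired) {
  return mem.compare_exchange_strong(expected, desired,
                                     std::memory_order_release,
                                     std::memory_order_relaxed);
}

// Single-threaded usage: all variants compute the same values; they differ
// only in atomicity and ordering guarantees.
inline void Demo() {
  std::atomic<uint32_t> mem{40};
  assert(AddRelaxedAtomic(mem, 2) == 42);
  uint32_t expected = 42;
  assert(CmpXchgRelease(mem, expected, 7));
  assert(mem.load() == 7);
}
```

In a single thread the three add variants are indistinguishable, which is why any difference would have to be checked with multi-threaded litmus tests rather than functional tests.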