Can we optimize non-locking RMW atomic operations? #1729

Sonicadvance1 · 2022-05-26T03:34:38Z

Currently we convert all LOCK-prefixed RMW ops to acquire-release semantics.

A couple of weird things to investigate here:

Basic ALU ops without lock
- Non-lock ops get turned into load + ALU + store
- These could potentially be converted into atomic memory operations without acquire-release semantics.
- These should only be generated on ARMv8.1+ hosts that support atomic memory ops (LSE)
- Might need hardware TSO support?
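To make the two lowerings above concrete, here is a rough C11 sketch for a non-LOCK `add dword [mem], reg`. The function names are hypothetical, and the acquire load / release store pair is an assumption standing in for however TSO ordering actually gets emitted. Whether a compiler turns the relaxed `atomic_fetch_add_explicit` into a single `LDADD` depends on the target (e.g. building for ARMv8.1+ with LSE).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only: two ways an emulator might
 * lower a non-LOCK x86 `add dword [mem], reg`. Both return the value that
 * ends up in memory. */

/* Current shape: separate load, ALU op, and store. The acquire load and
 * release store stand in for x86-TSO ordering (an assumption). The sequence
 * as a whole is NOT atomic, matching the non-LOCK x86 instruction. */
uint32_t add_load_alu_store(_Atomic uint32_t *mem, uint32_t src)
{
    uint32_t old = atomic_load_explicit(mem, memory_order_acquire);
    uint32_t result = old + src; /* the ALU op; unsigned, so it wraps */
    atomic_store_explicit(mem, result, memory_order_release);
    return result;
}

/* Candidate: one atomic memory operation without acquire-release semantics.
 * With LSE this may become a single LDADD. Relaxed ordering drops the TSO
 * ordering modeled above, which is the open "hardware TSO?" question. */
uint32_t add_atomic_relaxed(_Atomic uint32_t *mem, uint32_t src)
{
    uint32_t old = atomic_fetch_add_explicit(mem, src, memory_order_relaxed);
    return old + src; /* same wrapped value that was written to memory */
}
```

The relaxed variant is strictly stronger in atomicity than the original instruction but weaker in ordering, so it is only a safe substitute if ordering is guaranteed some other way.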
RMW ops that don't imply LOCK (but arguably should), when used without a LOCK prefix
- CMPXCHG, CMPXCHG8B, CMPXCHG16B, XADD
- These instructions don't imply LOCK prefixes but they are almost universally used with them
- The Linux kernel has some optimization where it backpatches lock cmpxchg into nop cmpxchg on uniprocessors? Citation needed.
- These could possibly be converted to operations with only release semantics?
- Needs investigation.
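For the CMPXCHG case, a sketch of what the two lowerings might look like, again in C11 atomics with hypothetical function names. It models only the 32-bit register form. The release-only variant is a guess at the "only release semantics?" idea above, not a verified mapping; C11 forbids `release`/`acq_rel` as the failure ordering, so it uses relaxed on failure.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lowering of x86 `cmpxchg dword [mem], src`.
 * x86 semantics: if EAX == [mem], ZF=1 and [mem] = src;
 * otherwise ZF=0 and EAX = [mem]. The return value models ZF.
 * On failure, C11 compare-exchange writes the observed value into *eax,
 * which matches the EAX update. (The write-back of the unchanged value that
 * x86 performs on a failed compare is not modeled.) */

/* Acquire-release, i.e. how LOCK RMW ops are currently handled. */
bool cmpxchg32_acq_rel(_Atomic uint32_t *mem, uint32_t *eax, uint32_t src)
{
    return atomic_compare_exchange_strong_explicit(
        mem, eax, src, memory_order_acq_rel, memory_order_acquire);
}

/* Speculative weaker variant for the non-LOCK form: release on success,
 * relaxed on failure. Whether this preserves x86-TSO behavior is unverified. */
bool cmpxchg32_release(_Atomic uint32_t *mem, uint32_t *eax, uint32_t src)
{
    return atomic_compare_exchange_strong_explicit(
        mem, eax, src, memory_order_release, memory_order_relaxed);
}
```

Note that any atomic CAS is more atomic than a non-LOCK CMPXCHG, so the question is purely about how much ordering can be dropped.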


dnadlinger · 2022-05-27T20:47:30Z

> Citation needed.

LOCK_PREFIX is defined here

and the patching mechanism is
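For context on the uniprocessor trick: the kernel's x86 alternatives code records the location of each `LOCK_PREFIX` use in a `.smp_locks` section and, when only one CPU is present, overwrites the `0xF0` LOCK byte with a `0x3E` DS segment-override prefix, so the instruction keeps its length but loses the bus lock. The sketch below is a simplified, hypothetical illustration of that byte patch over a plain buffer (the function and macro names are made up), not the kernel's code, which patches live kernel text via `text_poke`.

```c
#include <stddef.h>
#include <stdint.h>

#define X86_LOCK_PREFIX 0xF0u /* LOCK prefix byte */
#define X86_DS_PREFIX   0x3Eu /* DS segment-override prefix */

/* Hypothetical sketch of uniprocessor "unlocking": given the recorded
 * offsets of LOCK prefixes in a code buffer, replace each one with a DS
 * override so the instruction length is unchanged. Returns the number of
 * bytes patched. */
size_t unlock_prefixes(uint8_t *text, size_t text_len,
                       const size_t *lock_offsets, size_t count)
{
    size_t patched = 0;
    for (size_t i = 0; i < count; i++) {
        size_t off = lock_offsets[i];
        /* Skip out-of-range entries and bytes that are not a LOCK prefix. */
        if (off < text_len && text[off] == X86_LOCK_PREFIX) {
            text[off] = X86_DS_PREFIX;
            patched++;
        }
    }
    return patched;
}
```

For example, `F0 0F B1 0A` (`lock cmpxchg dword [rdx], ecx`) becomes `3E 0F B1 0A`, a plain non-locked CMPXCHG of the same length. This is why non-LOCK CMPXCHG does show up in real binaries.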

Sonicadvance1 added the Investigation label May 26, 2022

skmp added this to the 2211 milestone Aug 10, 2022
