> and you should use those instead of just a sequentially consistent atomic. Ehh...

bonzini · on Feb 20, 2023

The question is what the invariants are around those operations. It is rarely the case that you can get away with simple RMW operations, because they don't guarantee any invariants. Also, sequentially consistent RMW atomic operations don't order with non-sequentially consistent atomics (the exception being the seqcst fence) so it's hard to construct send/receive operations using seqcst atomics—if you can use them, chances are that even relaxed could be enough!

Going deeper into the atomic add example, are you sure that the cache line bouncing will not be an issue? Can you perhaps make the code just update something that you already have exclusive access to, and sum multiple values when you do a read (hopefully it's rare, e.g. reading a statistic once a second)? So again the solution could be to use a mutex and split the data so that the mutex is mostly uncontended.
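To make the "update something you own, sum on read" idea concrete, here is a rough sketch. It uses per-thread atomic slots rather than the mutex variant, purely for brevity. `ShardedCounter`, `Slot`, `kCacheLine`, and `sharded_counter_demo` are made-up names, and the 64-byte cache line is an assumption about the target CPU.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size, not portable

// One slot per writer, each on its own cache line.
struct alignas(kCacheLine) Slot {
    std::atomic<std::uint64_t> value{0};
};

template <std::size_t N>
class ShardedCounter {
public:
    // Each thread only ever touches its own shard, so the line never bounces.
    void add(std::size_t shard, std::uint64_t n) {
        assert(shard < N);
        slots_[shard].value.fetch_add(n, std::memory_order_relaxed);
    }

    // Rare path: walk all shards and sum. The result is a snapshot that may
    // already be stale, which is fine for a statistic.
    std::uint64_t read() const {
        std::uint64_t total = 0;
        for (const Slot& s : slots_) {
            total += s.value.load(std::memory_order_relaxed);
        }
        return total;
    }

private:
    std::array<Slot, N> slots_;
};

// Four writers, 10000 increments each; join() makes the final read exact.
std::uint64_t sharded_counter_demo() {
    ShardedCounter<4> counter;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < 4; ++t) {
        workers.emplace_back([&counter, t] {
            for (int i = 0; i < 10000; ++i) counter.add(t, 1);
        });
    }
    for (auto& w : workers) w.join();
    return counter.read();
}
```

Relaxed ordering is enough here only because nothing else is published through the counter; the moment the count guards other data, this stops being true.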

dragontamer · on Feb 20, 2023

> Also, sequentially consistent RMW atomic operations don't order with non-sequentially consistent atomics

So just use sequentially-consistent atomics everywhere, unless otherwise needed.
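For what it's worth, in C++ this is what you get when you don't pass a memory order at all. A small sketch using the classic store-buffering test (function names are mine): with the default `seq_cst` stores and loads, the standard does not allow both threads to read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One round of the store-buffering test. All operations use the default
// ordering, std::memory_order_seq_cst.
bool both_saw_zero_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread a([&] { x.store(1); r1 = y.load(); });
    std::thread b([&] { y.store(1); r2 = x.load(); });
    a.join();
    b.join();
    // With seq_cst there is a single total order of the four operations,
    // so at least one load must come after the other thread's store.
    return r1 == 0 && r2 == 0;
}

// Counts how often the forbidden outcome shows up over `trials` rounds.
int count_forbidden_outcomes(int trials) {
    assert(trials >= 0);
    int forbidden = 0;
    for (int i = 0; i < trials; ++i) {
        if (both_saw_zero_once()) ++forbidden;
    }
    return forbidden;
}
```

Swap the stores to `memory_order_release` and the loads to `memory_order_acquire` and the "both read 0" outcome becomes legal, which is exactly the kind of subtlety at issue.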

_No one_ should be itching to touch that acquire/release paradigm unless they really have to. It's grossly more complex, and very few programmers understand it.

Acquire/release exists because it's necessary. (Ex: implementation of spinlocks/mutexes). But it's a tool no one should feel good about using: it's very low level, very subtle, and full of potential traps.
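As an illustration of the spinlock case, a minimal sketch (not production quality: no backoff, no fairness, and `SpinLock` / `spinlock_demo` are made-up names). The acquire on `lock()` pairs with the release on `unlock()`, which is what makes the plain `counter` safe to touch inside the critical section.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
public:
    void lock() {
        // Acquire: once we win the flag, we see every write made by the
        // previous holder before its release in unlock().
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // be polite while spinning
        }
    }

    void unlock() {
        // Release: publish our critical-section writes to the next holder.
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Four threads bump a plain (non-atomic) counter under the lock.
long spinlock_demo() {
    constexpr int kThreads = 4;
    constexpr int kIters = 10000;
    SpinLock lock;
    long counter = 0;  // protected by `lock`
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < kIters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    assert(counter == static_cast<long>(kThreads) * kIters);
    return counter;
}
```

In real code you'd reach for `std::mutex` first; this is only to show where the two orderings sit.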

A good acquire/release lock-free algorithm or data structure is still PhD-thesis-level material these days. It's obscure, uncommon, and difficult to write. Don't do it unless you have to. And if you have to, try all the patterns that have been figured out already before innovating.

> Going deeper into the atomic add example, are you sure that the cache line bouncing will not be an issue?

Do you mean false sharing?

False sharing is a performance issue. Your code will be correct, albeit slower. That's fine. Furthermore, acquire/release doesn't do anything to solve false sharing; you need to change your memory layout so that objects are on different cache lines.
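A sketch of what "change your memory layout" can look like, assuming 64-byte cache lines (check your target; the struct and function names are hypothetical):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumption about the target CPU

// Both counters will typically share one cache line, so two threads
// updating `a` and `b` independently still fight over that line.
struct PackedCounters {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// Same fields, but each one is forced onto its own cache line.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<std::uint64_t> a{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> b{0};
};

static_assert(alignof(PaddedCounters) == kCacheLine, "struct is line-aligned");
static_assert(sizeof(PaddedCounters) == 2 * kCacheLine, "one line per counter");

// Byte distance between the two counters in the padded layout.
std::ptrdiff_t counter_spacing(const PaddedCounters& c) {
    const char* pa = reinterpret_cast<const char*>(&c.a);
    const char* pb = reinterpret_cast<const char*>(&c.b);
    assert(pb > pa);  // members are laid out in declaration order
    return pb - pa;
}
```

Note the memory orderings on the counters are irrelevant to this fix; only the layout changed.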

> So again the solution could be to use a mutex and split the data so that the mutex is mostly uncontended.

We're only at "#4" because "#1, #2, and #3 have failed". If you can solve things with a mutex, slap down a mutex and call it done. Don't reach for the more complex tools unless necessary.

zozbot234 · on Feb 20, 2023

Yes, if you can make do with a single atomic-sized object, you can perform any RMW on it with either a CAS loop or a special-cased atomic operation (like add or subtract) and not need any further synchronization. What the GP commenter described as being potentially dangerous and needing expert knowledge is going in any way beyond that. It's really easy to e.g. trigger ABA problems and other issues without realizing it. So just use a mutex to synchronize access instead.
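For instance, an "atomic max" has no dedicated operation on `std::atomic<int>` in C++17, but a CAS loop on the single value covers it. A minimal sketch, with `fetch_max` and `fetch_max_demo` as made-up names and the default `seq_cst` ordering throughout:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Atomically raise `target` to at least `candidate`.
// Returns the value observed just before our update (or the current value
// if it was already >= candidate and nothing was written).
int fetch_max(std::atomic<int>& target, int candidate) {
    int seen = target.load();
    // On failure, compare_exchange_weak reloads `seen` with the current
    // value, so the loop re-checks against fresh data each time around.
    while (seen < candidate &&
           !target.compare_exchange_weak(seen, candidate)) {
    }
    return seen;
}

// Four threads race to push a high-water mark upward.
int fetch_max_demo() {
    std::atomic<int> high_water{0};
    std::vector<std::thread> workers;
    for (int t = 1; t <= 4; ++t) {
        workers.emplace_back([&high_water, t] {
            for (int i = 0; i < 1000; ++i) {
                fetch_max(high_water, t * 1000 + i);
            }
        });
    }
    for (auto& w : workers) w.join();
    const int result = high_water.load();
    assert(result == 4999);  // largest candidate: 4 * 1000 + 999
    return result;
}
```

This stays safe because the whole state is the one integer. As soon as the value is a pointer or an index into something else, the ABA caveat applies.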