
> and you should use those instead of just a sequentially consistent atomic.

Ehhh... sometimes the best solution to the "bank account parallelism" problem is just:

    atomic_int bobs_bank_account_balance;

    // In Thread#1
    bobs_bank_account_balance += 100; // Depositing $100 in a sequentially consistent way.


    // In Thread#2
    bobs_bank_account_balance -= 100; // Withdrawing $100 in a sequentially consistent way.

No reason to bring in acquire vs release barriers or anything more complex. Just... atomically add and atomically subtract as needed. Not all cases are this simple, but many cases are. So you might as well try this and see if it is good enough.

If not, then yeah, you move onto more complex paradigms. But always try the dumb and simple solutions first, before trying the harder stuff.
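
For anyone who wants to try the snippet above, here is a minimal runnable sketch in C++. The function name `run_bank_demo` and the loop counts are made up for illustration; the only part taken from the comment is that `+=` and `-=` on an atomic integer are sequentially consistent read-modify-write operations.

```cpp
#include <atomic>
#include <thread>

// Hypothetical demo: one thread deposits $100 `iterations` times while
// another withdraws $100 the same number of times. Returns the final balance.
int run_bank_demo(int iterations) {
    std::atomic_int balance{0};

    std::thread depositor([&] {
        for (int i = 0; i < iterations; ++i) {
            balance += 100;  // atomic fetch_add, seq_cst by default
        }
    });
    std::thread withdrawer([&] {
        for (int i = 0; i < iterations; ++i) {
            balance -= 100;  // atomic fetch_sub, seq_cst by default
        }
    });

    depositor.join();
    withdrawer.join();
    return balance.load();
}
```

Because every update is a single atomic RMW, no deposit or withdrawal is lost, so the balance returns to its starting value once both threads have joined. Note that nothing here stops the balance from going negative in between.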

----------

This case is so common that it's even optimized in GPU programming. I've seen atomics like this get optimized into a prefix-sum routine by the compiler.

Yes, this means you can have thousands of GPU-threads / shaders performing atomic adds / subtracts in GPU-space, and the atomic will be surprisingly efficient.

The problem is that this paradigm doesn't always work. It takes skill to know when paradigms fail or succeed, and it's sometimes very subtle. (That's why I say: try this, but... speak with an expert when doing so.) There might be a subtle race condition. But in the cases where this works, absolutely program in this way.



The question is what the invariants are around those operations. It is rarely the case that you can get away with simple RMW operations, because they don't guarantee any invariants. Also, sequentially consistent RMW atomic operations don't order with non-sequentially consistent atomics (the exception being the seqcst fence) so it's hard to construct send/receive operations using seqcst atomics—if you can use them, chances are that even relaxed could be enough!

Going deeper into the atomic add example, are you sure that the cache line bouncing will not be an issue? Can you perhaps make the code just update something that you already have exclusive access to, and sum multiple values when you do a read (hopefully it's rare, e.g. reading a statistic once a second)? So again the solution could be to use a mutex and split the data so that the mutex is mostly uncontended.
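
A rough sketch of the "split the data and sum on read" idea, using one atomic slot per writer instead of a mutex (a swapped-in simplification, not necessarily what the comment has in mind). `ShardedCounter`, `kShards`, the 64-byte `kCacheLine`, and `run_sharded_demo` are all assumptions for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kShards = 4;      // assumed: one shard per writer thread
constexpr std::size_t kCacheLine = 64;  // assumed cache-line size

// Each shard sits on its own cache line so writers do not bounce a shared line.
struct alignas(kCacheLine) Shard {
    std::atomic<long> value{0};
};

class ShardedCounter {
public:
    // A writer updates only "its" shard.
    void add(std::size_t shard, long delta) {
        shards_[shard % kShards].value += delta;
    }
    // The (hopefully rare) reader sums all shards. While writers are still
    // running this is not an exact snapshot, only an approximate total.
    long read() const {
        long total = 0;
        for (const Shard& s : shards_) total += s.value.load();
        return total;
    }

private:
    std::array<Shard, kShards> shards_;
};

// Hypothetical driver: kShards threads each add 1 `per_thread` times.
long run_sharded_demo(int per_thread) {
    ShardedCounter counter;
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < kShards; ++t) {
        threads.emplace_back([&counter, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) counter.add(t, 1);
        });
    }
    for (std::thread& th : threads) th.join();
    return counter.read();  // exact here, because all writers have joined
}
```

The trade-off is that writes stay cheap and local while reads get more expensive, which only pays off if reads really are rare.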


> Also, sequentially consistent RMW atomic operations don't order with non-sequentially consistent atomics

So just use sequentially-consistent atomics everywhere, unless otherwise needed.

_No one_ should be itching to touch that acquire/release paradigm unless you really have to. It's grossly more complex, and very few programmers understand it.

Acquire/release exists because it's necessary. (Ex: implementation of spinlocks/mutexes). But it's a tool no one should feel good about using: it's very low level, very subtle, and full of potential traps.

A good acquire/release lock-free algorithm or data structure is still PhD-thesis-level material these days. It's obscure, uncommon, and difficult to write. Don't do it unless you have to. And if you have to, try all the patterns that have been figured out already before innovating.

> Going deeper into the atomic add example, are you sure that the cache line bouncing will not be an issue?

Do you mean false sharing?

False sharing is a performance issue. Your code will be correct, albeit slower. That's fine. Furthermore, acquire/release doesn't do anything to solve false sharing; you need to change your memory layout so that objects are on different cache lines.
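
To make "change your memory layout" concrete, here is a small sketch that pads two independent counters onto separate cache lines. The struct names, the `alice`/`bob` fields, and the 64-byte line size are assumptions; the real line size depends on the target CPU.

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;  // assumed; check your target CPU

// Both counters will most likely share one cache line.
struct Packed {
    std::atomic_int alice{0};
    std::atomic_int bob{0};
};

// Each counter is aligned to the start of its own cache line.
struct Padded {
    alignas(kCacheLine) std::atomic_int alice{0};
    alignas(kCacheLine) std::atomic_int bob{0};
};

static_assert(alignof(Padded) == kCacheLine, "Padded must be line-aligned");
static_assert(sizeof(Padded) >= 2 * kCacheLine, "one line per counter");

// Distance in bytes between the two counters inside one Padded object.
std::ptrdiff_t byte_distance(const Padded& p) {
    return reinterpret_cast<const char*>(&p.bob) -
           reinterpret_cast<const char*>(&p.alice);
}
```

The memory ordering used on `alice` and `bob` has no effect on this; only the layout does.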

> So again the solution could be to use a mutex and split the data so that the mutex is mostly uncontended.

We're only at "#4" because "#1, #2, and #3 have failed". If you can solve things with a mutex, slap down a mutex and call it done. Don't reach for the more complex tools unless necessary.
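
A minimal sketch of the "slap down a mutex" option, for a case a single atomic integer cannot cover: an invariant spanning two balances. The `Bank` class and its method names are hypothetical.

```cpp
#include <mutex>

// Hypothetical two-account bank. The invariant "alice + bob stays constant
// and neither balance goes negative" spans two values, so both are guarded
// by one mutex instead of being separate atomics.
class Bank {
public:
    Bank(int alice, int bob) : alice_(alice), bob_(bob) {}

    // Moves `amount` from Alice to Bob; refuses if Alice cannot cover it.
    bool transfer_alice_to_bob(int amount) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (alice_ < amount) return false;
        alice_ -= amount;
        bob_ += amount;
        return true;
    }

    // Reads both balances under the lock, so the sum is a consistent snapshot.
    int total() {
        std::lock_guard<std::mutex> lock(mutex_);
        return alice_ + bob_;
    }

private:
    std::mutex mutex_;
    int alice_;
    int bob_;
};
```

No memory-ordering reasoning is needed here; the lock and unlock provide all the synchronization.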


Yes, if you can make do with a single atomic-sized object, you can perform any RMW on it with either a CAS loop or a special-cased atomic operation (like add or subtract) and not need any further synchronization. What the GP commenter described as being potentially dangerous and needing expert knowledge is going in any way beyond that. It's really easy to e.g. trigger ABA problems and other issues without realizing it. So just use a mutex to synchronize access instead.
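
As an illustration of the CAS-loop case, here is a sketch of a conditional withdrawal on a single atomic balance. The function `try_withdraw` is made up for this example; it relies only on the standard `compare_exchange_weak` behavior of writing the observed value back into `current` when the exchange fails.

```cpp
#include <atomic>

// Withdraws `amount` only if the balance can cover it, so the balance never
// goes negative. Returns true on success.
bool try_withdraw(std::atomic_int& balance, int amount) {
    int current = balance.load();
    while (current >= amount) {
        // Succeeds only if balance still equals `current`; otherwise
        // `current` is refreshed with the latest value and we re-check.
        if (balance.compare_exchange_weak(current, current - amount)) {
            return true;
        }
    }
    return false;  // insufficient funds at the time of the last read
}
```

This stays safe because the whole state is one plain integer. Once the state involves pointers or more than one word, the ABA-style problems mentioned above start to apply.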



