[quote=“jimc, post:6, topic:21097”]
Is this true now? Last time I looked I got an MFENCE for a default atomic operation which was slow. [/quote]
Yes (you need optimization on, of course). Here you can see an atomic release store is exactly equivalent to a normal store:
https://godbolt.org/g/jRdeJ9
And that is even though acquire/release provides strictly more guarantees than, say, std::memory_order_relaxed.
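A minimal sketch of what the Godbolt link demonstrates (names are my own; on x86-64 with optimization enabled, the release store and acquire load each compile to a plain `mov`, while a seq_cst store would get an `xchg` or `mfence`):

```cpp
#include <atomic>

std::atomic<int> flag{0};

// On x86-64 at -O2 this emits a plain `mov` - x86's strong memory model
// already gives release semantics to every ordinary store.
void publish(int v) {
    flag.store(v, std::memory_order_release);
}

// Likewise a plain `mov`; ordinary x86 loads already have acquire semantics.
int consume() {
    return flag.load(std::memory_order_acquire);
}
```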
[quote]Is load/store using atomics with relaxed memory ordering adequate to ensure that the compiler actually writes/reads a variable - or do we need release/acquire semantics to guarantee correct operation here? I can’t find the right bit in the C++ standard right now[/quote]Every access to an atomic variable is a distinct, observable operation (writes as well as reads), so yes: even with relaxed ordering the compiler is required to actually perform the read each time.
[quote]In which we can see that even with relaxed memory ordering every single value is written to the atomic variable (L2 loop) whereas the compiler optimises like fuck for the second loop writing to a plain int and just shoves a 9 in there… 
[/quote]The compiler is required to do so. If another thread relied on the value of r changing (a reasonable assumption), your program might otherwise deadlock.
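A sketch of the loop under discussion (variable names assumed): with the atomic, every one of the ten stores must be emitted; with a plain int the compiler may legally collapse the whole loop to a single `r = 9;`.

```cpp
#include <atomic>

std::atomic<int> r{0};

// Each store to an atomic object is observable by other threads, so the
// compiler must emit all ten stores, even with relaxed ordering.
void count_up() {
    for (int i = 0; i < 10; ++i)
        r.store(i, std::memory_order_relaxed);
}
```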
[quote=“jimc, post:7, topic:21097”]
Presumably this is ‘permitted’ but in a reasonably complex program it’s very unlikely that the variable is not written to memory.
However I don’t understand enough to be able to argue why that’s the case…? I’ve seen it happen (and break stuff) in tight loops where variables weren’t being updated but never the disaster scenario you’re talking about here.
Maybe the optimiser just isn’t good enough? 
[/quote]In practice the disaster most likely doesn’t happen on x86: register pressure (there are only so many registers) constantly forces the compiler to spill to the stack frame, so the values end up in memory anyway, and x86’s strong memory model takes care of the rest behind the scenes. It is a totally different story on ARM, and especially on Itanium, which has 128 registers!
I summon the apocalypse here because I’ve debugged my fair share of code written by people who didn’t consider memory ordering (usually code written on x86, then deployed on all sorts of different processors). These are the hardest, most non-deterministic bugs, almost always impossible to trace back to the source, and given how small the cost of doing it the right way is, I always advocate loudly for doing so.
The most usual problems are nearly always of this form:
```cpp
struct a
{
    a * first;
    a * second;
    bool ready;
};

a root;

void do_something(a &);

void thread_a()
{
    while (!root.ready) {}
    do_something(*root.first);
    do_something(*root.second);
}

void thread_b()
{
    root.first = new a();
    root.second = new a();
    root.ready = true;
}
```
If any reader is not aware, the compiler is free to do this:
```cpp
void thread_a()
{
    a &a1 = *root.first, &a2 = *root.second; // << compile-time reordering and/or CPU instruction reordering
    while (!root.ready) {}
    do_something(a1);
    do_something(a2);
}

void thread_b()
{
    root.ready = true; // << compile-time reordering and/or CPU instruction reordering
    root.first = new a();
    root.second = new a();
}
```
You need atomics with at least acquire/release semantics for the above not to happen. With only relaxed memory ordering you merely ensure the program has no data race, i.e. is not undefined behaviour - the reorderings themselves are still allowed.
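One way to repair the example, sketched with a release/acquire pair on `ready` (an illustration of the technique, not the only possible fix; `do_something` is given a stand-in body here so the sketch is self-contained):

```cpp
#include <atomic>

struct a
{
    a * first;
    a * second;
    std::atomic<bool> ready{false};
};

a root;
int calls = 0;
void do_something(a &) { ++calls; } // stand-in body for illustration

void thread_a()
{
    // The acquire load synchronizes with the release store in thread_b:
    // once ready reads true, first and second are fully written.
    while (!root.ready.load(std::memory_order_acquire)) {}
    do_something(*root.first);
    do_something(*root.second);
}

void thread_b()
{
    root.first = new a();
    root.second = new a();
    // The release store forbids the preceding writes from moving past it.
    root.ready.store(true, std::memory_order_release);
}
```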
edit: See response further below for explanation of how to fix above code.
That code snippet is also why the double-checked locking pattern was broken in C, C++ and Java until recently. For an example relating to audio parameters? Just consider a delay line where a combination of parameters can alter the delay index. You may say people should “not program like that” - well then, you should at least document it, seeing as the language allows the behaviour and precisely defines what can be reasoned about in multithreaded programs. All bets are off in the case of undefined behaviour.
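For reference, a sketch of how double-checked locking can now be written correctly with C++11 atomics (`widget` and `get_instance` are hypothetical names): the acquire load pairs with the release store, so a non-null pointer implies a fully constructed object.

```cpp
#include <atomic>
#include <mutex>

struct widget { int value = 42; };

std::atomic<widget*> instance{nullptr};
std::mutex init_mutex;

widget& get_instance() {
    // Fast path: acquire load pairs with the release store below.
    widget* p = instance.load(std::memory_order_acquire);
    if (!p) {
        std::lock_guard<std::mutex> lock(init_mutex);
        // Re-check under the lock; relaxed is enough, the mutex orders this.
        p = instance.load(std::memory_order_relaxed);
        if (!p) {
            p = new widget();
            // Publish only after construction is complete.
            instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```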
[quote=“jimc, post:8, topic:21097, full:true”]It’s safe to use bare floats (i.e. unlikely to crash) on Intel[/quote]Unfortunately not - for IEEE floats it’s even worse, as they have trap representations that may occur through spilling or tearing!
[quote]It’s a win all round …?[/quote]Not only that - with the aggressive compilers of today it is so important to avoid undefined behaviour. I recommend watching this fascinating talk that steps through the passes and transformations LLVM performs when facing undefined behaviour:
It’s especially interesting to note the opening line: any sort of advanced optimization is indistinguishable from magic. It’s not that compiler writers like to abuse undefined behaviour; it’s just near-impossible to recognize it once you’re deep in the optimizer and a dozen transformation passes have already run.
[quote]Maybe there’s some reason why acquire/release is important for these parameter operations - but I can’t see why it is?
[/quote]See the aforementioned example.
