AbstractFIFO and Atomics

I had a little play with std::atomic over the weekend. I seem to recall that the juce::Atomic calls also implement a full memory barrier.

So as a little experiment I replaced the code in AbstractFIFO with std::atomic using a memory_order_acquire/memory_order_release strategy, and I'm getting 4x the performance out of it. I need to do some more testing ... and I don't know if it matters in the big picture. But if anyone does need it to go faster, this looks like a hot avenue ;-)
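For anyone wanting to try the same thing, here is a minimal sketch of the acquire/release idea. It is not the actual AbstractFIFO code: `SpscFifo`, `push`, `pop` and `runTransfer` are names made up for illustration, and it assumes exactly one producer thread and one consumer thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical single-producer/single-consumer FIFO (illustration only,
// not JUCE's implementation). One slot is kept empty to tell full from empty.
class SpscFifo
{
public:
    explicit SpscFifo (std::size_t capacity) : buffer (capacity + 1)
    {
        assert (capacity > 0);
    }

    // Producer thread only. Returns false if the FIFO is full.
    bool push (int value)
    {
        const std::size_t w = writePos.load (std::memory_order_relaxed);
        const std::size_t next = (w + 1) % buffer.size();

        // Acquire pairs with the consumer's release store to readPos, so the
        // consumer has finished with a slot before it gets overwritten.
        if (next == readPos.load (std::memory_order_acquire))
            return false;

        buffer[w] = value;

        // Release: the write to buffer[w] is visible before the new index is.
        writePos.store (next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false if the FIFO is empty.
    bool pop (int& value)
    {
        const std::size_t r = readPos.load (std::memory_order_relaxed);

        // Acquire pairs with the producer's release store to writePos.
        if (r == writePos.load (std::memory_order_acquire))
            return false;

        value = buffer[r];
        readPos.store ((r + 1) % buffer.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<int> buffer;
    std::atomic<std::size_t> writePos { 0 };
    std::atomic<std::size_t> readPos { 0 };
};

// Pushes 0..count-1 on one thread, pops them on another,
// and returns the sum of everything the consumer received.
long long runTransfer (int count)
{
    SpscFifo fifo (64);
    long long sum = 0;

    std::thread producer ([&] { for (int i = 0; i < count;) if (fifo.push (i)) ++i; });
    std::thread consumer ([&] { int v = 0; for (int n = 0; n < count;) if (fifo.pop (v)) { sum += v; ++n; } });

    producer.join();
    consumer.join();
    return sum;
}
```

Each index is written by only one thread, which is why a relaxed load of your own index is enough; only the other thread's index needs the acquire load.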

Very interesting! I don't explicitly add a memory barrier in the Atomic class, but the builtins that it calls may do.

Very much looking forward to replacing all that horrible Atomic stuff with std::atomic one day...

It turns out that on Intel you don't really need any memory barrier instructions for this, because ordinary writes are already ordered with respect to each other, unless you are using weird-ass fancy instructions (non-temporal stores, for example) ... there's a list in the CPU manual worth checking.

So the consumer is guaranteed to see the data written to the FIFO before it sees the updated FIFO write pointer. And secondly, reads and writes of the ring-buffer pointers inside AbstractFIFO are also guaranteed to be atomic operations.

The upshot of this is that for the producer thread, the ONLY thing you need to do on the Intel platform is prevent compiler reordering around the update to the ring buffer pointer. The output assembly looks like straight code with no special threading measures or instructions.
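To make that concrete, here is a small sketch comparing a release store with the default sequentially consistent one. The names (`slot`, `writeIndex`, `publishRelease`, `publishSeqCst`, `readIfPublished`) are invented for this example and are not from JUCE. On x86-64, compilers typically turn the release store into a plain `mov`, while the seq_cst store typically becomes an `xchg` (or a `mov` followed by `mfence`), which is the kind of cost the acquire/release version avoids. Worth checking the assembly from your own compiler rather than taking my word for it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for one FIFO slot and the ring-buffer write pointer.
int slot = 0;
std::atomic<int> writeIndex { 0 };

// Producer side, release ordering: the compiler may not move the write to
// 'slot' below the index update; on x86-64 this usually needs no fence.
void publishRelease (int value, int newIndex)
{
    assert (newIndex >= 0); // indices are non-negative in this sketch
    slot = value;
    writeIndex.store (newIndex, std::memory_order_release);
}

// Producer side, default ordering: store() with no argument uses
// memory_order_seq_cst, which usually costs a locked instruction or fence.
void publishSeqCst (int value, int newIndex)
{
    assert (newIndex >= 0);
    slot = value;
    writeIndex.store (newIndex);
}

// Consumer side: the acquire load pairs with either store above, so if the
// expected index is seen, the matching write to 'slot' is visible too.
int readIfPublished (int expectedIndex, int fallback)
{
    if (writeIndex.load (std::memory_order_acquire) == expectedIndex)
        return slot;

    return fallback;
}
```

Both publish functions behave the same from the caller's point of view; the difference only shows up in the generated instructions and the timing.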

I've not looked at the consumer thread in so much detail, but I believe it's the same story.

ARM, however, is more crazzzzy. std::atomic will make the right choices there, but actual release and acquire instructions are required, I think. My ARM asm is like my French though ... a bit flaky.