It turns out that on Intel you don't really need any memory barriers for this, because ordinary writes become visible to other cores in program order. The exceptions are the weird-ass fancy instructions (non-temporal stores and the like) ... there's a list in the CPU manual worth checking.
So the consumer is guaranteed to see the data written into the buffer before it sees the FIFO pointer update. And secondly, reads and writes of the ring-buffer pointers inside AbstractFIFO are also guaranteed to be atomic operations.
The upshot of this is that for the producer thread, the ONLY thing you need to do on the Intel platform is prevent compiler reordering around the update to the ring buffer pointer. The output assembly looks like straight code with no special threading measures or instructions.
I've not looked at the consumer thread in so much detail, but I believe it's the same story.
ARM, however, is more crazzzzy. std::atomic will make the right choices there, but I think actual release and acquire instructions are required. My ARM asm is like my French though ... a bit flaky.
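
To make that concrete, here is a rough sketch of a single-producer/single-consumer ring buffer written with std::atomic. It is not the AbstractFIFO code; SpscRing and run_demo are names made up for illustration, and one slot is left unused to tell "full" from "empty". The point is the ordering: write the payload first, then publish the pointer with a release store, and have the other side read the pointer with an acquire load. On x86 the release store typically compiles down to an ordinary mov (so all it really does is stop the compiler reordering), whereas on ARM the compiler has to emit real barrier or store-release/load-acquire instructions.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical SPSC ring buffer for illustration only (not AbstractFIFO).
// One slot is kept empty, so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 1, "need at least two slots");
public:
    // Call from the producer thread only.
    bool push(const T& value) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % Capacity;
        if (next == read_.load(std::memory_order_acquire))
            return false;                               // buffer is full
        data_[w] = value;                               // 1. write the payload
        write_.store(next, std::memory_order_release);  // 2. then publish it
        return true;
    }

    // Call from the consumer thread only.
    bool pop(T& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire))
            return false;                               // buffer is empty
        out = data_[r];                                 // payload is visible here
        read_.store((r + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> data_{};
    std::atomic<std::size_t> write_{0};  // only the producer stores to this
    std::atomic<std::size_t> read_{0};   // only the consumer stores to this
};

// Pushes 1..count on one thread and sums them on another.
long long run_demo(int count) {
    assert(count >= 0);
    SpscRing<int, 64> ring;
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i)
            while (!ring.push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        int v = 0;
        for (int n = 0; n < count; ++n) {
            while (!ring.pop(v)) std::this_thread::yield();
            sum += v;
        }
    });
    producer.join();
    consumer.join();
    return sum;  // safe to read: the consumer thread has been joined
}
```

If you want to see the difference for yourself, compile push() for x86-64 and for ARM and compare the assembly around the store to write_.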