The atomics were a great addition that solved a bunch of problems with the old system.
But there’s a good chance they add significant overhead if you load them repeatedly inside hot loops.
If you have code like:
for (int sample = 0; sample < numSamples; ++sample)
    buffer[sample] *= gain->load();
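The compiler won’t be able to optimize it well: each load() is an atomic read (sequentially consistent by default) that may see a new value written by another thread, so in practice compilers won’t hoist it out of the loop, and the loop is unlikely to be vectorized.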
Modifying the code to:
auto localGain = gain->load();
for (int sample = 0; sample < numSamples; ++sample)
    buffer[sample] *= localGain;
should allow the compiler to apply its usual loop optimizations (such as vectorization) to the now-clean loop. The value is read once, so every sample in the block uses the same gain.
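For anyone who wants to try this outside a plugin, here is a minimal standalone sketch of both patterns. The function names and the use of std::vector<float> as the buffer are my own assumptions for illustration, not anything from a particular framework; the only real API used is std::atomic<float>::load().

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical helper: reloads the atomic on every sample.
// Each iteration performs a separate atomic read.
void applyGainPerSampleLoad(std::vector<float>& buffer,
                            const std::atomic<float>& gain)
{
    for (std::size_t sample = 0; sample < buffer.size(); ++sample)
        buffer[sample] *= gain.load();
}

// Hypothetical helper: loads the atomic once per block, then runs a
// plain float loop that the optimizer is free to treat like any other.
void applyGainCachedLoad(std::vector<float>& buffer,
                         const std::atomic<float>& gain)
{
    const float localGain = gain.load();
    for (std::size_t sample = 0; sample < buffer.size(); ++sample)
        buffer[sample] *= localGain;
}
```

As long as nothing writes to the atomic while the block is being processed, both functions produce the same output. If another thread does change the gain mid-block, the cached version simply picks up the new value on the next block. Whether the cached loop actually ends up vectorized depends on your compiler and flags, so it is worth checking the generated assembly or profiling rather than taking it on faith.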