Well I don’t know if it falls into “over-elaboration” but my thinking was,
since my buffers and children are “static”, so they get constructed at the start and stay the same throughout the whole lifetime of the program without any changing in size etc., there would be no need to set this pointer every time, but instead just add the pointer once (and not even the whole multichannel buffer but only the channel the child cares about) as a member and work from there,
instead of having to get it from the parent.buffer on each audio callback?
Maybe this thinking is wrong / due to a bad design.
I guess ( and like dave and jranglois suggested) it would be better practice to get the pointer again after the clearing, so as close as possible to the actual writing of samples. Makes kind of sense : now i want to write, so i get the write pointer
Maybe I was over-elaborating in trying to “minimise instructions” of getting the same pointer over and over again.
My “hack” was now to get the writePointer before looping over my child-processors (and not doing anything with it in the main loop), so it has the same effect, the buffer flag is correct, it knows I will write something now, but I don’t have to get the pointer in every iteration of my child loop. So maybe that is a compromise?
To get back to the initial topic about using an atomic for the isClear flag. Does anyone know why it was added? The class is used a lot in AudioProcessorGraph and this atomic costs performance, especially if low buffer sizes are used. I can imagine the atomic flag would be required in a multi-threaded AudioProcessorGraph alternative, but we don’t have that. In that case the acq-rel access pattern should be added to the isClear flag to make it hurt less.
Bumping this topic… that atomic is a real performance problem in some cases as it prevents almost every compiler optimization around AudioBuffer calls.
I’m not even sure that whole isClear logic is even needed at all, but I’m pretty convinced it should not be atomic.
These results mean nothing because you’ve concocted a contrived test bench for benchmarking. Your compiler will definitely optimise differently in small situations like this. If you don’t believe me - prove me wrong by reading above.
I suggest getting some profiling data from a mid to large scale application. You can try out the AudioPluginHost and spin up a shit load of plugins to see. Even better - to get a consistent test - you can hack the AudioPluginHost to programmatically, on a timer or something, load plugins you have installed. This way you’d be able to (fairly) easily compare.
Profiling an atomic isn’t as clear cut as that.
It’s very unlikely that the atomic call itself would appear on your profiler.
The way atomics work, they prevent compiler optimizations to everything around them (“memory fence”). So while they add thread safety, their side effect is that “other code” becomes slower.
However since these atomics appear on almost every single call to an AudioBuffer member function, I expect (as you can see from my tests) that you will feel the effects of these memory fence the more you interact with those functions, as the compiler is not allowed to optimize or reorder that section.
Profiling an entire app is not what’s talked about here. An app can be optimized in many ways and by no means am I claiming that this atomic on its own is something that will make your app/plugin “fast” or “slow”. I am showing that it has what I think is a negative impact on the most sensitive performance area, and for something that I feel has no benefit as it doesn’t add any thread safety.
Yes, in this case a profiler isn’t the way to test. Generally a profiler can’t give you A/B between, say -O2 and -Ofast, or between your already heaviest function running 100 or 102 times.
It’s a fantastic tool to let you know where’s your bottle neck. For A/B testing between two internal details like that I think a clean benchmark is a better way to test.
I do agree that I’ve only run my test on one machine, and it would be great if other people run this or other tests on their machines too.
Also, I’d appreciate reading what I’m writing carefully.
I hope you’re not arguing against the fact that an atomic load/store is a performance impact on any code it’s in. The code in question is sensitive code, where I think micro optimization matters, and in some cases will produce measurable results.
Of course, in many other cases the difference might be too small to measure. But I believe in this case this atomic is not adding any thread safety, so I can’t find an argument to keep it there.
I’ve read your posts very clearly, and I hope likewise.
Regardless, I’m arguing that your choice of lack of tooling and arguments ‘for’ a contrived test are poor: I really don’t understand how you can claim that your test is above what a profiler can offer for information, especially without having shown that to be the case.
I’m not claiming “my” test is better than a profiler.
I just suggested that for some cases, like testing the impact of a mutex/atomic on realtime audio a profiler isn’t able to give you correct results, where a benchmark can.
A profiler an amazing tool to know where the performance bottleneck of your app is, which this atomic obviously isn’t - but the impact of it in the code on other code will be measurable, just not with a profiler.
Maybe the only purpose of the atomic flag is if you write in different buffers channels from different threads. If the atomic would be a simple bool, there could be some data racing on this bool.
But that’s quite infrecuent. It should be better using a template parameter where you could specify if you want to use an atomic or a simple bool
purpose of the atomic flag is if you write in different buffers channels from different threads.
but if you write in different channels from different threads, then there are several other possibilities to create data races.
PS: now that i think about it more i wouldn’t remove the atomic, it may can create a lots of issues for people who rely on it, just make it configurable through a template parameter
Thats like you say a quite rare use case and needs many more synchronisation (like the AbstractFifo with atomic read and write pointers), like @chkn just posted while I am typing
IMHO it would be fair to say the isClear synchronisation should then be handled by the caller as well.
My hunch is with @jrlanglois that it doesn’t play a role, since it is per block level, so it will likely be called once in a virtual function (e.g. processBlock()), so the optimisers options are already limited.
Doing the legwork I checked, the atomic wrapper was introduced April 1st 2019 (no joke) by @ed95
(the forum just reminds me that @pflugshaupt posted this already a year ago…)
Maybe @ed95 has some recollection why the atomic was necessary to shorten this debate?
I checked the forum and unfortunately I couldn’t find anything in that time that was pointing to a problem.
This isClear() flag isn’t solving any data race. In fact, if you end up writing to an AudioBuffer from different threads, you’ll run into very serious trouble because the T** in there is not protected by anything.
If this was some mutex/spinlock protecting the data, then it would have been a breaking change to remove it.
I don’t think there are any users that rely on this flag being atomic, and it’s more likely it was inserted to solve some local bug that can be solved in a way that doesn’t introduce overhead to a low level container.