In the Noise Gate example project, the main processing loop looks like this:
```cpp
for (int j = 0; j < buffer.getNumSamples(); ++j)
{
    float mixedSamples = 0.0f;

    for (int i = 0; i < sideChainInput.getNumChannels(); ++i)
        mixedSamples += sideChainInput.getReadPointer (i)[j];

    mixedSamples /= static_cast<float> (sideChainInput.getNumChannels());
    lowPassCoeff = (alphaCopy * lowPassCoeff) + ((1.0f - alphaCopy) * mixedSamples);

    if (lowPassCoeff >= thresholdCopy)
        sampleCountDown = (int) getSampleRate();

    // very in-effective way of doing this
    for (int i = 0; i < mainInputOutput.getNumChannels(); ++i)
        *mainInputOutput.getWritePointer (i, j) = sampleCountDown > 0 ? *mainInputOutput.getReadPointer (i, j) : 0.0f;

    if (sampleCountDown > 0)
        --sampleCountDown;
}
```
That comment towards the bottom there caught me off guard: why is this a very in-effective way of doing this? Is this a memory/cache access performance issue?
In the plugin I’m writing, I have a LinearSmoothedValue which wraps my AudioParameterFloat values to avoid clicks during user interaction. If I iterate through the entire left channel buffer reading from this LinearSmoothedValue, then by the time I switch to the right channel to iterate through its buffer, my LinearSmoothedValue has reached its target value and I’ll lose the smoothing on the right channel. So it makes sense to me to operate on the same sample in both channels at the same time, like the Noise Gate example is doing. But then I see this comment and now I’m second-guessing my approach. What’s the right way to handle this kind of thing?
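One way to keep the smoothing consistent across channels is to advance the smoother once per sample and reuse that value for every channel. Here's a minimal dependency-free sketch of that idea — the `Smoother` struct is a hypothetical stand-in for JUCE's `LinearSmoothedValue`, not its actual implementation:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for juce::LinearSmoothedValue: a simple linear ramp.
struct Smoother
{
    float current = 0.0f, target = 0.0f, step = 0.0f;

    void setTarget (float t, int rampSamples)
    {
        target = t;
        step = (target - current) / (float) rampSamples;
    }

    float getNextValue()
    {
        if ((step > 0.0f && current >= target) || (step < 0.0f && current <= target))
            return target;                   // ramp finished, hold the target
        current += step;
        return current;
    }
};

// Advance the smoother once per sample, then apply that same gain to every
// channel, so all channels hear an identical ramp.
void applyGain (std::vector<std::vector<float>>& channels, Smoother& gain)
{
    const std::size_t numSamples = channels.empty() ? 0 : channels[0].size();

    for (std::size_t j = 0; j < numSamples; ++j)
    {
        const float g = gain.getNextValue(); // one advance per sample, not per channel
        for (auto& channel : channels)
            channel[j] *= g;
    }
}
```

The key point is only that `getNextValue()` is called once per sample position, so the ramp cannot run out before the second channel is processed.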
I think unless you are working with interleaved data it isn’t obvious how you’d be friendlier to the cache. Perhaps the comment is referring to the comparison within the loop … but I think the compiler will move that outside the loop anyway.
So my guess is that if you know that sampleCountDown is large, you could use a SIMD operation to fill the buffer with zeros considerably more quickly, rather than processing a sample at a time?
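In JUCE terms that bulk zeroing could be something like `buffer.clear (channel, startSample, numSamples)` or `FloatVectorOperations::clear()`, both of which can use vectorized fills. A dependency-free sketch of the same idea, using `std::fill` (which typically compiles down to `memset`-style vectorized stores):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Zero everything from firstSilentSample to the end of each channel in one
// bulk call per channel, instead of branching on every sample.
void clearTail (std::vector<std::vector<float>>& channels,
                std::size_t firstSilentSample)
{
    for (auto& channel : channels)
        if (firstSilentSample < channel.size())
            std::fill (channel.begin() + (std::ptrdiff_t) firstSilentSample,
                       channel.end(), 0.0f);
}
```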
Hm, ok, thanks for the answer! I take that to mean that there’s nothing inherently wrong or inefficient about iterating first by samples and then by channels as shown here? That makes my smoothed parameter calculation much easier
My comment there was more referring to repeatedly calling `getWritePointer`, which is unnecessary. Also I think the code should be restructured in a way that calculates beforehand how many samples will be zeroed and then applies this to every channel in bulk (as @jimc mentions).
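That restructuring could look roughly like the sketch below — a two-pass version of the gate, not the tutorial's actual code, written in plain C++ so it stands alone (the JUCE buffer objects are replaced by vectors, and `holdLengthSamples` stands in for `(int) getSampleRate()`):

```cpp
#include <cstddef>
#include <vector>

// First pass: run the detector once per sample and record where the gate is
// open. Second pass: apply the result channel by channel, touching each
// channel's data contiguously instead of fetching a write pointer per sample.
void processGate (std::vector<std::vector<float>>& mainIO,
                  const std::vector<std::vector<float>>& sideChain,
                  float alpha, float threshold, int holdLengthSamples,
                  float& lowPassCoeff, int& sampleCountDown)
{
    const std::size_t numSamples = mainIO.empty() ? 0 : mainIO[0].size();
    std::vector<char> gateOpen (numSamples);

    for (std::size_t j = 0; j < numSamples; ++j)
    {
        float mixed = 0.0f;
        for (const auto& sc : sideChain)
            mixed += sc[j];
        mixed /= (float) sideChain.size();

        lowPassCoeff = alpha * lowPassCoeff + (1.0f - alpha) * mixed;

        if (lowPassCoeff >= threshold)
            sampleCountDown = holdLengthSamples;

        gateOpen[j] = sampleCountDown > 0;

        if (sampleCountDown > 0)
            --sampleCountDown;
    }

    for (auto& channel : mainIO)             // contiguous per-channel pass
        for (std::size_t j = 0; j < numSamples; ++j)
            if (! gateOpen[j])
                channel[j] = 0.0f;
}
```

The second pass is where the bulk optimisation would go: instead of testing `gateOpen[j]` per sample, you could find runs of closed-gate samples and clear each run with one vectorized fill per channel.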
I think, but haven’t proven, that the answer is no. It’d be interesting to measure it.
(I went looking for measurements on google … didn’t find any and instead found this quote: “Ray states that non-interleaved files can sound better…”!!!)
Fabian - Isn’t it easier to just call getArrayOfWritePointers() at the start and work with data[chan][sampleNumber] anyway? I suppose you get an assert if you specify an invalid channel with getWritePointer, but you’re still able to crash the thing with an invalid sample number … and getWritePointer sure looks ugly there
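For reference, the array-of-pointers pattern reads as plain 2-D indexing inside the loops. A minimal sketch in dependency-free C++ — in JUCE the `channelData` array would come from a single call to `buffer.getArrayOfWritePointers()` before the loop; here the caller just supplies it:

```cpp
#include <cstddef>
#include <vector>

// Fetch the channel pointers once up front, then index channelData[ch][j]
// directly -- the analogue of one getArrayOfWritePointers() call instead of
// a getWritePointer (i, j) call per sample.
void gateWithPointerArray (float** channelData, int numChannels, int numSamples,
                           const std::vector<char>& gateOpen)
{
    for (int j = 0; j < numSamples; ++j)
        for (int ch = 0; ch < numChannels; ++ch)
            channelData[ch][j] = gateOpen[(std::size_t) j] ? channelData[ch][j]
                                                          : 0.0f;
}
```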
Yes that may very well be.
Awesome, thanks guys! This is super helpful.
I updated my project to follow this approach and started doing too much work in the inner loop, which drove my CPU usage to 60% in Renoise with just a couple of first-order IIR filters. So I might be back with some more optimization questions soon enough heh!
Are you sure that was with optimisations on? i.e., a release build?
That was a debug build, but I think if I’m failing to deliver buffers on time through just a few first order filters, even with no optimizations, I’m not setting myself up for a successful plugin haha
Unless it’s the ultimate swiss army knife and delivers everything the user needs in their life out of a single stereo channel … in which case I’d say you were on to something