Processing Audio: Sample by Sample or Buffer by Buffer?

Good Sunday Jucers!

I am refactoring my audio engine, and I am in the process of deciding whether to favour sample-by-sample processing or whole-buffer processing, and/or whether to have a hybrid approach between the two.
i.e.
float MyEngineBlock::audio (float input);
vs.
void MyEngineBlock::processChannel (float* buffer, long bufferSize);
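For concreteness, here is a minimal compilable sketch of the two shapes (the EngineBlock struct and its gain member are invented for illustration). Note that the buffer version can simply wrap the per-sample one, which is more or less what a hybrid approach would formalise:

```cpp
// Hypothetical DSP block exposing both calling conventions.
struct EngineBlock
{
    float gain = 0.5f;

    // Sample-by-sample: one call per sample.
    float audio (float input) { return input * gain; }

    // Buffer-by-buffer: one call per block; here it just wraps audio(),
    // but it could instead use vectorised operations internally.
    void processChannel (float* buffer, long bufferSize)
    {
        for (long i = 0; i < bufferSize; ++i)
            buffer[i] = audio (buffer[i]);
    }
};
```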

Below is a list of the pros of the two approaches, off the top of my head. What I would like to do here is start a discussion to clarify whether some of my points are totally wrong or of little importance, or whether I am missing other important aspects of the conversation entirely. Please please send me any of your 2 cents (or your million dollars :slight_smile: ).

Sample By Sample approach

  • Parameter modulation comes out easier and better, since internal parameters are re-evaluated every sample
  • This approach may give the compiler and the CPU cache more room for optimisation (this is totally empirical, but I suspect there’s a reason why, for example, the guys at Cycling74 made gen~ work on a sample-by-sample basis)

Buffer by Buffer approach

  • Enables optimised vector operations (FloatVectorOperations, IPP, etc.) for simple interconnections (e.g. mixing the outputs of two internal processing blocks)
  • Makes “block switching” easier, since if conditions or virtual function calls are acceptable once per buffer (not so much once per sample)
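Regarding the first point above, this is roughly the kind of contiguous, branch-free loop that a call like FloatVectorOperations::add boils down to (a plain C++ sketch, not the actual JUCE implementation) - exactly the shape that compilers auto-vectorise best:

```cpp
// Mix one channel into another: sequential access, no branches,
// trivially auto-vectorisable by the compiler.
void mixInto (float* dest, const float* src, int numSamples)
{
    for (int i = 0; i < numSamples; ++i)
        dest[i] += src[i];
}
```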

Thanks for reading!
Michelangelo

I just refactored my plugin a few weeks ago from sample-based to buffer-based and wow, what a performance difference! For me, I saw a 285% performance gain. That was all I needed to see.

With Max, the reason has more to do with simplicity for the user. It’s much simpler to write sample-by-sample code if you are newer to writing DSP code, which many gen~ users are.

The rest of the audio engine for Max is definitely block-based and that’s unlikely to change any time soon, for the same efficiency reasons as noted above. There was an interesting talk at ADC by Ian Hobbs that is worth a look. He showed a very different way of doing single-sample processing through variadic templating.

ALWAYS use buffer by buffer for JUCE’s AudioBuffer<> classes (unless you don’t care about performance). If your data is interleaved (which is common in image processing) then ALWAYS use sample by sample.

Why? Spatial locality.

tl;dr: the CPU can process elements way faster when they’re laid out and accessed in a straight line (in simple terms) vs. hopping all over memory.
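A toy illustration of the difference (function names are invented): with planar, non-interleaved buffers a channel is one contiguous run of floats, whereas reading one channel out of interleaved data strides through memory and touches more cache lines per useful sample:

```cpp
// Contiguous (planar) read: the hardware prefetcher loves this.
float sumPlanarChannel (const float* channel, int numSamples)
{
    float total = 0.0f;
    for (int i = 0; i < numSamples; ++i)
        total += channel[i];
    return total;
}

// Strided (interleaved) read of a single channel: each access skips
// over the other channels' samples.
float sumInterleavedChannel (const float* interleaved, int numSamples,
                             int numChannels, int channelIndex)
{
    float total = 0.0f;
    for (int i = 0; i < numSamples; ++i)
        total += interleaved[i * numChannels + channelIndex];
    return total;
}
```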

Thanks for all the answers! I see just a tiny tiny preference for buffer by buffer approach :slight_smile:
In this case though, how do you manage smooth parameter modulation, since all internal block parameters get dereferenced only once per buffer?
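To make the question concrete: the obvious workaround I can think of is to read the target once per buffer and then ramp linearly across the block, so the parameter still moves every sample - something like this bare sketch (the RampedGain struct is invented; I believe JUCE ships a similar helper, SmoothedValue). Is this what people actually do?

```cpp
// Hypothetical ramped parameter: the target is dereferenced once per
// buffer, then interpolated linearly across the block.
struct RampedGain
{
    float current = 0.0f;

    void applyGainRamp (float* buffer, int numSamples, float target)
    {
        const float step = (target - current) / (float) numSamples;

        for (int i = 0; i < numSamples; ++i)
        {
            current += step;      // parameter still moves every sample
            buffer[i] *= current;
        }
    }
};
```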

P.S.
One doubt I have (take a look at the code below): if all the tick() methods are force-inlined, then the approach below is nominally “sample by sample”, but there is no difference from “buffer by buffer”, since all the code ends up inlined into one big buffer-by-buffer loop working on the root AudioSampleBuffer.
Do you think this assumption is correct? If so, the approach below is far cleaner and more flexible in my opinion, as long as one does not allocate/deallocate, lock, etc. inside the blocks.

void processBlock (AudioSampleBuffer& buffer, MidiBuffer& midiMessages)
{
    int frames = buffer.getNumSamples();
    float* pL = buffer.getWritePointer (0);
    float valueIn = 0.0f, valueOut = 0.0f;

    while (frames--)
    {
        valueIn = *pL;
        valueOut = myBlock1->tick (valueIn); // all tick() methods are forcedinline
        valueOut = myBlock2->tick (valueOut);
        // some more blocks
        valueOut = myBlockN->tick (valueOut);
        *pL++ = valueOut;
    }
}

Thanks again!!

But if you are jumping around between the blocks in the innermost loop, you may lose the “spatial locality” that Jonathan described.

Also, if tick() is a virtual method (which is most likely the case unless you do the variadic template stuff), the per-call vtable overhead is not trivial.

Ok - I’m starting to get the spatial locality thing. I wonder if there’s a profiler feature or data aggregation in Xcode or Visual Studio that lets you check how bad your code is in that respect.

As for the virtual table topic, my assumption is that a DSP block that needs to process a single sample has a float tick(float) method (or float tick(void) for synths) that is not virtual and is inlined, following the STK coding style (https://ccrma.stanford.edu/software/stk/).

Last time I checked, STK does use a virtual tick() - the tick() in e.g. stk::Filter is virtual and the subclasses override it (they just don’t specify the override keyword).

I think this is why there are also block-based methods in STK, and the single-sample tick() is left in there for when performance is not an issue.

If it’s not virtual, you can’t use polymorphism, which is key to writing a generic engine - unless you do the magical stuff Ian was presenting at ADC.
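The compile-time trick looks roughly like this (a bare sketch in the spirit of that talk, not the actual code from it - block names are invented): the chain of blocks is a template parameter pack, so every tick() call is resolved statically and can be inlined, with no vtable in the inner loop:

```cpp
#include <tuple>

// Two toy blocks with non-virtual, inlinable tick() methods.
struct Gain   { float g; float tick (float x) { return x * g; } };
struct Offset { float o; float tick (float x) { return x + o; } };

// A statically-typed processing chain: no virtual dispatch anywhere.
// The fold expression runs each block's tick() in order (C++17).
template <typename... Blocks>
struct Chain
{
    std::tuple<Blocks...> blocks;

    float tick (float x)
    {
        std::apply ([&x] (auto&... b) { ((x = b.tick (x)), ...); }, blocks);
        return x;
    }
};
```

The obvious trade-off versus a virtual tick() is that the chain’s topology is frozen at compile time, which is exactly where the “block switching” point from earlier in the thread comes back.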