Understanding the need for audio buffers

So I started all my audio coding with the FX and Synth books by Will Pirkle. When poking around the interwebs, I found that some experienced coders were dissatisfied with the code and concepts presented within. One particular criticism was the omission of audio buffers (i.e. everything is done one sample at a time).

So what does the workflow look like, and what are the benefits of using buffers?
I’ll just type what I think it is. Please correct me if I am wrong:

I create a whole buffer of samples in my oscs, pass the entire buffer to my filters, then to the next component and so on. The benefit seems to be that data fetching from RAM is heavily reduced, because a variable is reused multiple times.
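In code, that workflow might look something like the sketch below. The class names and coefficients here are hypothetical, just to illustrate "each component fills or transforms the whole buffer before handing it on":

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical oscillator: renders a whole buffer of samples at once.
struct SineOsc {
    float phase = 0.0f, phaseInc = 0.1f;
    void render(std::vector<float>& buffer) {
        for (auto& s : buffer) {
            s = std::sin(phase);
            phase += phaseInc;
        }
    }
};

// Hypothetical one-pole lowpass: processes the same buffer in place.
struct OnePoleFilter {
    float a = 0.2f, z = 0.0f;
    void process(std::vector<float>& buffer) {
        for (auto& s : buffer) {
            z += a * (s - z);   // simple smoothing filter
            s = z;
        }
    }
};
```

A voice would then call `osc.render(buf)` followed by `filter.process(buf)` per block, rather than interleaving the two per sample.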

Is there more to it? Are there unforeseen problems arising here? (steppy controls and such…)
Appreciate any input and nudges towards more literature. :slight_smile:

1 Like

You will get steppy parameter changes for plugins when using JUCE, no matter whether your internal processing uses buffers or sample-by-sample processing. JUCE does not support the sample-accurate parameter change mechanisms available in some plugin formats. (With the exception of MIDI CCs, but that is mostly an instrument plugin feature.) You can of course alleviate the issue by smoothing the parameters internally in the plugin, but you will get the parameter updates from the host only at the audio buffer boundaries.
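For illustration, such internal smoothing can be as simple as a linear ramp from the current value to each new target over a fixed number of samples. This is a minimal hand-rolled sketch (the class name is made up; JUCE also ships a ready-made SmoothedValue class for exactly this job):

```cpp
#include <cassert>

// Minimal linear parameter smoother: ramps toward each new target
// over a fixed number of samples to avoid audible steps ("zipper noise").
class LinearSmoother {
public:
    void setTarget(float newTarget, int rampSamples) {
        target = newTarget;
        stepsLeft = rampSamples;
        step = (target - current) / static_cast<float>(rampSamples);
    }

    float getNext() {   // call once per sample in the audio loop
        if (stepsLeft > 0) {
            current += step;
            --stepsLeft;
        } else {
            current = target;
        }
        return current;
    }

private:
    float current = 0.0f, target = 0.0f, step = 0.0f;
    int stepsLeft = 0;
};
```

The host still delivers the new value only once per buffer, but the audible change is spread across the ramp instead of jumping at the buffer boundary.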


Not sure what you mean by “a variable is reused multiple times”.

The benefit of processing blocks of samples is that CPUs are good at applying the same operation to items stored contiguously in memory (google SIMD if you want further information). So after having created your buffer in your oscillator you might want to filter it, where each sample would maybe undergo the same mathematical operation. The compiler might not be able to use this processing power of the CPU if your code doesn’t work on contiguous buffers but on single samples.

Furthermore, when it comes to caching, CPUs usually fetch complete cache lines of memory, so if you are working on contiguous buffers, it’s likely that multiple samples will be brought into the cache at once, which also speeds up your code. Working on single samples could result in more cache misses.

And lastly, if you call your processing callback on a per-sample basis instead of a per-buffer basis, the function call overhead may become a much bigger part of your execution time and therefore slow down your code.

However, these are just some general considerations and they might not hold in all situations. There are cases where per-sample processing is really sensible or unavoidable, and compilers are quite good at optimising nowadays, so a lot of overhead might be removed by the intelligence of your compiler.

1 Like

I meant to say that various variables still linger in the L1 or L2 cache and needn’t be fetched from RAM again.
Anyway thanks for your input, this was quite helpful! :slight_smile:

Buffers are needed when you’re moving data in-and-out of plugins, and to/from audio i/o devices, because there’s a fixed overhead for each callback to move data, so grouping samples into chunks reduces that overhead.

However… the fact that people use buffers so pervasively in their own internal code is probably a mistake/habit (which we’re trying to correct with the SOUL architecture). As mentioned above, for most simple purposes the compiler is perfectly good at vectorising your code, and writing things in a sample-by-sample style is the best approach where possible.

I’d definitely suggest writing your code as sample-by-sample initially, and only refactoring it to use buffers if you actually hit a measurable performance problem and can show that using buffers would help.


Just to clarify: Even for an entire synth voice with multiple oscs, filters and FX and more?

Anyway, I’ll just profile it and test for my specific plugin! :slight_smile:

1 Like

Will told me that the upcoming revision of the FX book (May 2019?) addresses this question in detail and uses a new architecture that doesn’t tie it so closely to his RackAFX system.


Is this also true if you write your code frame-based instead of channel-based?

I do see the benefit for modern code like the dsp module, but I think if it involves virtual calls, then the optimiser has run out of options, or am I wrong in that thinking?

Sorry, but this statement shakes the foundations of my world view, maybe I missed the memo… :wink:

Depending on what you’re doing, it might be even better to write frame-based algorithms in a per-sample style, if for example you can put all the frames into a vector and operate on them all at the same time - that’s the easiest thing for the compiler to convert to SIMD.
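A minimal sketch of that idea, with a hypothetical fixed-size Frame type: because the per-frame loop has a length known at compile time, the compiler can readily turn it into SIMD instructions.

```cpp
#include <array>
#include <cassert>

constexpr int numChannels = 2;
using Frame = std::array<float, numChannels>;

// Apply the same gain to every channel of one frame. A fixed-size
// loop like this is easy for the compiler to auto-vectorize.
inline Frame applyGain(Frame frame, float gain) {
    for (auto& sample : frame)
        sample *= gain;
    return frame;
}
```

The per-sample processing loop then works frame by frame, while the math inside each step still operates on all channels at once.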

And no, you probably don’t want to be making a virtual call per sample, but I’m talking about inner loop stuff where you’d want to avoid high-level stuff like polymorphism anyway.

1 Like

Sorry to revive this old thread but now’s the time when this becomes relevant and I still have some questions:

Regarding vectorization / SIMD:

Ok, I get that in let’s say this case

for (auto& x : audiobuffer) {
    x *= gain_factor;
}
The compiler has an easy time vectorizing.
However my code looks much more like this

for (auto& x : audiobuffer) {
    // ...
    oscillator.update();                            // <- some 100 lines of code
    float output_sample = oscillator.doOscillate(); // <- another 100
    VA_filter.update();                             // <- +100
    VA_filter.doFilter(output_sample);              // <- +100
    // ...
}

Here, I don’t see how the compiler would be able to vectorize data inside my DSP classes across function calls. If it were possible, this would be real magic in my eyes (which compilers are, basically).

So the questions are:

  • Am I writing my sample based processing incorrectly?
  • Are compilers / optimizers really THAT good?
  • Will I benefit from buffer-based DSP processing in such a case?

I am also not a compiler guru, rather the opposite end. But I learned a few things NOT to do:

  • virtual calls: the optimiser can only optimise code that is known at compile time, so when the optimiser reaches a virtual call, I take it that’s the end of optimisation.
    That’s where the template magic of the dsp module comes into play.
  • functions defined in other translation units (i.e. other .cpp files): the compiler doesn’t see the implementations from other TUs, so it cannot optimise across them. Link-time optimisation can come to the rescue, if enabled, but I don’t know whether it is as effective as having the code in the same TU, where it can potentially be inlined (which happens automatically at the compiler’s discretion, not to be confused with the inline keyword, whose meaning has shifted slightly, as I understand it).
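The "template magic" mentioned above can be sketched as compile-time polymorphism: instead of calling through a base-class pointer, the processor type becomes a template parameter, so the call is resolved at compile time and can be inlined into the loop. The names here are invented for illustration, not taken from the dsp module:

```cpp
#include <cassert>

// A concrete processor type with a plain, non-virtual member function.
struct Gain {
    float factor = 2.0f;
    float processSample(float x) const { return x * factor; }
};

// The processor type is a template parameter: the call below is
// resolved at compile time, so there is no virtual dispatch in the
// inner loop and the compiler is free to inline and vectorize it.
template <typename Processor>
float processBuffer(Processor& proc, float* data, int numSamples) {
    float last = 0.0f;
    for (int i = 0; i < numSamples; ++i)
        last = data[i] = proc.processSample(data[i]);
    return last;
}
```

Swapping in a different processor just means instantiating the template with another type; the dispatch cost stays zero either way.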

I think the biggest benefit of buffer-based processing is having a whole block of data sitting in cache memory, which makes it a speedway for the processor, not like the slow train that stops in every village.

But I am very curious of other insights, since there is a lot to learn for me as well…

Am I right in the assertion that this will only apply to calls through base-class pointers?

base_class_instance.foo();             // definitely a base-class call
derived_class_instance.foo();          // definitely a derived call
pointer_to_base_class_instance->foo(); // unknown until runtime
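A small self-contained check of which call styles dispatch dynamically (the class names are made up). One detail worth adding: references to the base class dispatch dynamically too, not only pointers; only calls on a plain object (including a sliced copy) are resolved statically.

```cpp
#include <cassert>

// Tiny hierarchy to observe static vs dynamic dispatch.
struct Base {
    virtual ~Base() = default;
    virtual int foo() const { return 1; }   // base version
};

struct Derived : Base {
    int foo() const override { return 2; }  // derived override
};
```

Calling `foo()` on a `Base` copy of a `Derived` object yields the base version (the copy is sliced), while a `Base&` or `Base*` to the same object still reaches the override at runtime.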

So… make the entire project one big .cpp file…? (reminds me of my coding style in the old days lol)

Yeah, this sounds reasonable. Although given today’s cache sizes, the CPU might be able to hold all the necessary data in a low-level cache even between samples, as nothing else is computed in between.

Not sure I understand, but any call to a virtual function is a lookup at runtime, and the compiler cannot know whether it will have to deal with a subclass’s override at runtime. So it has to treat the method as a black box and cannot inline it or apply whatever else it does to optimise.

I was typing together a really long answer but understood my problem in the process :stuck_out_tongue:

Essentially I had declared even the lowest-level derived functions virtual, although nothing was inheriting from them. I’ve fixed my code now. I don’t know if there was a speedup, but it was bad coding style anyway.

Thanks for your answers!