Block-based processing: what is acceptable?

Hello, this is an open question for plugin developers. Any insight will be much appreciated!

TL;DR: What kinds of operations/computations is it acceptable (from the user’s point of view) to do on a per-block basis? How OK is it for plugins to behave a bit differently depending on the block size? What are your strategies and tradeoffs when processing audio in blocks?

I come from hardware development, where we are in total control of the audio block size. The block size N is usually fixed at compile time, and block processing allows us to do the following (sketched in code after the list):

  • amortize the cost of a computation over all N samples of a block,
  • read and process parameter values into “state variables” (e.g. convert a dB value into a gain), effectively making parameters signals that are downsampled by N with respect to the audio,
  • apply filtering to parameters/state (e.g. ramping a gain value over a block to avoid clicks),
  • keep state variables in registers to optimize the sample-based process() function (no branches, few memory accesses).
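
For concreteness, here is roughly what that hardware-style two-rate pattern looks like (a minimal sketch; N, the gain example, and the function name are just illustrative):

```cpp
#include <cmath>

constexpr int N = 64; // block size, fixed at compile time

void processBlock (const float* in, float* out, float gainDb)
{
    // Block rate: convert the parameter once, cost amortized over N samples.
    const float gain = std::pow (10.0f, gainDb / 20.0f);

    // Sample rate: tight inner loop, no branches, state stays in registers.
    for (int i = 0; i < N; ++i)
        out[i] = in[i] * gain;
}
```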

After developing a couple of plugins I realized that these strategies would not work, because the block size (1) is not mine to choose, (2) can even change dynamically, and (3) varies widely. For instance, a gain ramp over one block might produce a (dull) click at N = 16, and might sound too long at N = 4096.

I haven’t found a good design pattern for making this two-rate principle (long, costly block-rate operations vs. short, sample-rate computation) carry over to plugin development. Is it OK to do block-based operations at block rate and just write in the manual “if you hear ramps, reduce your block size”? Do you re-pack the blocks given by the host into blocks of a controlled size? What is supposed to happen, or what can happen, between two blocks?

Thanks in advance for your insights!

You might need to make your processing block-size independent. Sometimes that’s rather trivial to do, but sometimes you might need to fit the processing into some internal block size instead.

The classic example is FFT processing: you can’t assume the buffers you get from the DAW/host have any relation to the required FFT size, so you need to implement internal buffering in the plugin that gathers, processes, and outputs the samples as needed. (Usually some kind of FIFO/ring-buffer system is used for that.)
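
To make the idea concrete, here is a minimal sketch of such internal buffering for the simplest case of non-overlapping fixed-size blocks (kBlockSize and processFixedBlock are placeholders; a real FFT processor with overlap-add needs more machinery):

```cpp
#include <cstring>
#include <vector>

class FixedBlockProcessor
{
public:
    static constexpr int kBlockSize = 1024; // internal block size (placeholder)

    FixedBlockProcessor() : input (kBlockSize, 0.0f), output (kBlockSize, 0.0f) {}

    // Accepts whatever block size the host provides; adds kBlockSize
    // samples of latency, which the plugin must report to the host.
    void process (const float* in, float* out, int numSamples)
    {
        for (int i = 0; i < numSamples; ++i)
        {
            input[pos] = in[i];
            out[i] = output[pos]; // result computed one internal block ago

            if (++pos == kBlockSize)
            {
                processFixedBlock (input.data(), output.data(), kBlockSize);
                pos = 0;
            }
        }
    }

private:
    // Placeholder for the actual fixed-size (e.g. FFT-based) processing.
    void processFixedBlock (const float* src, float* dst, int n)
    {
        std::memcpy (dst, src, sizeof (float) * n); // identity, for the sketch
    }

    std::vector<float> input, output;
    int pos = 0;
};
```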

It’s actually mostly easy: you let all your buffers have the maximum block size and only use numSamples of them each block. And of course you don’t let your smoothers and such depend on numSamples either; just give them the length they need, and ignore that it might be more or less than numSamples.
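
In JUCE terms that might look something like the following sketch, inside your AudioProcessor subclass (scratch is a hypothetical juce::AudioBuffer<float> member):

```cpp
void prepareToPlay (double sampleRate, int maximumExpectedSamplesPerBlock) override
{
    // Allocate once, for the worst case...
    scratch.setSize (2, maximumExpectedSamplesPerBlock);
}

void processBlock (juce::AudioBuffer<float>& buffer, juce::MidiBuffer&) override
{
    // ...and only use the first numSamples of it each block.
    const int numSamples = buffer.getNumSamples();
    // ... process, touching scratch only up to numSamples ...
}
```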

As Xenakios said, use a FIFO. And the juce::AbstractFifo class is your friend here: it mostly abstracts away the details of managing the FIFO (which can be deceptively tricky), making it nearly painless to deal with different block sizes.
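
For reference, a minimal sketch of how juce::AbstractFifo is typically used (from memory of the JUCE API; it manages only the read/write indices, and you keep the actual storage alongside — sizes here are arbitrary, and the JUCE headers are assumed to be included):

```cpp
#include <algorithm>
#include <vector>

juce::AbstractFifo fifo { 1024 };   // manages indices only
std::vector<float> storage (1024);  // the actual sample storage

void push (const float* src, int num)
{
    int start1, size1, start2, size2;
    fifo.prepareToWrite (num, start1, size1, start2, size2);
    std::copy_n (src,         size1, storage.data() + start1);
    std::copy_n (src + size1, size2, storage.data() + start2);
    fifo.finishedWrite (size1 + size2);
}

void pop (float* dst, int num)
{
    int start1, size1, start2, size2;
    fifo.prepareToRead (num, start1, size1, start2, size2);
    std::copy_n (storage.data() + start1, size1, dst);
    std::copy_n (storage.data() + start2, size2, dst + size1);
    fifo.finishedRead (size1 + size2);
}
```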

What I do is ramp parameters over a fixed time period.
That could be multiple blocks, or a fraction of one block, depending on the block size and sample rate.
The point is that the ramp should always take the same number of milliseconds, no matter what.
Otherwise a user could get a track rendered differently just by adjusting the soundcard buffer size, e.g. to cope with a high CPU load.
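
A minimal sketch of such a time-based ramp: the state simply carries across processBlock calls, so the duration in milliseconds is independent of the block size. (JUCE’s juce::SmoothedValue provides essentially this out of the box.)

```cpp
#include <algorithm>

class MsRamp
{
public:
    void prepare (double sampleRate, double rampMs)
    {
        rampLengthSamples = std::max (1, (int) (sampleRate * rampMs / 1000.0));
    }

    void setTarget (float newTarget)
    {
        target = newTarget;
        step = (target - current) / (float) rampLengthSamples;
        remaining = rampLengthSamples;
    }

    // Call once per sample; the ramp continues seamlessly across blocks.
    float getNext()
    {
        if (remaining > 0) { current += step; --remaining; }
        else                current = target;
        return current;
    }

private:
    float current = 0.0f, target = 0.0f, step = 0.0f;
    int rampLengthSamples = 1, remaining = 0;
};
```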


Thanks for your answers!

This is what I am doing already for FFT-based processors (obviously), but I hesitated to do it for purely time-domain processors. On the upside, as you suggest, I’d get a nice process() function with a fixed block size that can even be known at compile time (which might help some optimizations).

On the downside,

  1. I am introducing more latency: it can now never be less than the internal block size, and
  2. this latency depends on the external (host) block size, which can be variable, so I’d have to report a plugin latency that changes over time. I don’t have first-hand experience, but I’ve read that variable latency is handled badly by a lot of hosts and should be avoided. I could always report the worst possible latency at prepareToPlay time, based on maximumExpectedSamplesPerBlock, and introduce extra delay when the actual latency is lower, but that seems like a waste.

Am I correct? Any thoughts on these downsides?

This is precisely what I am trying to avoid: at best it introduces more instructions in the hot, sample-based process(); at worst it compiles to a branch (“if the ramp time is over, do this; otherwise do that”).

Originally coming from an embedded background myself, I can relate, but over time I had to learn that it’s just okay to add a branch if one is needed. In practice, one branch more or less just won’t matter on a desktop CPU; they are crazy good at things like branch prediction at the hardware level. Knowing how to write branch-free code, use SIMD, etc. is a valuable skill, and you should use it wherever it fits right away, but if it doesn’t, I personally learned not to worry too much. Just take the stupid-simple approach, then run a profiler, which will reveal the real hotspots in your code if the performance doesn’t meet your needs. Having used this approach for some time now, I cannot remember a single case where branches were the cause of a performance bottleneck :wink:


From what I understand, it is already the case that block size affects rendering, since parameters are assumed to be constant during a call to processBlock. Imagine a parameter that is ramped by automation from 0 to 1 over one second. If the block size is 1024 and the parameter is smoothed internally over 128 samples, you’ll get (slanted) staircases in your parameter signal; if, on the other hand, the block size is 128, you’ll get a perfectly smooth parameter signal.

Is this correct? Should I stop bothering about this and consider it acceptable?

Hm, that’s the answer I was dreading… I guess I’m not quite ready to give up on my “principles” just yet :smiley:

If the effect of a branch isn’t noticeable in real-world usage of your plug-in then more complicated (and thus harder to reason about or maintain) source code is unlikely to be a good tradeoff.

Everything like this has a cut-off point of course, but there’s a chance your time might be better invested elsewhere.

No, that’s incorrect. Parameter changes are communicated in a sample-accurate fashion in modern plugin APIs like VST3, AU, AAX, LV2, and CLAP (VST2 excepted). If you only act on them at block boundaries, then your changes will suffer semi-random jitter (sometimes late, sometimes early), which makes for a pretty sub-standard user experience.
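
One common way to honor those sample-accurate changes is to split each host block into sub-blocks at the event offsets. A rough sketch (the event type and the setParameter/renderSubBlock helpers are hypothetical; real APIs such as VST3’s IParameterChanges or CLAP’s event list deliver something equivalent):

```cpp
#include <vector>

struct ParamEvent { int sampleOffset; int paramId; float value; };

// Assumes events are sorted by sampleOffset, with offsets < numSamples.
void processBlock (float* const* audio, int numChannels, int numSamples,
                   const std::vector<ParamEvent>& events)
{
    int pos = 0;
    size_t next = 0;

    while (pos < numSamples)
    {
        // Apply every event that lands at the current position...
        while (next < events.size() && events[next].sampleOffset == pos)
        {
            setParameter (events[next].paramId, events[next].value);
            ++next;
        }

        // ...then render up to the next event (or the end of the block).
        const int end = (next < events.size()) ? events[next].sampleOffset
                                               : numSamples;
        renderSubBlock (audio, numChannels, pos, end - pos);
        pos = end;
    }
}
```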

Granted, we are on the JUCE forum (and JUCE dumbs every plugin API down to something like VST2), so perhaps you are right.


Someone wittier than me said something along the lines of “If speed is more important than correctness to you, just delete all your code. It will run in zero time”.