How to do double buffered, block based processing?

Hi, I want to create a plugin that processes audio much like one would on an embedded system, using DMA for block-based processing.

The idea is to do my audio processing outside of the processBlock function, so that I have the maximum amount of time to process my audio.

For example, say my block size is 64 from the host.
But for my plugin, I need at least 256 samples to process, so the latency is always 256 samples.

I want to process those 256 samples only when they become available to me. But if I do that processing inside of processBlock(), then I only have the amount of time it takes to receive another 64 samples for my processing, whereas in reality I should have the time it takes to receive 256 samples.

Is there a way to maybe trigger a high priority callback function or thread to be responsible for the signal processing?
That way, all processBlock() needs to do is re-assign pointers to the buffers it should use for input/output after every 256 samples are received (ping-pong block processing).

Hopefully that makes some sense. Thank you in advance!


I may have misunderstood you, but why would you want to process the samples in blocks of 256 when your host makes callbacks of 64 samples? Are you doing some FFT where you need a minimum of 256 samples for higher resolution?

Unless you’ve already tested it and seen high CPU usage, this looks like premature optimization. If you haven’t tested it yet, do that first, because processing audio on external threads and synchronizing them is a pain in the ass. I don’t think that processing audio in small blocks of 64 samples will be a problem on any modern CPU (or even older ones). For instance, a polyphonic synth with 8 voices and a buffer size of 64 renders 8 * 64 = 512 samples per block of 64 samples. Add some filters and you are processing even more per block, yet many polyphonic synths run on old CPUs without any problem.

Just my 2 cents, people may help you better if you tell us what you are trying to process.

hm, I see. I think you’re right. I just come from an embedded background (first time diving into plugins with insanely fast CPUs), so I tend to want to optimize everything as much as possible and handle the data structures appropriately.

Also, even if it is an over-optimization, I’d be interested to know how it can be done with Juce.
But for now I’ll take your advice and try to do everything in the 64 sample time block.

The application will be doing some pitch detection and pitch shifting btw. No FFTs, wavelets are faster :slight_smile:
Just need a larger number of samples to get the pitch accurately.

Thank you for your input!!

Well, if you need resolution for pitch detection it’s not an optimization, it’s a requirement. You can use a circular buffer: write your input, process if there are at least 256 new samples in the buffer, and output from 256 samples behind the input whether you processed or not. Yes, every callback that does actual processing will use more CPU than the others. That shouldn’t be a problem: it should even out on average, and if it doesn’t, the CPU is just not powerful enough for the task at that buffer size. I’m pretty sure launching another thread would perform worse, not better.
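To make the scheme above concrete, here is a minimal sketch in plain C++ (no JUCE types, so it compiles on its own). The class name `FrameAccumulator` and its methods are hypothetical, invented for illustration. It uses two ping-pong frame buffers rather than a general circular buffer, which is a simplification that works when frames don't overlap. Input is collected sample by sample, the frame is processed as soon as it is full, and output always comes from the frame processed one frame earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of JUCE): collects host blocks of any size,
// runs a user-supplied function on every full frame of `frameSize` samples,
// and plays the result back with a fixed latency of `frameSize` samples.
class FrameAccumulator
{
public:
    using FrameProcessor = std::function<void (float* frame, std::size_t numSamples)>;

    FrameAccumulator (std::size_t frameSize, FrameProcessor fn)
        : in (frameSize, 0.0f), out (frameSize, 0.0f), process (std::move (fn))
    {
        assert (frameSize > 0);
    }

    // Call from the audio callback with whatever block size the host provides.
    // Works in place; the buffers were allocated in the constructor.
    void processBlock (float* data, std::size_t numSamples)
    {
        for (std::size_t i = 0; i < numSamples; ++i)
        {
            in[pos] = data[i];   // store the newest input sample
            data[i] = out[pos];  // emit the sample from the previous frame

            if (++pos == in.size())
            {
                // A full frame is available: process it, then swap roles.
                process (in.data(), in.size());
                std::swap (in, out);   // swaps internal pointers, no copy
                pos = 0;
            }
        }
    }

    std::size_t latencyInSamples() const { return in.size(); }

private:
    std::vector<float> in, out;
    FrameProcessor process;
    std::size_t pos = 0;
};
```

In a JUCE plugin you would presumably construct this outside the audio callback with a frame size of 256, call `processBlock` per channel from the plugin's processBlock(), and report the fixed delay to the host via `setLatencySamples`. The first 256 output samples are silence, and every fourth 64-sample callback carries the processing cost, as described above.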

Another perspective on that: the DMA and double buffering you describe is very likely already happening in the kernel, handled by your audio hardware driver. When you build a plugin, your processing code already runs at the stage you want to build. On top of that, the DAW sits in between as an additional layer that might do some extra buffering or split buffers into smaller chunks (down to 1 sample in theory) to guarantee sample-accurate automation.

Having started with embedded hardware, I can totally understand your motivation, but from my experience in building audio plugins, I’d really advise you not to over-engineer your solution before you run into an actual problem. One golden rule is to keep system calls out of your processing code, which includes memory allocation, file access and locks. Follow that and you are very likely already on a good route.

Having built a zero-delay convolution reverb engine this year, one that accepts very inefficient processing under some circumstances in order to stay at zero samples of delay, I was very surprised by the totally acceptable CPU impact even at the worst buffer sizes I could make up for testing. And having run a profiler on the code during development, I’ve been surprised again and again that the parts I spent the most time thinking about optimizing were not the ones with the biggest performance impact at all :smile:


To add to what has been said:

In the plugin world, the host may send buffers of arbitrary size at any time.

Your code must account for the fact that the buffer size may vary from one block to the next! So, you are in the position of simply dealing with whatever the host provides at that instant.

If you do require buffers, then make sure to allocate an ample size to accommodate any expected buffer size, and do so only in the prepareToPlay() function.
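As a rough illustration of that advice, here is a sketch in plain C++ so it compiles without JUCE. The class `ScratchBufferProcessor` is hypothetical; its `prepare()` stands in for prepareToPlay() and its `processBlock()` for the audio callback. The gain is only placeholder DSP. The scratch buffer is allocated once up front, and a block larger than expected is handled in chunks instead of being reallocated inside the callback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical processor: allocates in prepare(), never in processBlock().
class ScratchBufferProcessor
{
public:
    // Stand-in for prepareToPlay(): the only place where memory is allocated.
    void prepare (std::size_t maxExpectedBlockSize)
    {
        assert (maxExpectedBlockSize > 0);
        scratch.assign (maxExpectedBlockSize, 0.0f);
    }

    // Stand-in for the audio callback. Accepts any block size; blocks larger
    // than the scratch buffer are processed in several chunks.
    void processBlock (float* data, std::size_t numSamples)
    {
        if (scratch.empty())
            return; // prepare() has not been called yet

        while (numSamples > 0)
        {
            const std::size_t n = std::min (numSamples, scratch.size());

            std::copy (data, data + n, scratch.begin());

            // Placeholder DSP: apply a fixed gain of 0.5.
            for (std::size_t i = 0; i < n; ++i)
                data[i] = 0.5f * scratch[i];

            data += n;
            numSamples -= n;
        }
    }

    std::size_t capacity() const { return scratch.size(); }

private:
    std::vector<float> scratch;
};
```

The chunking loop is one way to cope with a host that sends more samples than it announced. Sizing the buffer generously in `prepare()` is the simpler option when memory is not a concern.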