Multi-threading in a plugin

My synth plugin is a little on the CPU heavy side and I’ve been wondering if multithreading could help. And considering I have no experience with multithreading in a plugin, I have some questions.

In my synth, there are 2 main CPU-heavy tasks: generating 64 complex waveforms, and then processing the output. I was thinking about having one thread create the 64 waveforms for the size of the audio buffer and put them into a temporary buffer using an AbstractFifo, and then have the output processing done by the main audio thread.

So, there would be a PrepWaveforms thread, and the main audio thread would expect that the waveforms have already been created, process the output and put it in the audio buffers.

Is that a reasonable way to go about this?

If so, how would I go about making sure the two threads stay in sync? Meaning, how do I guarantee that the PrepWaveforms thread has completed when I’m in the main audio thread?

Am I thinking about this the right way?

PS - It would be great if there was a JUCE multi-threading demo.

Set up like this, using multiple threads makes no sense. One thread prepares the waveforms while the other waits for them. Once the waveforms are ready, the second thread starts to mix, modulate, or otherwise combine them. You don’t gain any time; you lose time waiting.

Second thought: usually you have a project with many plugins. Here it’s the host’s job to schedule things so that as many plugins as possible run in parallel.

But you might gain some speed if your synthesizer is the only job running, e.g. as a standalone live instrument.

You do have jobs which could run in parallel: the 64 waveform generations. You could run up to 64 threads, each one delivering one waveform, with the audio thread combining these waveforms.

But maybe first check whether you can precalculate these waveforms as lookup tables. Spend some memory and save yourself the hassle of synchronizing the threads, if possible.
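To illustrate the lookup-table idea, here is a minimal sketch in plain C++ (no JUCE). The `Wavetable` struct, its `lookup` method, and the choice of a sine as the stored shape are all assumptions for illustration; your real waveforms would be computed into the table instead.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: one cycle of a waveform, computed once up front
// (e.g. from prepareToPlay) so the audio thread only has to read it.
struct Wavetable
{
    std::vector<float> table;

    explicit Wavetable (std::size_t size) : table (size)
    {
        const double twoPi = 6.283185307179586;
        for (std::size_t i = 0; i < size; ++i)
            table[i] = (float) std::sin (twoPi * (double) i / (double) size);
    }

    // phase is expected in [0, 1); linearly interpolates between two entries.
    float lookup (double phase) const
    {
        const double pos = phase * (double) table.size();
        const std::size_t i0 = (std::size_t) pos % table.size();
        const std::size_t i1 = (i0 + 1) % table.size();
        const float frac = (float) (pos - std::floor (pos));
        return table[i0] + frac * (table[i1] - table[i0]);
    }
};
```

Whether this helps depends on how much the 64 waveforms change from block to block; if they are fully dynamic, a table may not be an option.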

The best solution for multithreading is IMHO avoiding it. However, maybe you can get some inspiration from BufferingAudioSource:

Yeah, I agree…it does sound like my original idea was essentially 2 threads running in series rather than in parallel. I had a feeling that wasn’t practical.

Now this sounds interesting. Each one of those wave generations is essentially self contained and not dependent on anything else, so they could run in parallel. But my audio thread still would need to know when they’re finished before it could process them. How would that work?

And is asking for 64 threads unrealistic in most plugin hosts?

Nothing will stop you asking for and getting 64 threads, but unless you’re running on a > 64-core CPU with a ridiculously large cache, it’d probably be a very bad way to use the machine’s resources.

Perhaps one or two pooled threads to do the background work for all instances of your plugin would be a better use of the CPU?

Funny how ridiculous my question reads after your reply.

Could you explain this in a little more detail, Jules? Wouldn’t my audio thread still need to know (or wait until) when the background work is finished?

Jules put it better in fewer words while I was typing, but I’m sending my answer anyway…

You can start as many threads as you want. The OS has to make sure all threads get to run as much in parallel as possible, so starting too many threads adds a lot of overhead. The usual advice is that as many threads as CPU cores works best.

So you don’t implement the wave generator as a Thread but as a TimeSliceClient. Then in prepareToPlay you can start as many TimeSliceThreads as you want and assign your generators to the threads round robin.
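The round-robin split can be sketched without JUCE, using `std::thread`. Everything here (`Generator`, `renderAll`, `suggestedThreadCount`) is a hypothetical name for illustration. Note that this version creates and joins threads on every call, which you would not do on the audio thread; it only shows how 64 jobs are divided among a small number of workers.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one waveform generator.
struct Generator
{
    std::vector<float> data;
    void render() { std::fill (data.begin(), data.end(), 1.0f); }
};

// Renders every generator once. Worker t handles generators t, t + n, t + 2n, ...
void renderAll (std::vector<Generator>& gens, unsigned numThreads)
{
    numThreads = std::max (1u, numThreads);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < numThreads; ++t)
        workers.emplace_back ([&gens, t, numThreads]
        {
            for (std::size_t i = t; i < gens.size(); i += numThreads)
                gens[i].render();
        });

    for (auto& w : workers)
        w.join();
}

// hardware_concurrency() may return 0 when the core count is unknown.
unsigned suggestedThreadCount()
{
    return std::max (1u, std::thread::hardware_concurrency());
}
```

Because each generator writes only to its own buffer, the workers need no locking among themselves; the only synchronization point is the hand-over to the audio thread.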

You can either add a “ready” flag to your generator that is set when the buffer is filled, and let the main thread wait until all are ready. But that is bad, because if even one table is not ready you block the audio thread, which you want to avoid at all costs.
Or you skip the buffers that are not ready and use a zero buffer instead.

I never tried that in a plugin, only in a standalone app, and using 4 threads worked perfectly for me.
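As a rough sketch of the second option (skip what isn't ready instead of waiting), assuming plain C++ atomics rather than JUCE classes: `Slot`, `fill`, and `mixReadySlots` are hypothetical names, and the worker is assumed to touch a slot's data only while its `ready` flag is false.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical generator slot: the worker fills data, then publishes it.
struct Slot
{
    std::vector<float> data;
    std::atomic<bool> ready { false };

    // Worker side: write the samples first, then set the flag.
    void fill (float value)
    {
        for (auto& s : data)
            s = value;
        ready.store (true, std::memory_order_release);
    }
};

// Audio-thread side: never waits. A slot that is not ready contributes
// silence. Returns how many slots were skipped.
int mixReadySlots (std::vector<Slot>& slots, std::vector<float>& out)
{
    int skipped = 0;
    for (auto& slot : slots)
    {
        if (! slot.ready.load (std::memory_order_acquire))
        {
            ++skipped;
            continue;
        }
        for (std::size_t i = 0; i < out.size() && i < slot.data.size(); ++i)
            out[i] += slot.data[i];

        slot.ready.store (false, std::memory_order_release); // hand back to the worker
    }
    return skipped;
}
```

A skipped slot means a dropout for that waveform in that block, so in practice you would probably want to count skips while testing to see how often the workers fall behind.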

Does something like this make sense?

class Prep  : public Thread
{
public:
    Prep() : Thread ("Prep") {}   // juce::Thread needs a name
    void run() override
    {
        while (! threadShouldExit())
            if (doit.get() == 1)
            {
                // prepare waveforms
            }
    }
    Atomic<int> doit { 0 };
};

And inside the processor…

void MultiThreadingTestAudioProcessor::processBlock (AudioSampleBuffer& buffer, MidiBuffer& midiMessages)
{
    const int totalNumInputChannels  = getTotalNumInputChannels();
    const int totalNumOutputChannels = getTotalNumOutputChannels();

    for (int channel = 0; channel < totalNumInputChannels; ++channel)
    {
        // post-process the already created waveforms
    }

    // prepare waveforms for next time
}

Looks to me like it should work.
You didn’t want to use TimeSliceClient / TimeSliceThread? You can’t choose how many threads to instantiate when each Prep instance is its own thread.

At this point you are burning CPU: whenever there is nothing to prepare, the while loop just keeps spinning. You could add an "else sleep (sometime);"
And it’s your job to find a reasonable “sometime”.

And in your “for channel” loop, make sure you don’t read while prep.doit.get() == 1, otherwise you read and write the same memory at the same time.
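Both points (sleep while idle, and use the flag to decide who owns the buffer) can be sketched as follows. This uses `std::thread` and `std::atomic` instead of `juce::Thread` and `juce::Atomic`, and `PrepWorker` is a hypothetical name; the 1 ms sleep is an arbitrary assumption, not a recommended value.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the Prep thread.
class PrepWorker
{
public:
    explicit PrepWorker (std::size_t numSamples)
        : data (numSamples, 0.0f), worker ([this] { run(); }) {}

    ~PrepWorker() { shouldExit = true; worker.join(); }

    std::atomic<int> doit { 0 };   // 1 = worker owns data, 0 = reader may use it
    std::vector<float> data;

private:
    void run()
    {
        while (! shouldExit)
        {
            if (doit.load (std::memory_order_acquire) == 1)
            {
                for (auto& s : data)
                    s += 1.0f;                              // "prepare waveforms"
                doit.store (0, std::memory_order_release);  // hand the buffer back
            }
            else
            {
                // Idle: yield the core instead of spinning.
                std::this_thread::sleep_for (std::chrono::milliseconds (1));
            }
        }
    }

    std::atomic<bool> shouldExit { false };
    std::thread worker;   // declared last so the other members exist before it starts
};
```

The audio thread would read `data` only when `doit` is 0 and set `doit` to 1 at the end of the block. A fixed sleep adds up to that much latency before the worker notices new work, so a blocking wait that the audio thread wakes up (a condition variable, for example) may be a better fit than polling.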

You are correct on all counts. I just wanted to get a simpler proof of concept going before delving into the more complicated (but important) refinements.

Now I’ll replace the prep thread with a more efficient TimeSliceClient/Thread once I understand it better.

Am I right in understanding that I would create a TimeSliceThread (in place of my Prep Thread) and add 64 TimeSliceClients to it, each one prepping one of my waveforms?

And instead of the

if(doit.get() == 1)

…maybe I could just make the TimeSliceThread active at the bottom of processBlock?

A TimeSliceThread is responsible for a number of TimeSliceClients, so you shouldn’t use the setActive/InActive method. Have a look at the client’s worker callback, useTimeSlice().

You can simply return 0 when done and the thread will automatically move on to the next client. I don’t know, maybe you can optimize that.

Besides synchronizing the Prep thread with the main audio thread, I also have to synchronize the data (waveforms) created as well. I would imagine that I’d have to create a buffer that’s 64 * samplesPerBlock (from prepareToPlay) samples large and fill that each run in Prep.

But then should I transfer all those floats from one thread to the other for processing? Or simply use that same buffer in both threads?

I don’t know what exactly you’re up to. I thought it would make sense that each prep instance has a buffer and the audio thread combines them.
In prepareToPlay():

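// threads and preps need to be members of the processor so they outlive prepareToPlay():
//     OwnedArray<TimeSliceThread> threads;
//     OwnedArray<Prep> preps;
const int numThreads = 4;
for (int i = 0; i < numThreads; ++i)
    threads.add (new TimeSliceThread ("Prep " + String (i)))->startThread();

// assign the 64 generators to the threads round robin
for (int i = 0; i < 64; ++i)
    threads[i % numThreads]->addTimeSliceClient (preps.add (new Prep()));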

in processBlock:

buffer.clear();
for (int i = 0; i < preps.size(); ++i)
{
    // pseudo code: a plain sum into channel 0; add, modulate or combine the buffers as you like...
    buffer.addFrom (0, 0, preps.getUnchecked (i)->data, 0, 0, buffer.getNumSamples());
}

and in Prep:

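int Prep::useTimeSlice()
{
    float* ptr = data.getWritePointer (0);   // not const: we write through it
    for (int i = 0; i < data.getNumSamples(); ++i)
        ptr[i] = std::sin ((float) i); // or whatever your wave looks like

    return 0;   // milliseconds until this client wants to be called again
}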

…but there are a lot of solutions… good luck

Wow…thanks so much! I’ll be investigating that in the days to come. I’ll let you know my results.

Thanks again.