How many audio threads can I create without problems?

When my plugin needs more threads for the audio processing, what are the rules and policies about the number of threads I can create?
How many threads can I create for my plugin without running into problems?
I can check how many cores there are, but the DAW and other plugins may also need threads.
Are there any don’ts and do’s I would need to know?
Is there a best policy for this?

You can’t do much better than query how many cores you have and allocate that many threads. You have no visibility on DAW threads etc or even core/thread usage on the rest of your system.
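As a minimal sketch of that first step, `std::thread::hardware_concurrency()` is the portable way to ask how many hardware threads exist. Note it may legitimately return 0 when the count is unknown, so guard for that (the helper name here is invented for the example):

```cpp
#include <algorithm>
#include <thread>

// hardware_concurrency() may return 0 when the count is unknown,
// so fall back to a single thread in that case.
unsigned int workerThreadCount()
{
    return std::max(1u, std::thread::hardware_concurrency());
}
```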

I advocate for implementing threading in your plugin but make it optional.

But what is best policy here?
If I need the capacity of one complete thread (running solo on a core), I don't think I have a high chance of getting that by creating just one extra thread.
So will it be better policy to create a few and just use them partly?
And if so, is it good practice to use a semaphore to wake the thread? I thought you should avoid semaphores on an audio thread.
But what else should I use? Keeping a thread alive and polling an atomic would mean a lot of capacity may go unused.

And if so, is it good practice to use a semaphore to wake the thread? I thought you should avoid semaphores on an audio thread.

You may use AsyncUpdater to notify the thread.

I have been using threads to carry out some GUI-related tasks, e.g., calculating FFT. AFAIK using multiple threads to process the audio itself is not common practice (except macOS AudioWorkGroup, but I haven’t played with it yet).

1 Like

If you create the thread, the only code which is going to run on that thread is your code. This is different from CPU cores which the OS schedules across all running threads using thread and process priorities.

If you’re trying to guarantee that your thread is always going to get first dibs on CPU cores, that isn’t your responsibility as a plugin and you shouldn’t try to make it your responsibility.

Just create one thread for each parallel piece of work you will have. If this number is significantly greater than the number of cores on the system, you can use a thread pool with a shared work queue of some kind.

As far as keeping threads alive and waiting, that’s what wait states are for. You wouldn’t want it to sit there churning on an atomic bool. You’d use a wait-able event typically. Then when you queue work you can signal this event to wake the thread.
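A minimal sketch of that pattern, using a `std::condition_variable` as the wait-able event. All class and member names here are invented for the example; the worker blocks without burning CPU until `enqueue` signals it:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Illustrative worker that sleeps on a wait-able event (a condition
// variable) instead of spinning on an atomic flag.
class Worker
{
public:
    Worker() : thread([this] { run(); }) {}

    ~Worker()
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            stopping = true;
        }
        wakeUp.notify_one();
        thread.join(); // remaining queued jobs are drained before exit
    }

    void enqueue(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(mutex);
            jobs.push_back(std::move(job));
        }
        wakeUp.notify_one(); // signal the event: wake the sleeping worker
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex);
        for (;;)
        {
            // Blocks (using no CPU) until there is work or we're stopping.
            wakeUp.wait(lock, [this] { return stopping || ! jobs.empty(); });

            if (jobs.empty())
                return; // stopping, and nothing left to do

            auto job = std::move(jobs.front());
            jobs.pop_front();

            lock.unlock();
            job();        // run the job outside the lock
            lock.lock();
        }
    }

    std::mutex mutex;
    std::condition_variable wakeUp;
    std::deque<std::function<void()>> jobs;
    bool stopping = false;
    std::thread thread; // declared last so run() sees initialised members
};
```

Note that signalling the condition variable is still a system call, so this shape suits background/worker threads rather than code on the audio callback itself.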

Btw, you should consider using standard C++ std::async and std::future. These provide their own mechanism for queueing the actual work among a pool of threads.
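A hedged sketch of that suggestion: `std::async` fans work out to another thread and `std::future` collects the result. The function name and the split into two halves are invented for the example:

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum a vector on two threads: one half via std::async, the other on
// the calling thread.
double sumInParallel(const std::vector<double>& data)
{
    const auto mid = data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);

    // std::launch::async requests a real thread rather than lazy execution.
    auto firstHalf = std::async(std::launch::async, [&data, mid] {
        return std::accumulate(data.begin(), mid, 0.0);
    });

    const double secondHalf = std::accumulate(mid, data.end(), 0.0);
    return firstHalf.get() + secondHalf; // get() blocks until the task is done
}
```

Bear in mind std::async allocates and may create a thread per call, so it's suited to background work, not to code running inside the audio callback.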

It's generally OK to use a semaphore/mutex if it's only shared between threads at the same priority level, which avoids priority inversion. For example, don't wait on the audio thread for a mutex or semaphore that the message thread locks for UI work etc.

Don’t poll or spinlock in this case. You will fry CPU.

I can describe my approach which worked out well.

First some context. My use case is processing voices on a synthesiser. At least amongst commercial synthesisers it’s not uncommon to provide a multi-threaded option which can result in useful performance improvements.

I create a number of threads that’s equal to the number of cores available.

I have a lock-free realtime safe FIFO work queue that supports multiple readers. I use the excellent farbot library for this (GitHub - hogliux/farbot: FAbian's Realtime Box o' Tricks).

I use a counting semaphore to acquire and release for job execution (GitHub - cyanhill/semaphore: a header-only, dependency-free C++11 implementation of counting_semaphore following the C++20 standard, https://en.cppreference.com/w/cpp/thread/counting_semaphore).

Two semaphores are used: one that pauses the worker threads, and another that counts completed jobs.

My JobDispatcher class has only two methods:

    /** Add a job to the queue. Note: it won't run until join is called */
    void add(std::function<void()> &&job);

    /** Start running all jobs and wait for them to complete */
    void join();

‘add’ simply adds a lambda, which can have captures and so on, to the FIFO job queue. Nothing happens until ‘join’ is called.

All of the threads at this stage are just pausing on the semaphore acquire.

When join is called, I release the semaphore by the number of jobs, which starts waking up the threads.

Each thread attempts to take a job (functor/lambda) from the job FIFO queue. This is why multiple reader capability is required.

The thread will stay awake and consume as many jobs as possible. After each job, it calls the semaphore release on the semaphore used for completing jobs. If the FIFO is exhausted then we’re done and the thread loops back to its waiting semaphore acquire.

The main join call meanwhile does an acquire on the job-completed semaphore, once per job. When it has acquired a count equal to the number of jobs, all jobs have completed. All threads will be back to waiting on their acquire, ready to go at the FIFO again for jobs to do.

So that’s the basic idea. A number of threads waiting to wake up. A FIFO job queue that the threads, when awakened, go at like hungry hippos until there’s nothing left to do.

1 Like

Honestly I’m quite skeptical that techniques like this will provide much performance improvement, since there is simply no way around the need to synchronize all the threads by the end of the processBlock. The only time multithreading might be justified is for “offline” tasks like prerendering samples, reading files from disk, etc.

1 Like

This is the key bit. Any realtime code that relies on multithreading will need to implement failsafes, in case the worker threads miss their deadlines.

Introducing multithreading to audio code will greatly increase the code complexity and make it much more difficult to test rigorously. In my opinion it simply isn’t worth it, but to each their own.

2 Likes

Well yes, but the point is to share the load to different threads/cores so that execution can happen in parallel. For synths - my use case - it’s relevant because you have a number of identical but isolated processing units, i.e. voices. And I’m not talking about some basic PolyBLEP processing, but a lot of IFFT and wave-shaping.

All you want to do is process the voices as fast as possible before the end of processBlock. By default you’d be for-looping through all the voices processing them sequentially. With a consumer threading model you can share the isolated loads out and run in parallel. It’s also possible to use the threading model to run FX lanes in parallel too.
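One way to share those isolated voice loads out is to hand each worker a contiguous slice of the voice array. A small helper for computing the slices (name and signature invented for this sketch):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Split numVoices into one contiguous [begin, end) range per worker,
// as evenly as possible. Each range can then become one job.
std::vector<std::pair<std::size_t, std::size_t>>
partitionVoices(std::size_t numVoices, std::size_t numWorkers)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base  = numVoices / numWorkers;
    const std::size_t extra = numVoices % numWorkers;

    std::size_t begin = 0;
    for (std::size_t i = 0; i < numWorkers; ++i)
    {
        const std::size_t len = base + (i < extra ? 1 : 0);
        if (len == 0)
            break; // more workers than voices: leave the rest idle
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}
```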

In a recent project I'm involved with, the CPU allocation for the synth went from 70%+ to ~20% at peak utilisation (all voices active, lots of stuff happening).

As for complexity, there’s some care to take with the JobDispatcher, but in the end my JobDispatcher.cpp is a few pages of code. It’s trivial to select between processing the voices sequentially or enabling parallel processing.

For a single FX like a compressor or reverb then there’s probably not much need for this. For a polyphonic synth, it’s an option and several commercial synths spring to mind that support multi-threading in this way (U-he Repro and Diva, Dune 3…).

1 Like

Yeah, my assumption is OP is talking about background processing. I wouldn’t consider using threading for real-time. The DAW can promise your plugin the resources of a single thread but not a whole worker pool.

I’ve used threading for a background task which runs in response to user input and won’t apply to the real-time side until it is complete.

1 Like

The title of the thread is “How many audio threads can I create without problems”, and I think the best answer is that it is possible to multithread audio processing, but it does introduce a lot of new problems & complications you must deal with. Not knowing the skill level of the OP, I like to at least assume that a few beginners will be reading the advice here, and for non-experts in realtime programming, I very strongly believe that attempting to multithread audio code will cause more problems than it will solve.

5 Likes

Had a feeling someone would start trotting out credentials. It’s all good nobody is judging, we do what works especially if it makes a product people will use and love.

For what it’s worth, I spent a decade implementing and optimizing an HEVC decoder which ran on NUMA supercomputer clusters and your computer likely has it installed as well. That project involved a from scratch lockfree thread pool and fibers which I designed and implemented by myself. So not a noob either.

1 Like

Thanks all for the reactions.

I understand these are not things that are easy to do. And maybe not suitable for beginners. I did a lot of multi-threaded programming, so in that area I’m not a beginner.

As mentioned above, you do not know how many threads other applications want to use. Including ‘audio’ threads needed by other plugins or the DAW itself.

If my plugin creates a bunch of threads and raises their priority to ‘audio’ priority, how does a DAW deal with that? Suppose there are more plugins doing this. I did read about DAWs doing some load balancing, but how can they do that? It is the OS that decides which thread gets CPU time.

Will the OS do a good enough job in this multiple audio plugin case?
What is the best practice for us to do in this regard? Creating a number of threads equal to the number of cores and assume the OS does a good job? Even if multiple plugins are doing this?

It is not a matter of how hard it is to write the code. You are grasping for a resource that is already used by the host.

I think best practice is to leave the audio processing on the audio thread that calls your processBlock(). You might give the user the option to run multiple threads, in case the user wants maximum performance from your plugin or instrument and doesn’t care if the rest of the system is compromised.
A live set with almost only this one instrument running would be such a situation.

It is hard to say if the OS can deal with that, as there are many, and for instance the ARM architecture is quite different from the previous architectures.

1 Like

In theory threads should be managed by the DAW and only the DAW.

Imagine your plugin creates num_cores threads and someone adds 8 instances of it to different tracks (or it could be 8 plugins from different vendors all creating those threads). You now have 8x the number of available cores in threads trying to run (probably at the same time). If the DAW is multi-threaded, each of those processBlock calls for the plugin instances will happen concurrently on separate threads.

So now it’s not really possible to start the worker threads from the plugin instances without interrupting one of the threads running another plugin instance.

In reality, threads rarely run exactly like this. Audio graphs (which I’ve spoken about in the past) tend to be wider at the bottom and narrower at the top, so if your plugin takes a long time to process, it’s likely some DAW threads will be “paused” as there aren’t any free nodes for them to process, and your threads will get some time to run.

So in this situation, you’re really at the mercy of the OS thread scheduler and how it deals with multiple threads requesting to run at the same time. I would hope that the real-time threads started by the DAW keep running as long as there are no system calls by that thread (which is likely to be violated by plugins that use multiple threads, as they’ll use a semaphore or CV to notify their own threads). However, it is possible to write your code in a way where the calling thread (the one that called processBlock) completes your processing code and any other threads that wake up can help out but aren’t required to complete processing. That’s our strategy in tracktion_graph.
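A rough sketch of that "calling thread completes the work" idea: the audio thread drains the queue itself, so the block always finishes even if no helper thread wakes up in time; helpers run the same consume loop and merely speed things up. All names are invented for the example, and a real implementation would use a lock-free queue rather than a mutex:

```cpp
#include <atomic>
#include <deque>
#include <functional>
#include <mutex>

struct SharedWork
{
    std::mutex mutex;
    std::deque<std::function<void()>> jobs;
    std::atomic<int> remaining { 0 };

    void push(std::function<void()> job)
    {
        std::lock_guard<std::mutex> lock(mutex);
        jobs.push_back(std::move(job));
        remaining.fetch_add(1, std::memory_order_acq_rel);
    }

    // Returns false when the queue was empty (a job may still be in
    // flight on another thread).
    bool runOne()
    {
        std::function<void()> job;
        {
            std::lock_guard<std::mutex> lock(mutex);
            if (jobs.empty())
                return false;
            job = std::move(jobs.front());
            jobs.pop_front();
        }
        job();
        remaining.fetch_sub(1, std::memory_order_acq_rel);
        return true;
    }

    // Called by the thread that ran processBlock: keep working until
    // every job is done, including any finishing on helper threads
    // (this spins briefly while the last in-flight jobs complete).
    void finishOnCallingThread()
    {
        while (remaining.load(std::memory_order_acquire) > 0)
            runOne();
    }
};
```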

The main point of failure then can be how long it takes to signal your semaphore or CV. This is mostly extremely quick, but I have seen it take longer in the “right” circumstances, which could be catastrophic if multiple plugins are trying to do this during a single audio callback.


The “correct” way to do this is for plugin APIs to offer an entry point to the DAW where spare thread capacity (i.e. if a thread has run out of nodes to process) can offer its time to the plugin. In theory this requires no signalling (system calls) and there will only ever be num_cores threads.

As far as I know only CLAP offers this at the moment (maybe VST3?).


In reality, as @Nitsuj70 said, you can get performance improvements rolling your own thread pools but you might end up actually slowing things down in some situations (like multiple plugins all starting lots of threads).

My advice would be to try and optimise your algorithms and use SIMD parallelism before reaching for a thread-pool.
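To illustrate the SIMD point: a tight, branch-free per-sample loop like the one below is exactly the shape that auto-vectorisers (or explicit SIMD such as juce::FloatVectorOperations) handle well, often a several-fold win before any extra threads enter the picture. The function is a trivial example, not a recipe:

```cpp
#include <cstddef>

// A simple gain loop. Compiled with optimisations enabled, most
// compilers will vectorise this to process 4-16 samples per instruction.
void applyGain(float* samples, std::size_t numSamples, float gain)
{
    for (std::size_t i = 0; i < numSamples; ++i)
        samples[i] *= gain;
}
```

Compile with optimisations (e.g. -O2 or -O3) and inspect the generated assembly to confirm the vectorisation actually happened.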

6 Likes

Yes, we need the other plugin standards to support something like CLAP.
But until then we have to deal with it some way. And I want my users to have the choice of using more CPU power for the plugin they purchased.
So I want to add an option to use more threads for the plugin. This makes it the responsibility of the user. And I know that is not a good thing. It should be the responsibility of the DAW.

This is misguided. What kind of audio processing are we even talking about here? We can run hundreds of oscillators going through dozens of filters, including modulation, etc, on a single core without any issues.

What kind of audio processing are you attempting that you think you need multiple threads?

I think that does not matter. It is about the fact that if you want more threads, how to deal with it.

So purely academic? No specific reason? What a waste of everybody’s time.

2 Likes

Reading through this thread has certainly not been a waste of my time. I think several very interesting issues were discussed. Let’s not discourage people from starting up conversations and asking questions.

2 Likes