Multiple threads in JUCE standalone Audio GUI application

I asked questions about multi-threading earlier, but that was about a plugin. Now I have a question about multi-threading for a standalone JUCE audio GUI application.

I guess getNextAudioBlock() is called on a separate high-priority thread. I want to make an application where I need to do a lot of audio processing, and I need more threads for this.

  • can I create multiple threads in the prepareToPlay() routine and use them in the getNextAudioBlock()?
    (by creating some thread safe communication between the threads)

  • if so, how can I make sure that these extra threads stay alive and get high priority?

  • should I specify an affinity mask for the threads to make sure others are not using the cores?

  • or is setting a thread to higher priority enough?

  • how many threads does JUCE use for the standalone audio GUI application?

  • how does JUCE make sure the getNextAudioBlock() runs on high priority without interruption?
    (just setting a high priority? or using an affinity mask?)

Any information is welcome, because I'm trying to understand this topic. And I really do not want audio artifacts because the samples could not be calculated in time!

I create X amount of threads when my application starts. As the audio thread is already a realtime thread running on one core, I create one fewer thread than there are CPU cores in the computer running the application. That ensures each and every CPU core gets its own realtime thread for maximum performance.
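A minimal sketch of that thread-count calculation in standard C++ (the function name is illustrative, not from JUCE); note that hardware_concurrency() may legally return 0 when the core count is unknown:

```cpp
#include <thread>

// Worker thread count: total cores minus one reserved for the realtime
// audio thread. hardware_concurrency() can return 0 ("unknown"), so
// fall back to a single worker in that case.
unsigned numWorkerThreads()
{
    const unsigned cores = std::thread::hardware_concurrency();
    return cores > 1 ? cores - 1 : 1;
}
```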

I start each thread with:

Thread::startThread(Thread::Priority::highest)

I call notify_all() on a std::condition_variable to trigger all my threads to start processing new tasks inside getNextAudioBlock().

And I use a combination of the two lines below to safely pop tasks to process from a list:

std::unique_lock<std::mutex> lock(mutex); // Protects the access to the list of tasks

conditionVariable.wait(lock, [&]{ return !tasks.empty(); }); // If no tasks are in the list, release the lock and wait until the next notification, then reacquire the lock. The predicate guards against spurious wakeups.

Some ideas that may help:

  • You can define the priority of a thread, but in the end you are at the mercy of the OS and the hardware of your system.
  • The GUI runs on a single thread (the message thread).
  • ThreadPool is a nice class to run tasks sequentially in a separate thread. In a lot of situations, it is enough to define a ThreadPool to process things (e.g. scan files) in a separate thread from the message and processing threads.

How many samples do you process per getNextAudioBlock() and what is your sample rate?
Do you have any idea how many threads are really running in parallel?
In all kinds of tutorials they advise never using locking inside your audio processing threads. Don't you have problems with the OS not giving you enough threads?
What OS are you using?

Then what is the best you can do to make the OS let your threads do their audio jobs?

I process a lot of samples, as my application is a DAW. So there are numerous instruments and effects and they are routed in all sorts of ways. So that’s a lot of processing steps, buffers and routing to do within those threads.

I support all sample rates which the OS supports. Usually the application is running at 44.1 kHz or 96 kHz.

I’m not really concerned about how many threads are running in parallel, as long as I don’t create more than CPU core count minus one. If my audio processing threads are high enough priority, the audio should not start crackling.

The main audio thread is never locked/unlocked, so no other thread can steal its time. The only locking happens when one of the created worker threads can't get a next task to process right away. That thread then waits until the notification comes. During that time the OS can give that CPU core to some other task, which automatically gets interrupted when new tasks become available during getNextAudioBlock(). My experience is that the OS is very efficient and fast in handling such task-switching situations, so there's no problem there. It can easily happen a lot of times while getNextAudioBlock() is doing its thing in parallel.

The OSes I've run this on so far are macOS (Intel Mac) and Windows 10.

Oh, one more detail about my implementation:
When new tasks become available, the threads that produce them always call notify_one() on the condition variable, to ensure that a sleeping thread gets woken up to work on the task.
The OS handles that really fast, so there's no need to worry that calling the method unnecessarily would slow things down. It won't. It's super fast in practice.

This is very interesting. Thanks a lot!

I always thought that a context switch would cost about 1 ms or so. And a sample block of 64 samples at 48 kHz only gives you about 1.3 ms of processing time. So context switching should be avoided at all costs.
But you think context switching is much, much faster than that?

You say “If my audio processing threads are high enough priority …” . How do you make sure that is the case?

Have you determined you actually have a use case where attempting to use additional threads for the audio processing is going to be useful?

I ensure the high priority just by starting the thread with:

Thread::startThread(Thread::Priority::highest)

Here's an explanation of a really simple and very efficient way of handling multi-threading in your application/plugin. It works great in JUCE. Full credit for it goes to mystran, who explained it in another forum (link to the post at the end of this message). Here's the explanation:

==================

Threadpool with one thread per core and a dependency graph of tasks derived from the signal flow.

Each dependency keeps a list of dependees and each dependee keeps a counter of how many dependencies are pending. When a node is processed, do atomic decrement on the counters for the dependees and collect a list of tasks where the counter hits zero. Queue all but one of those into the pool (they can be sorted by priority if you have some heuristic), the remaining one can be processed directly by the task that freed it. The “one task directly” optimization is important, because it means that not only do we try to take advantage of inputs potentially in cache, but also it avoids having to worry about combining simple serial chains into “super tasks” because the overhead of going through the pool again only applies when there are at least two dependees.

This is not the only possible design, but it’s stupidly simple and it works.

==================
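A toy sketch of the counter scheme from the quote above, in standard C++ (all names are illustrative, and the `queueIntoPool` stand-in simply recurses on the calling thread; a real engine would hand freed nodes to the thread pool instead):

```cpp
#include <atomic>
#include <functional>
#include <vector>

// Each node knows its dependees; each node counts its pending
// dependencies. Whoever decrements a counter to zero "owns" that node.
struct Node
{
    std::function<void()> work;
    std::vector<Node*> dependees;        // nodes consuming this node's output
    std::atomic<int> pendingDeps { 0 };  // dependencies not yet processed
};

// Hypothetical pool hook: here it just recurses on the calling thread.
void queueIntoPool (Node* n);

void processNode (Node* node)
{
    node->work();

    Node* runNext = nullptr;

    for (auto* d : node->dependees)
    {
        // Atomic decrement; the thread that reaches zero owns the node.
        if (d->pendingDeps.fetch_sub (1, std::memory_order_acq_rel) == 1)
        {
            if (runNext == nullptr)
                runNext = d;          // "one task directly" optimisation
            else
                queueIntoPool (d);    // the rest go through the pool
        }
    }

    if (runNext != nullptr)
        processNode (runNext);        // keep hot inputs in cache
}

void queueIntoPool (Node* n) { processNode (n); } // stand-in for a real pool
```

The "one task directly" branch is what keeps simple serial chains cheap: a chain of single-dependee nodes never touches the pool at all.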

Here's the original post. The whole thread is fairly eye-opening and recommended reading for anyone who is interested in using multi-threading:
