FR: Thread-Priority vs Efficiency/Performance Cores

I think the advantage of workgroups (at least the variant we use) is having your aux threads scheduled to be woken up around the same time as the audio thread.

I think @anthony-nicholls is correct regarding wait(-1). I would also make sure to notify() your threads from the audio thread.
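For anyone not using juce::Thread, here is a rough standard-C++ analogue of the wait(-1)/notify() pattern. The worker blocks with no timeout and runs only when the audio thread notifies it. `SleepingWorker` is a hypothetical name. This version also takes a mutex in notify(), which a strict real-time design would want to avoid on the audio thread.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical worker: sleeps with no timeout (like juce::Thread::wait (-1))
// and runs its work function once per notify().
class SleepingWorker
{
public:
    explicit SleepingWorker (std::function<void()> work)
        : work_ (std::move (work)), thread_ ([this] { run(); }) {}

    ~SleepingWorker()
    {
        {
            std::lock_guard<std::mutex> lock (mutex_);
            quit_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    // Called from the audio thread to wake the worker.
    void notify()
    {
        {
            std::lock_guard<std::mutex> lock (mutex_);
            pending_ = true;
        }
        cv_.notify_one();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock (mutex_);
        for (;;)
        {
            // No timeout: the thread uses no CPU until notify() or shutdown.
            cv_.wait (lock, [this] { return pending_ || quit_; });

            if (! pending_)
                return;              // shutting down with nothing left to do

            pending_ = false;
            lock.unlock();
            work_();                 // run outside the lock so notify() never waits on it
            lock.lock();
        }
    }

    std::function<void()> work_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool pending_ = false;
    bool quit_ = false;
    std::thread thread_;             // declared last so it starts after the other members
};
```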

I think it should help in workloads that look like this:

    void audioCallback (...)
    {
        // notifyThreads/dispatchWork() etc

        // do things on the audio thread

        // wait for threads to finish
    }

Yes, this is the pattern I’ve been using to process multiple identical job callbacks in parallel (the tasks generally take about the same time, since in my application it’s multiple voices processing the same signal chain).
After being notified, each thread pulls ‘jobs’ (voice numbers) from an atomic job countdown and calls the same callback with that voice index; once it has finished and no further jobs are available, it goes back to wait(-1).
The audio callback waits for the job list to finish using a juce::WaitableEvent.
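The job-pulling loop might look like the following, as a sketch in standard C++. Each worker decrements a shared atomic countdown to claim a voice index and stops once the value goes negative. `drainVoices` and `processVoicesInParallel` are hypothetical names. The short-lived threads in the second function exist only to exercise the loop; a real engine keeps its workers alive and wakes them with notify().

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Every worker runs this same loop: claim a voice index from the countdown,
// process it, and return (to wait (-1)) once nothing is left.
template <typename ProcessVoiceFn>
void drainVoices (std::atomic<int>& jobCountdown, ProcessVoiceFn&& processVoice)
{
    for (;;)
    {
        const int voice = jobCountdown.fetch_sub (1, std::memory_order_acq_rel) - 1;
        if (voice < 0)
            return;                  // countdown exhausted: back to waiting
        processVoice (voice);        // same callback for every voice
    }
}

// Illustration only: spawns threads per call, which an audio callback must not do.
template <typename ProcessVoiceFn>
void processVoicesInParallel (int numThreads, int numVoices, ProcessVoiceFn processVoice)
{
    std::atomic<int> countdown {numVoices};
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back ([&] { drainVoices (countdown, processVoice); });
    for (auto& th : threads)
        th.join();
}
```

The countdown keeps going below zero as late threads fail their claim, so it has to be reset before each block.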

Note that if you do this, you can process one (or more) of those jobs on the audio callback thread itself, as well as on your separately created thread pool, instead of leaving the audio thread sitting around waiting (something I failed to implement at first :slight_smile:). In my case the first ‘job’ is pulled off the list before the other audio worker threads are notified to wake.
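The ordering described there can be sketched as follows: claim the first job, wake the workers, and then keep draining on the audio thread. `VoiceJobList` and `processBlockOnAudioThread` are hypothetical. `wakeWorkers` stands in for calling notify() on each worker, and waiting for workers still busy with their last voice is left out.

```cpp
#include <atomic>

// Hypothetical job list: voices are claimed by counting an atomic down.
struct VoiceJobList
{
    std::atomic<int> remaining {0};

    void reset (int numVoices) { remaining.store (numVoices, std::memory_order_release); }

    // Returns a voice index to process, or a negative value once the list is empty.
    int claim() { return remaining.fetch_sub (1, std::memory_order_acq_rel) - 1; }
};

template <typename WakeFn, typename ProcessFn>
void processBlockOnAudioThread (VoiceJobList& jobs, int numVoices,
                                WakeFn&& wakeWorkers, ProcessFn&& processVoice)
{
    jobs.reset (numVoices);
    int voice = jobs.claim();        // take the first job before anyone else is awake
    wakeWorkers();                   // e.g. notify() each worker thread

    for (; voice >= 0; voice = jobs.claim())
        processVoice (voice);        // the audio thread keeps working instead of idling
}
```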
I haven’t seriously investigated the best maximum number of threads, but it seems sensible to limit it to the number of CPU cores the system reports (SystemStats::getNumCpus()).
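One way to turn that rule into a number is shown below. It uses `std::thread::hardware_concurrency()` as the standard-C++ counterpart of SystemStats::getNumCpus(). Two parts of the rule are assumptions: subtracting one because the audio thread also processes jobs, and capping the count by the voice count. The reported count typically includes both performance and efficiency cores.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical sizing rule: one worker per reported core, minus one for the
// audio thread, and never more workers than there are jobs left for them.
inline int chooseNumWorkers (int maxVoices,
                             unsigned reportedCores = std::thread::hardware_concurrency())
{
    // hardware_concurrency() may return 0 when the count is unknown.
    const int cores = reportedCores == 0 ? 1 : static_cast<int> (reportedCores);
    return std::max (0, std::min (cores - 1, maxVoices - 1));
}
```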
Any time the audio configuration changes, I stop all the audio worker threads and recreate them so that they use the correct ‘max process time in ms’ in ‘RealtimeOptions’. I also do this whenever the (maximum) voice count changes in my synth. (I’m still on JUCE 7.0.3, so this may be slightly different now with the new AudioWorkgroups support.)
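The ‘max process time in ms’ is just the duration of one callback: block size divided by sample rate. Below is a sketch of that arithmetic plus a rebuild check. `AudioWorkerConfig`, `blockDurationMs` and `workersNeedRebuild` are hypothetical helpers, not JUCE API. Check your JUCE version for the exact RealtimeOptions method that takes this value.

```cpp
// Hypothetical settings that the worker threads depend on.
struct AudioWorkerConfig
{
    int samplesPerBlock = 0;
    double sampleRate = 0.0;
    int maxVoices = 0;
};

// Duration of one audio callback in milliseconds, e.g. 480 samples at 48 kHz -> 10 ms.
inline double blockDurationMs (int samplesPerBlock, double sampleRate)
{
    return 1000.0 * samplesPerBlock / sampleRate;
}

// Any change to block size, sample rate or maximum voice count means the
// workers should be torn down and recreated with the new time budget.
inline bool workersNeedRebuild (const AudioWorkerConfig& current, const AudioWorkerConfig& next)
{
    return current.samplesPerBlock != next.samplesPerBlock
        || current.sampleRate != next.sampleRate
        || current.maxVoices != next.maxVoices;
}
```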


Note that RealtimeOptions had a bit of a shake-up recently, so you may need to change your code a little when you pull. I’m also not sure realtime threads were actually working back then :grimacing:. From memory, everything will appear to work: despite claims in the docs that you can’t join a workgroup if the thread hasn’t been upgraded to realtime, our experience was that it works anyway. So when you pull the latest version of JUCE you may hit some breaking changes, but you might see a performance boost too :crossed_fingers:


This is how I handle job dispatch too. I am still looking for a nice way to notify the audio thread when all the threads are finished.

Windows has WaitForMultipleObjects, which is quite nice as it lets you wait for all handles with a timeout.
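A portable way to get "wait for all, with a timeout" is to count the jobs down behind a condition variable and use a timed wait with a predicate. The sketch below does that. `CompletionCounter` is hypothetical, and it is one counter rather than N kernel handles. Like any mutex-based wait, it is not strictly real-time safe.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for "wait for all N handles, with a timeout":
// workers call jobDone(); the audio thread calls waitAll().
class CompletionCounter
{
public:
    void reset (int numJobs)
    {
        std::lock_guard<std::mutex> lock (mutex_);
        pending_ = numJobs;
    }

    void jobDone()
    {
        bool last;
        {
            std::lock_guard<std::mutex> lock (mutex_);
            last = (--pending_ == 0);
        }
        if (last)
            cv_.notify_all();
    }

    // Returns false if the timeout expired before every job reported in.
    bool waitAll (std::chrono::microseconds timeout)
    {
        std::unique_lock<std::mutex> lock (mutex_);
        return cv_.wait_for (lock, timeout, [this] { return pending_ == 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int pending_ = 0;
};
```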

I wonder if kqueue is fast enough for something like this on Apple platforms?

Well, since you know how many jobs there are at the start, just have an atomic counter that starts at the number of jobs to do and decreases after each job finishes.

Each thread just keeps taking from the job queue until there are no more jobs left (the atomic counter is zero). When there are no jobs left the thread goes back to wait().

Rather than spinning on the main audio thread, you may as well put it to work running jobs too, using the same mechanism. Once this thread sees that the atomic counter is zero, you know you’re all done.
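The sketch below splits that counter in two, because "no jobs left to claim" and "all jobs finished" become true at different moments. `toClaim` hands out indices and `unfinished` drops as jobs complete. The audio thread helps out and then busy-waits on `unfinished`. `JobBatch` is a hypothetical name.

```cpp
#include <atomic>
#include <thread>

// Hypothetical batch: workers and the audio thread all call runJobs();
// the audio thread then spins until every claimed job has finished.
struct JobBatch
{
    std::atomic<int> toClaim {0};
    std::atomic<int> unfinished {0};

    void start (int numJobs)
    {
        unfinished.store (numJobs, std::memory_order_relaxed);
        toClaim.store (numJobs, std::memory_order_release);
    }

    template <typename Fn>
    void runJobs (Fn&& job)
    {
        for (;;)
        {
            const int j = toClaim.fetch_sub (1, std::memory_order_acq_rel) - 1;
            if (j < 0)
                return;                                       // nothing left to claim
            job (j);
            unfinished.fetch_sub (1, std::memory_order_release);
        }
    }

    void spinUntilDone() const
    {
        while (unfinished.load (std::memory_order_acquire) > 0)
            std::this_thread::yield();   // busy-wait: keeps the core occupied
    }
};
```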

This is exactly what I do, but if the last job is taken by a thread other than the main audio callback, then rather than sitting in a while loop checking the jobsRemaining atomic, I use a juce::WaitableEvent that is signalled when that last job completes; the owner of the tasks (the main audio callback) calls its wait() function.
I can’t remember exactly why this would be better than constantly checking the jobsRemaining atomic. I followed somebody else’s example here, and it’s been working well for several years, so I didn’t think too much about it :slight_smile:. Suggestions for improvement are welcome.
(Edit: WaitableEvent::wait() suspends the calling thread until it is signalled (or until the timeout, if one is set), so I suppose it at least saves some CPU cycles. But I don’t know whether that is costly compared with constantly checking an atomic, or whether the OS, especially on Apple silicon, might dislike seeing the main audio workgroup thread briefly suspended. Expert comment needed.)

Cycling on a jobs remaining atomic is akin to spin locking. But sure, you don’t want to be doing a lot of busy waiting like that.

In my implementation I use a counting semaphore for waking threads up to do work and another one for signalling that work is done.

When a job is executed I do:

    jobCompletionSemaphore_.release();

And in my join method:

    runJobs();
    
    for (int i = 0; i < numJobs_; ++i)
    {
        jobCompletionSemaphore_.acquire();
    }

Here runJobs() pulls from a lock-free FIFO of jobs until there are none left. The threads that wake up use the same method, so they all work from the same job FIFO. IIRC, I used to have the atomic spin loop I mentioned instead of the jobCompletionSemaphore_, but I didn’t think there was much difference in performance. Of course, YMMV.

Hi, are any changes needed to our code, which just uses the single audio thread created by JUCE?

thx

I think you got an answer in the other post. That said, what do you mean by “single audio thread created by Juce”? Do you mean when you use a CoreAudio device? If so, I don’t think JUCE creates any threads; that should all be down to CoreAudio.


Yeah, thx. I guess I was asking whether there’s anything I need to do overall. Some clients on M1 are reporting glitching where CPU usage spikes massively, which I can’t reproduce myself.

We are currently trying to get audio workgroups running properly. When the Standalone app or the host (Logic Pro in this case) is in the foreground, performance is a lot better: the proper workgroup is joined and all auxiliary thread deadlines are met.
However, when any other app is in focus, performance tanks.
Profiling shows that when the plug-in is running in the background, E-cores are preferred again for the aux threads…
Has anyone seen this, or does anyone know a solution?