FR: Thread-Priority vs Efficiency/Performance Cores

For anyone who still hasn’t figured this out, here is a gist with all my changes, based on help from @rmuller in the thread above and other forum posts on the subject. I’ve tested it working with AU plugins on an M1-based MacBook Air:

There are three files in the gist: juce_AudioProcessor.h, juce_AU_Wrapper.mm, and juce_mac_ObjCHelpers.h (uploaded as gistfile1.txt). I suggest diffing those files against the stock JUCE versions to see the changes, and reading my notes at the top of each file.
I don’t fully understand the changes in juce_mac_ObjCHelpers.h; they’re taken straight from rmuller’s example code.

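I was stuck for a long time on unexplained crashes until I realised I was missing the ‘thread_local’ keyword on the os_workgroup_join_token_s declaration. Without it, the token is a plain local variable that goes out of scope when the join function returns, even though its address is stored and later needed to leave the workgroup.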
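
A minimal sketch of the same lifetime rule, written as an RAII guard instead of a ‘thread_local’ token (an alternative, not the code from the gist): the join token is a member of an object that lives on the joining thread’s stack for the whole run loop, so its address stays valid until the destructor calls os_workgroup_leave on that same thread. The class name ScopedWorkgroupMembership and the WORKGROUPS_AVAILABLE() macro are hypothetical, and the non-Apple branch holds stub stand-ins (not Apple’s API) only so the sketch compiles and runs elsewhere.

```cpp
#include <cassert>
#include <cerrno>
#include <thread>

#if defined(__APPLE__)
  #include <os/workgroup.h>
  // Hypothetical helper: runtime check that the workgroup API exists (macOS 11+).
  #define WORKGROUPS_AVAILABLE() __builtin_available (macOS 11.0, *)
#else
  // Stub stand-ins so the sketch builds off macOS. These are NOT Apple's API.
  struct os_workgroup_join_token_s { int unused; };
  using os_workgroup_join_token_t = os_workgroup_join_token_s*;
  using os_workgroup_t = void*;
  inline int os_workgroup_join (os_workgroup_t wg, os_workgroup_join_token_t) { return wg != nullptr ? 0 : EINVAL; }
  inline void os_workgroup_leave (os_workgroup_t, os_workgroup_join_token_t) {}
  #define WORKGROUPS_AVAILABLE() true
#endif

// Joins a workgroup for the lifetime of this object. Create it on the stack of
// the thread that should join, e.g. at the top of that thread's run().
class ScopedWorkgroupMembership
{
public:
    explicit ScopedWorkgroupMembership (os_workgroup_t wg) : workgroup (wg)
    {
        if (workgroup == nullptr)
            return;

        if (WORKGROUPS_AVAILABLE())
            joined = (os_workgroup_join (workgroup, &token) == 0); // may fail, e.g. on Intel
    }

    ~ScopedWorkgroupMembership()
    {
        if (! joined)
            return; // never leave a workgroup we didn't successfully join

        // The token must be handed back on the same thread that joined.
        assert (std::this_thread::get_id() == joiningThread);

        if (WORKGROUPS_AVAILABLE())
            os_workgroup_leave (workgroup, &token);
    }

    // Non-copyable (and therefore non-movable): the token's address must not change.
    ScopedWorkgroupMembership (const ScopedWorkgroupMembership&) = delete;
    ScopedWorkgroupMembership& operator= (const ScopedWorkgroupMembership&) = delete;

    bool isJoined() const noexcept { return joined; }

private:
    os_workgroup_t workgroup;
    os_workgroup_join_token_s token {};  // valid for as long as the membership exists
    std::thread::id joiningThread = std::this_thread::get_id();
    bool joined = false;
};
```

In a JUCE Thread::run() you would construct it as the first statement, e.g. `ScopedWorkgroupMembership membership (currentWorkgroup);`, before the processing loop. When run() returns, the membership is released on the same thread, and nothing is left if the join failed.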
My implementation in the AudioProcessorHolder class appears to work reliably, and I create and destroy audio thread pools regularly (every time a patch is loaded or my voice-count settings change). But I may have missed something; for example, I’m not sure whether I should do anything with the ‘nullWorkGroupEvent()’ function.
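
Since the pools are destroyed and recreated regularly, what matters is that each worker leaves the workgroup itself before its thread exits. Below is a hedged sketch of that lifecycle using plain std::thread. MembershipHooks and WorkerPool are hypothetical names, and the join/leave hooks stand in for os_workgroup_join/os_workgroup_leave (join should return true only when the OS call returned 0).

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hooks standing in for os_workgroup_join / os_workgroup_leave.
struct MembershipHooks
{
    std::function<bool()> join;   // returns true only if the join succeeded
    std::function<void()> leave;  // called on the same thread that joined
};

// Each worker joins when it starts and leaves on its own thread before exiting,
// so the pool can be torn down and rebuilt (e.g. on every patch load).
class WorkerPool
{
public:
    WorkerPool (int numThreads, MembershipHooks h) : hooks (std::move (h))
    {
        assert (numThreads >= 0);

        for (int i = 0; i < numThreads; ++i)
            workers.emplace_back ([this] { run(); });
    }

    ~WorkerPool()
    {
        shouldExit.store (true);

        for (auto& t : workers)
            t.join(); // by now every worker has already left the workgroup
    }

    WorkerPool (const WorkerPool&) = delete;
    WorkerPool& operator= (const WorkerPool&) = delete;

private:
    void run()
    {
        const bool joined = hooks.join();    // before the processing loop

        while (! shouldExit.load())
            std::this_thread::yield();        // placeholder for real voice processing

        if (joined)
            hooks.leave();                    // same thread, and only after a successful join
    }

    MembershipHooks hooks;
    std::atomic<bool> shouldExit { false };
    std::vector<std::thread> workers;
};
```

Destroying the pool sets the exit flag and joins every thread; each worker has already called leave on its own thread, so a new pool for the next patch starts from a clean state.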

See also the post https://forum.juce.com/t/os-workgroup-join-consistently-returns-einval-cant-join-audio-workgroup/54240

Whilst I added this for AU plugins, I still don’t have a solution for VST3 or standalone builds.
Anyway, I hope something like this can be added to JUCE officially, because I really don’t want to hack in such OS-specific code and maintain even more custom parts of the JUCE codebase.

Additional note: threads appear to join the audio workgroup correctly, and I noticed fewer pops/clicks. However, it does not appear that all thread processing is automatically moved to P-cores; rather, the OS schedules those threads together so that they don’t hold each other up significantly. This also requires your thread tasks to take very similar processing time; in my case, I’m mostly processing identical audio chains, one per voice.
Another note: I saw that ‘startRealtimeThread’ internally does not use the highest priority by default. When I use it in my standalone build, the process load goes up significantly once the window is no longer the active application window. Changing the JUCE code to set the priority to ‘Highest’ made that issue go away. I’m not sure whether that counts as a JUCE bug?

Edit: a small update on this. To prevent a crash on Intel Mac builds, where os_workgroup_join fails (even though there is a render context observer event), I had to tweak my code so that it only stores thread/token assignments if os_workgroup_join succeeds:

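    int joinCurrentAuWorkGroup(void* threadId) // call from the thread's run(), before its processing loop
    {
        // Join this thread to the current audio workgroup.
        if (@available(macOS 11.0, *))
        {
            if (currentWorkgroup == nullptr)
                return -1; // no workgroup has been provided yet

            // thread_local: the token's address is stored below and must stay
            // valid until this same thread calls os_workgroup_leave().
            thread_local os_workgroup_join_token_s joinToken{};

            const int result = os_workgroup_join(currentWorkgroup, &joinToken);

            if (result == 0)
            {
                // Only record the assignment if the join succeeded (it can fail on Intel Macs).
                // NOTE: if several pool threads can call this at the same time,
                // threadTokenList must be protected, e.g. by a std::mutex.
                AudioProcessorHolder::WorkgroupAndToken toStore = {currentWorkgroup, &joinToken};
                threadTokenList.emplace_back(threadId, toStore);
                return 1; // Success.
            }

            if (result == EALREADY)
            {
                // The thread is already part of a workgroup that can't be
                // nested in the specified workgroup.
                return -1;
            }

            if (result == EINVAL)
            {
                // The workgroup has been canceled.
                return -1;
            }
        }

        return -1;
    }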
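
If several pool threads can call joinCurrentAuWorkGroup at the same time, the shared list needs a lock. A minimal sketch of a mutex-guarded list that, like the code above, records an entry only when the join result is 0. ThreadTokenList, addIfJoined and removeFor are hypothetical names, and WorkgroupAndToken here is a simplified stand-in for AudioProcessorHolder::WorkgroupAndToken that uses void* instead of the os_workgroup types.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for AudioProcessorHolder::WorkgroupAndToken; in the
// real code the members would be os_workgroup_t and os_workgroup_join_token_t.
struct WorkgroupAndToken
{
    void* workgroup = nullptr;
    void* token = nullptr;
};

// Mutex-guarded replacement for threadTokenList, assuming several pool
// threads may register or unregister themselves concurrently.
class ThreadTokenList
{
public:
    // Stores the assignment only if the join call returned 0 (success).
    // Returns 1 on success, -1 otherwise, matching joinCurrentAuWorkGroup.
    int addIfJoined (void* threadId, int joinResult, WorkgroupAndToken entry)
    {
        assert (threadId != nullptr);

        if (joinResult != 0)
            return -1;

        const std::lock_guard<std::mutex> lock (mutex);
        entries.emplace_back (threadId, entry);
        return 1;
    }

    // Removes the entry for threadId (e.g. just before that thread leaves the
    // workgroup) and copies it into 'removed'. Returns false if none was stored.
    bool removeFor (void* threadId, WorkgroupAndToken& removed)
    {
        const std::lock_guard<std::mutex> lock (mutex);

        for (auto it = entries.begin(); it != entries.end(); ++it)
        {
            if (it->first == threadId)
            {
                removed = it->second;
                entries.erase (it);
                return true;
            }
        }

        return false;
    }

    std::size_t size() const
    {
        const std::lock_guard<std::mutex> lock (mutex);
        return entries.size();
    }

private:
    mutable std::mutex mutex;
    std::vector<std::pair<void*, WorkgroupAndToken>> entries;
};
```

A worker would call removeFor() with its own thread id just before leaving the workgroup, so the list never holds tokens for threads that have already exited.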

For the moment, I’m assuming the Workgroups API will never be used on Intel-based Macs (unless Apple one day goes back to Intel chips with a big/little architecture), so it’s probably safe to exclude all the workgroup-related code with #ifdef JUCE_ARM.
