Mac M1 thread priority & Audio Workgroups

When I wrote this originally I was building against JUCE 7.0.0 with Xcode 13.4.1. But I just tested JUCE 7.0.1, and there’s no improvement on the issue.

When you say M1 performance is a known issue, do you mean in general? Or as it relates to JUCE code?

Just to be clear, the issue goes away if you use the Audio Workgroup API as described above, so it seems JUCE specific to me. Also, it is not a general audio performance issue as it runs fine when you use the same device for input and output.

The pthread API used on macOS doesn’t work very well on the M1 platforms: a pthread priority below four locks the thread to the E cores, which, ironically, hits the higher-performance variants hardest, since they have only 2 E cores compared to the base M1’s 4.

I inquired about the JUCE version because a fix was applied recently that prevented threads from being given low priority. I tested our new code with AudioDeviceCombiner, and it was stable on my M1, but I have the M1 with 4 E cores.

We’re currently transitioning our native threading code to use modern threading priority classifications and APIs (QoS on Apple platforms).

The Audio Workgroup API is something we’re still looking into and where/if it fits into our Thread API.

Sorry for the dumb question, but does juce::Thread use the pthread API? That was my assumption, but I never looked under the hood to see what is going on (my bad).

I also just assumed the juce::Thread class does what it says, but it seems that’s no longer the case on all platforms?

Any news on this new Thread API?

Thanks!

It’s going through the final stages of review and should, hopefully, be available sometime next week!

I see there are updates to the juce::Thread class on the develop branch for the thread priority API, but I still see nothing there related to joining audio workgroups. I believe that was the original point of the creator of this discussion thread: to improve audio performance by making threads join audio workgroups (on Mac systems).

In this related forum thread, rmuller provided some example code for integrating the audio render context observer callback into juce_AU_Wrapper.mm:

I still don’t fully understand how to integrate that callback with my juce::Thread threads, but in theory it should allow joining those threads to the same workgroup as the main audio callback for the AU plugin, and a similar process should be possible for standalone builds.
I am following up with rmuller for some tips, but I would prefer not to have to hack those changes into the JUCE API code myself.

Ideally there will be an option in the juce::Thread class to request that newly created threads are automatically joined to the same audio workgroup as the main audio callback in AudioProcessor - maybe a generic cross platform option, that then has the platform specific code handled in the underlying thread/plugin wrappers code.

Are there any updates on if/when that may happen? (oli1 - you mentioned you’d revisit it after ‘unicode support’ work).

(I would expect something very similar to be required for audio threads on Windows at some point, as new Intel processors have similar E/P-core splitting.)

Thanks!

If anyone wants to try to patch this into JUCE themselves, I posted my hack here: MacOS Audio Thread Workgroups - #4 by tlacael

I’m a long-time user of Unify, a JUCE-based combination host/plug-in from PlugInGuru. It runs poorly on my MacBook M1 Max, with a lot of CPU spiking. I have already complained to the developers, who have told me this only happens on the Pro/Max versions of the M1 chip, and have asked me to report it directly to the JUCE development team. I see that this phenomenon, which affects other multi-threaded JUCE apps/plug-ins, has been discussed extensively on the JUCE Forum for a year, but there is still no official solution. I urge you to address this as soon as possible.

Hi @myuusic

I have one customer who has reported similar CPU spiking issues on a Mac Studio M1 Max system with a multi-threaded plugin of mine, also in Rosetta mode, in Logic with just one virtual instrument and this plugin. Less powerful Macs, like an older M1 Mini, can run 10-20 instances of this plugin; he should be able to run many more simultaneously.

It was built with the JUCE 7 develop branch from around March or April, if I remember correctly.

Does the CPU spiking also occur when running the Unify plugin in Rosetta mode on your system? On his system it did.

Hi Peter,

thank you for your fast response! I use this plugin inside Cubase 12 natively. I just switched to Rosetta to test it and yes, the CPU spiking happens in Rosetta mode as well.

I want to ENCOURAGE the dev community and JUCE team to resolve this ASAP.

Hi folks. First of all, I’m a fan of much of what is being accomplished with JUCE; I own and use many of these plugins.

Central to what I do is PlugInGuru’s Unify, since about 18 months ago. As Apple has dead-ended all of my Intel machines, excepting one maxed-out i9 that is still under AppleCare (which is probably why they can’t obsolete it yet), I’m now embarking on the transition to Apple Silicon for all of my music-making gear (live and studio rigs).

Unify is severely under-performing on AS, specifically a Mac Studio. Audio is cracking and popping under essentially zero load. This would appear to be a problem of Unify also utilizing efficiency cores versus performance cores, with no means by which we, as users, can instruct Unify to stay away from the e-cores.

Realtime audio, especially performance-oriented software, needs to be up and running on the p-cores only. Even Apple understands this: MainStage won’t even offer e-core selection, and Logic allows users to select e-cores if needed, but also offers p-core-only options. Again, Unify cannot do this; what I’m hearing from the developer (who is always responsive!) is that he’s dependent on a long-promised solution via the platform, but none is available yet…

I have taken a few days to read through the relevant threads here, and… I’m not just a musician, I have worked for “those guys in Redmond” and have decades of work in product launches, product development, and implementations. I’m retired from that world, but “I get it”. Folks are searching for a solution, and kludging solutions that maybe work for AU, but not VST… so forth and so on…

Hence the encouragement to get this issue solved. Cleanest, obviously, would be to have p-core-only affiliation handled by JUCE. It appears that folks aren’t clear on how Apple wants this accomplished: is it a priority, a flag, or some high-road QoS-based audio workgroup solution??? Dunno… That last one is a lot of work for something Apple presents to users as p-core preference and affiliation in their own products. I don’t know - I just know they’re doing something, and it appears that high-end SoCs with a bare minimum of e-cores are causing issues in one or more products that leverage the JUCE platform.

As an end-user, I just need Unify’s developer supported: either directly, through peer-developer support, or… via a move “up” the priority list for getting this somehow into JUCE.

To be clear… not all of my JUCE-based products manifest this unusable show-stopper behavior. I don’t know why. It raises a question for me as well: how are developers “rolling their own” solutions where this is not handled in JUCE??? How resilient are those solutions? Again, dunno. Don’t need to know… I am an end-user.

I just need (hence this request, in the form of encouragement) to see the implementation of something resembling what Apple seems to be doing in their own products: presented to users as p-core affinity, though behind the scenes it may actually be a workgroup of processes that are, in reality, steered to the p-cores by priority or some other mechanism.

Dunno, and as an end-user, don’t need to know either - just need what BillG once described to me as the “promise of technology” to manifest itself here so that we can continue to do what folks not all that long ago, could only dream of.

Again, couldn’t be happier in general, but this issue is a real show-stopper.

In closing, I am not complaining. I just can’t use this product if it cracks and pops within seconds. Just can’t. I could never recommend using that layer for live touring acts. That’s just a reality.

What I can do, and that is the point of this post, is to raise awareness of a problem and encourage all to help enable each other to get over this new asymmetric model of e-core and p-core computing.

When “up there” in the p-cores, it seems to behave as expected.

How can I help more than just encouraging y’all to have a look into this (soon, if not yesterday)?

Hi Peter, just curious to know whether you used the audio workgroup thread-registration method for your real-time audio threads (for Mac ARM builds)? I did for my plugin, and it definitely improved things (but there still doesn’t appear to be a way to force threads onto P-cores only).

(Awaiting the JUCE team to formally bring this into the JUCE API - hopefully soon, but there is at least a working solution for AU/standalone if you check my and other related posts: [MacOS Audio Thread Workgroups - #11 by onar3d]. Maybe the latest VST3 updates now allow retrieving the audio workgroup - I haven’t had time to check that.)

No, I am still using the standard JUCE thread.

My calculations have non-critical deadlines; that is, with high priority they are always ready in time, even with 15-25 instances of my plugins.

I am, though, still following the (non?) progress here on adopting Apple’s new APIs/guidelines/whatever, because I have no time to invest in this myself, as I am focusing on new products, customer functionality and user experience.

Cheers
Peter

There is an internal branch with AudioWorkgroup support, hopefully making its way onto develop, probably after we’ve dealt with any fires caused by the recent release on master. That being said, you can now call startRealtimeThread on the Thread class.

Thanks for the update, looking forward to that.
Will there be any way to force threads to run only on P-cores (on Apple silicon)? Or is it still up to the OS, based on the QoS settings/realtime option in the startRealtimeThread function? Or is there some other trick to force threads to stay on P-cores and not get demoted to E-cores?

I’d love to see what the AudioWorkGroup branch looks like, just to get an idea of how the client code should be structured. It’s a shame it’s not available on a separate branch.

I use juce::ThreadPool. As an experiment I tried modifying the JUCE code to call startRealtimeThread() instead of startThread(), but this didn’t result in any improvement.

All of our customers who purchased M1 Pro/Max/Ultra based Macs have been greatly disappointed, and one (so far) has reported similar problems on a new PC using one of the latest Intel Alder Lake chips (which also have separate E/P cores). Without a way to ensure that our own threads stay on P cores, this is gradually going to kill our product entirely.

What version of JUCE are you (or were you) using? There was a bug that, until very recently, meant startRealtimeThread() wasn’t really doing anything!

Realtime threads on macOS are achieved by upgrading the existing thread, so if you want you could pull in the headers and do it yourself. But if you take a look at the latest version on develop, I’ll be surprised if you don’t notice a difference; I ran multiple tests while working on the HighResolutionTimer and definitely found a very noticeable difference. You should tweak the RealtimeOptions to ensure you get the right settings, although in my experience I struggled to get them to make much of a difference. There are some settings that will guarantee it doesn’t work, but I think we should have protected against those, or at least put jasserts in for the cases we could find.

I was using JUCE 7.0.6, release branch. Thank you for the suggestions, but I prefer to avoid the whole guessing-game of debugging this myself, and wait for the official solution which has been in the works for over a year.

I think that, regardless of the startRealtimeThread() function, where you can define the ‘max expected’ process time for the thread (and consequently affect QoS/thread-policy settings), for glitch-free audio on Apple silicon, where multiple threads work to fill the same final output buffer for the main Core Audio thread/callback, those threads need to be assigned to the same audio workgroup so the OS knows they have to meet the same deadline.
(I remember reading somewhere that if you don’t meet the max expected process time when running your thread, your thread might also get demoted.)

Quote from Apple dev pages: “Each Core Audio device provides a workgroup that other realtime threads can join… Joining the audio device workgroup tells the system that your app’s realtime threads are working toward the same deadline as the device’s thread.”

I’m still waiting for an answer about whether we can ensure those threads are also assigned only to P-cores (everything I read says otherwise - and you have to hope that a sudden load on an E-core doesn’t hold up audio threads currently running there, or delay the switch over to P-cores when the OS scheduler decides it’s necessary… Why, then, does Logic have GUI options to specify usage of P-cores only?). JUCE team - any comment?

I’d also like to know whether the audio workgroup can be retrieved and used for VST3 plugins now.
Any comments about that from the JUCE team would be much appreciated - thanks!
(Hopefully all will be revealed soon anyway.)

I wouldn’t be surprised if the Core Audio team implemented a “secret” function just for Logic, so that setting doesn’t count as a possible way to enforce P-cores only - I’m absolutely certain that there is none.
But I read in a different thread here that using audio workgroups pretty much solved all their problems. The only thing I (personally) just can’t grasp is how audio workgroups help an Audio Unit. Is an Audio Unit allowed to attach more threads to the DAW’s workgroup? That doesn’t seem right to me from a conceptual standpoint.