FR: Thread-Priority vs Efficiency/Performance Cores

I found out that for a juce::Thread to run on an Apple M1 performance core, it needs a priority of at least 8.

Intel’s current “Alder Lake” CPUs also introduce an efficiency/performance core design, but I’m not sure how this is implemented.

It would be great if the juce::Thread API somehow reflected this, with something like thread.setPriority(thread.getMinPerformanceCorePriority())

Okay, this commit changes the behavior completely. So I guess that before this change all threads (with no explicitly defined priority) actually had very low priority on macOS?

Yes, I think that was the case.

Can we get some clear documentation here? We’ve noticed that some users complain about NEXUS suddenly loading a lot slower, taking a lot longer to scan folders, and so on. This is all work we put into background threads running with priority 3 (so lower than normal, but still higher than only-on-idle).

We’ve used priority 3 because without setting it explicitly, the “background”-tasks interrupted the message thread so frequently that the mouse stuttered. Once we switched to priority 3, everything became butter-smooth and loading times, folder scans, etc. didn’t really take any longer than before.

Once the M1 Max came out, we suddenly got a few complaints about slow preset-scan times (the browser window stays empty until the scan is complete). Even preset loading is affected, because we decode (decompress) all the individual samples with a thread pool, again at priority 3, and something that normally takes 50-100 ms now sometimes takes 4,000-5,000 ms!

This doesn’t happen on a standard M1 (which we bought on release day in 2020), but apparently only on the newer M1 Max.

This commit could help, but we don’t fully understand what it does. Can you maybe give us a clear description of what each priority now means exactly on macOS and Windows? Or are the two platforms compatible now? Can we set the thread priority for macOS and Windows and have it actually mean the same thing?

Thanks 🙂

Guys - we have exactly the same question - we upgraded JUCE and suddenly our app ran like shite, taking 60x longer to do some sound file analysis!

Really unclear what thread priorities we should be using any more.

So let me get this right:

It would be nice if someone could confirm that a thread priority < 8 means the thread will never run on a performance core.

This suggests every worker thread created with a default priority (5) would have a distinct performance reduction on Apple Silicon.

Whilst there’s no explicit guide to what the default priority means, I had not counted on these threads having such a low ceiling for performance when the machine is idle.

Thanks

Hey guys - would be great to get some input on here. We have an App with zillions of Thread instances and need to know if we need to patch it so it runs on performance cores again, or whether you think this is a JUCE issue…

@reuk

@jimc and I have looked at this and we’re confident this is a JUCE bug that is capable of causing performance issues on Apple silicon machines.

The code means that all threads with a JUCE priority < 8 have their Apple priority set to 0:

policy = priority < lowestRealtimePriority ? SCHED_OTHER : SCHED_RR;

param.sched_priority = [&]
{
    if (policy == SCHED_OTHER)
        return 0;

    return jmap (priority, lowestRealtimePriority, maxInputPriority, minPriority, maxPriority);
}();

Due to some macOS QoS behaviour, this then means those threads only ever run on efficiency cores (of which there are only 2 on the M1 Pro and Max).
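To make the effect of that snippet concrete, here is a minimal standalone sketch of the same decision logic using only the standard library. The constant values (8 and 10), the `remap` helper standing in for `jmap`, and the names `SchedChoice`, `chooseSchedParams` and `describe` are assumptions for illustration, not JUCE code.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the constants in the snippet above; the values are assumptions.
constexpr int lowestRealtimePriority = 8;
constexpr int maxInputPriority = 10;

struct SchedChoice
{
    bool realtime;      // true -> SCHED_RR, false -> SCHED_OTHER
    int schedPriority;  // value that would be written to param.sched_priority
};

// Integer linear remap, written to behave like the five-argument jmap call above.
inline int remap (int value, int srcMin, int srcMax, int dstMin, int dstMax)
{
    return dstMin + ((dstMax - dstMin) * (value - srcMin)) / (srcMax - srcMin);
}

// Mirrors the lambda above: everything below the realtime threshold collapses to 0.
inline SchedChoice chooseSchedParams (int jucePriority, int minPriority, int maxPriority)
{
    assert (minPriority <= maxPriority);

    if (jucePriority < lowestRealtimePriority)
        return { false, 0 };

    return { true, remap (jucePriority, lowestRealtimePriority, maxInputPriority, minPriority, maxPriority) };
}

// Human-readable form, handy for printing a table of priorities 0..10.
inline std::string describe (int jucePriority, int minPriority, int maxPriority)
{
    const auto c = chooseSchedParams (jucePriority, minPriority, maxPriority);
    return std::string (c.realtime ? "SCHED_RR" : "SCHED_OTHER") + " / " + std::to_string (c.schedPriority);
}
```

Calling `describe (p, 15, 47)` for `p` from 0 to 10 (15 and 47 are placeholder bounds, not measured values) shows every priority below 8 ending up as SCHED_OTHER with a scheduler priority of 0, which is the collapse described above.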

Here’s a simple fix that seems to keep the original intent of the thread code:

Can we please have it patched back into JUCE 6.1? Incidentally, the code is still in JUCE 7 and presumably still causing problems.

FWIW, I suspect that this OpenGL fix here is also no longer needed after the change above. Thread: Update macOS thread priority calculation · juce-framework/JUCE@48c6087 · GitHub

Thanks!
Dave

I know the JUCE team have probably been focusing on the J7 launch but this also has me worried. We have a lot of threads running and this could cause problems with file reading amongst others in Tracktion Engine/Waveform. Thanks in advance.

I spent most of today looking at this. Unfortunately, Apple has changed pthread priority characteristics across their devices, breaking our ability to map our 0-10 range to something useful.

I spent the day mapping the performance characteristics with raw pthread on my M1. Priorities 0-4 restrict the thread to the E cores, and anything from 5 upwards is balanced across all cores.

It appears to be slightly different on the Pro/Max: a priority in the 0-9 range will restrict the thread to the E cores and potentially run them at a lower clock, resulting in extreme performance drops. I see 4x lower performance on my M1 with 4 E cores; we’re likely to see 8x lower on the Pro/Max with only 2 E cores.

sched_get_priority_min/max appears not to return anything useful on M1 platforms, and setting a priority of zero with the SCHED_OTHER policy (as pthread docs recommend/require) will force that thread into its lowest performance characteristic.

TL;DR: We can no longer rely on POSIX thread priorities on macOS. Will fix.
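For anyone who wants to check what their own machine reports, a minimal sketch of the query is below. It uses the POSIX calls `sched_get_priority_min` and `sched_get_priority_max`; the `PriorityRange`, `queryPriorityRange` and `describeRange` names are made up for this example.

```cpp
#include <cassert>
#include <sched.h>
#include <string>

struct PriorityRange
{
    int min;
    int max;
};

// Asks the OS for the valid sched_priority range of a scheduling policy.
inline PriorityRange queryPriorityRange (int policy)
{
    return { sched_get_priority_min (policy), sched_get_priority_max (policy) };
}

// Formats the range as "NAME: min..max" so it can be logged per platform.
inline std::string describeRange (const char* name, int policy)
{
    assert (name != nullptr);

    const auto r = queryPriorityRange (policy);
    return std::string (name) + ": " + std::to_string (r.min) + ".." + std::to_string (r.max);
}
```

Logging `describeRange ("SCHED_OTHER", SCHED_OTHER)` and `describeRange ("SCHED_RR", SCHED_RR)` on each target machine makes it easy to compare platforms. The numbers alone say nothing about which cores a thread will be scheduled on.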

The docs from Apple state that one should use pthread_set_qos_class_self_np to set priority. Maybe this is something that can be adopted in JUCE.
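As a rough illustration of what adopting that call could look like, here is a sketch that maps a 0-10 priority onto a QoS class and applies it to the calling thread. The `QosTier` enum, the `tierForPriority` thresholds and `applyTierToCurrentThread` are hypothetical and are not how JUCE does it; only `pthread_set_qos_class_self_np` and the `QOS_CLASS_*` constants come from Apple's headers. The Apple-specific part is guarded so the mapping still compiles elsewhere.

```cpp
#include <cassert>
#include <pthread.h>

#if defined (__APPLE__)
 #include <pthread/qos.h>
#endif

// Portable mirror of Apple's QoS classes so the mapping can be compiled anywhere.
enum class QosTier { background, utility, standard, userInitiated, userInteractive };

// Hypothetical mapping from a 0-10 priority to a QoS tier; the thresholds are arbitrary.
inline QosTier tierForPriority (int priority)
{
    assert (priority >= 0 && priority <= 10);

    if (priority <= 1)  return QosTier::background;
    if (priority <= 4)  return QosTier::utility;
    if (priority <= 6)  return QosTier::standard;
    if (priority <= 8)  return QosTier::userInitiated;

    return QosTier::userInteractive;
}

// Applies the tier to the calling thread and returns true on success.
// On non-Apple platforms this does nothing and returns false.
inline bool applyTierToCurrentThread (QosTier tier)
{
   #if defined (__APPLE__)
    qos_class_t qosClass = QOS_CLASS_DEFAULT;

    switch (tier)
    {
        case QosTier::background:       qosClass = QOS_CLASS_BACKGROUND;       break;
        case QosTier::utility:          qosClass = QOS_CLASS_UTILITY;          break;
        case QosTier::standard:         qosClass = QOS_CLASS_DEFAULT;          break;
        case QosTier::userInitiated:    qosClass = QOS_CLASS_USER_INITIATED;   break;
        case QosTier::userInteractive:  qosClass = QOS_CLASS_USER_INTERACTIVE; break;
    }

    // The second argument is a relative priority within the class (0 is the highest).
    return pthread_set_qos_class_self_np (qosClass, 0) == 0;
   #else
    (void) tier;
    return false;
   #endif
}
```

A worker would call `applyTierToCurrentThread (tierForPriority (3))` as the first thing in its run function. The return value is worth checking, since the call can fail.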

QoS is the direction Apple and Windows are going for scheduler prioritising, especially as we see more machines with asymmetrical processor architectures.

We’re going to have to update our Thread models in JUCE.

Thank you very much for looking into this! 🙂

In case it helps others following this, I found the following doc enlightening: Scheduling of Threads on M1 Series Chips: second draft – The Eclectic Light Company

@Rincewind I did notice that qos.h (where I believe pthread_set_qos_class_self_np lives) does not have a method for setting the priority of another thread aside from temporarily. So there’s a bit more work for things like ThreadPools.

Thanks again,
Dave
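One way around the "self only" restriction for pools is to let any thread request a level and have each worker apply it to itself before it picks up its next job. The sketch below shows only that hand-off; `SelfAppliedQos` and its members are hypothetical names, and the callback passed in is assumed to wrap something like pthread_set_qos_class_self_np on macOS.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Lets any thread *request* a QoS level that only the target thread can *apply*.
class SelfAppliedQos
{
public:
    // applyToCallingThread is an assumption: a callable that sets the QoS of whichever thread invokes it.
    explicit SelfAppliedQos (std::function<void (int)> applyToCallingThread)
        : apply (std::move (applyToCallingThread))
    {
        assert (apply != nullptr);
    }

    // Safe to call from any thread, e.g. the owner of a thread pool.
    void request (int qosLevel) noexcept
    {
        requested.store (qosLevel, std::memory_order_release);
    }

    // Must be called on the worker thread itself, e.g. before it runs each job.
    // Returns true if a new level was applied.
    bool applyIfChanged()
    {
        const int wanted = requested.load (std::memory_order_acquire);

        if (wanted == applied)
            return false;

        apply (wanted);
        applied = wanted;
        return true;
    }

private:
    std::function<void (int)> apply;
    std::atomic<int> requested { -1 };  // -1 means nothing has been requested yet
    int applied = -1;                   // only touched by the worker thread
};
```

The trade-off is latency: a change only takes effect when the worker next reaches `applyIfChanged()`, so a worker stuck in a long job keeps its old level until that job finishes.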

I thought I would clarify the current state of threading priorities on M1 platforms while we decide what direction we want to go in.

Since @reuk’s recent change, setting the priority on M1 platforms will have no effect. However, it does fix the issue of threads being placed into the lower tier. So you can expect all your threads to run as fast as possible (outside of real-time) regardless of priority, with no more crippling performance on M1+ chips.

The pthread_setschedparam method we currently use effectively exposes two priority levels:

  • 0-4: Threads will be restricted to E Cores only.
  • 5+: Threads will fill available P Cores and spill over to E cores if necessary.

As sched_get_priority_min/max always returns a range well above 4, threads on M1+ are set to the higher performance threshold regardless of the level requested.

These priority levels have nothing to do with the QoS classes to which a particular blog post refers. These are not mapped to pthread priority levels and can only be accessed via different API calls.

We’re currently thinking about what to do with this new API and how to best make it accessible for everyone.

Thanks for the updates @oli1 - I’m starting to dig into various forum posts on Apple silicon threading issues (because I think this may affect my plugin/s), and reading this latest message I would like to check a couple of things you mentioned.

“Since @reuk’s recent change, setting the priority on M1 platforms will have no effect. However, it does fix the issue of threads being placed into the lower tier. So you can expect all your threads to run as fast as possible (outside of real-time) regardless of priority, with no more crippling performance on M1+ chips.”
Does this mean that JUCE Thread class setThreadPriority has no effect on Apple silicon currently (as of JUCE 7.1)? What if I need to run a task on a thread with a lower priority than the message thread? I also need to be sure I can set threads to Audio Priority.

For audio processing I’m creating a pool of (JUCE Thread) threads to process multiple voices in parallel. This holds up the audio callback until all thread tasks are completed. Tasks are assigned each time there is a callback (based on the available threads), but the threads are requested outside of the audio callback. (This has worked really well on Intel silicon, and initially on M1 seemed to be working very well too, but I’m concerned recent updates to both JUCE and macOS may break things for this method of parallelizing my voice processing.)

I would assume it’s pretty important then that each voice is processed with the same priority, and so the threads need to be on the same performance cores (otherwise one thread may hold back the audio callback significantly). I’m concerned about that statement “Threads will fill available P Cores and spill over to E cores if necessary.” Is this applicable to the JUCE Thread class?

I always assumed that starting a JUCE Thread with a specific priority would just work (there’s no return value in the ‘startThread’ function that would indicate a different priority actually got assigned) - but is it the case then that a requested priority can be rejected/changed (even with Intel silicon)?

Given the statement that setting priority on M1 platforms will have no effect, should I still be using JUCE Thread class, or is it required to directly write Apple silicon specific thread handling/request code?

“We’re currently thinking about what to do with this new API and how to best make it accessible for everyone.” - as part of the JUCE API?

Thanks in advance for your advice.

The current fix in place for M1 platforms is a short term fix.

The legacy ‘pthread’ API only gives you two options, and they’re undocumented and a bit vague.

In my testing, 4 and below restricted the threads to E cores and 5 and up ran them on P cores, although full CPU utilisation can be seen if you create enough threads.

We have a new Thread API that’s in its final stages of development and hope to have it public very soon. This new API will fix these issues with M1 platforms.

Looking forward to the updates, as I have been getting reports of poor performance on systems with only 2 e-cores (meanwhile I had to add options to disable or limit the number of audio threads in my plugin).

4 questions:

Will there be a way to confirm that a requested thread is assigned and running on p-cores or e-cores?
(This would be extremely useful to allow the app to drop the usage of e-core threads if necessary to ensure consistent audio processing thread process time.)

Will it be possible to confirm the priority a thread is given (e.g. to be sure a background file scan cannot hold up the UI drawing)?

If it’s possible that a running p-core thread gets demoted during operation to an e-core, will there be a notification from the API?

Are the updates to the Threading API going to take ‘Audio Workgroups’ into consideration? They seem essential for ‘audio priority’ threads.
(Apple Developer Documentation)

No, there is no method of querying core assignments reliably. So you generally have to trust the OS to make good on its promise of priority.

You probably don’t want to pin threads to cores; the OS will ‘usually’ make better scheduling decisions. You might be able to do this with the Mach kernel affinity API, but it doesn’t pin threads to a core; it groups threads to run on the same die/cache pair. However, it might not always be the same die. (It doesn’t appear to have been updated for quite a while, so I wouldn’t rely on it for the M1 platform.)

As far as I can determine, there is no documentation on scheduling logic behind core allocation for asymmetrical CPU architectures (for Windows or Apple). Still, you should be able to expect high-priority workloads to be favoured for high-performance cores.

Unfortunately, we don’t get any notifications on core changes.

The Audio-Workgroup API doesn’t appear to make sense within our thread model, so it’s not included in this release, but it is something I’m going to investigate further.

Thanks for the detailed reply @oli1
The main thing I’m looking to ensure is that I can create several high-priority threads in a pool that can be used to process several audio tasks within the audio callback, to make the most of multiple cores. This is why it seems critical to ensure they are all assigned equal priority cores, and it would appear to be the exact reason for the Audio-Workgroup API: to ensure no single thread in a related processing group significantly holds up completion of the work within the audio callback. The OS cannot really make an informed decision on whether it’s safe to drop a thread to an e-core unless it knows how that might affect other thread tasks.

Apple doc says: “For every audio server I/O thread, its associated app also has its own real-time render thread. The audio server wakes and waits for the app to produce its output on each I/O cycle. The system automatically joins the client app’s real-time thread to the audio device’s workgroup. By doing so, the system tells the kernel that both threads are working together with a common deadline and can better optimize performance.”
“If your app creates one or more auxiliary real-time threads that run in sync with the audio server I/O thread, join them to its audio workgroup. Doing so informs the system that these threads are rendering to the same deadline, which helps it optimize performance.”

This to me seems super critical for JUCE to enable. After all, the most important aspect of any audio app is audio performance and avoidance of dropped frames, and without Audio Workgroups on macOS it seems it will be impossible to utilize multi-core processing reliably for real-time audio processing.
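For reference, joining an auxiliary thread to a workgroup could look roughly like the sketch below. It assumes the `os_workgroup_join` / `os_workgroup_leave` functions from Apple's `<os/workgroup.h>` (macOS 11 and later) and that the `os_workgroup_t` for the audio device has already been obtained elsewhere; `ScopedWorkgroupMembership` is a hypothetical helper, not a JUCE class. On other platforms it compiles to a no-op.

```cpp
#include <cassert>
#include <pthread.h>

#if defined (__APPLE__)
 #include <os/workgroup.h>
 using WorkgroupHandle = os_workgroup_t;
#else
 using WorkgroupHandle = void*;  // placeholder so the sketch compiles elsewhere
#endif

// Joins the calling thread to a workgroup on construction and leaves on destruction.
// Create it at the top of the auxiliary thread's run function, on that thread.
class ScopedWorkgroupMembership
{
public:
    explicit ScopedWorkgroupMembership (WorkgroupHandle wg)
        : workgroup (wg), owner (pthread_self())
    {
       #if defined (__APPLE__)
        joined = workgroup != nullptr && os_workgroup_join (workgroup, &token) == 0;
       #endif
    }

    ~ScopedWorkgroupMembership()
    {
        // Leaving is expected to happen on the same thread that joined.
        assert (pthread_equal (owner, pthread_self()));

       #if defined (__APPLE__)
        if (joined)
            os_workgroup_leave (workgroup, &token);
       #endif
    }

    ScopedWorkgroupMembership (const ScopedWorkgroupMembership&) = delete;
    ScopedWorkgroupMembership& operator= (const ScopedWorkgroupMembership&) = delete;

    // False when there was no workgroup, the join failed, or the platform has no workgroups.
    bool isJoined() const noexcept { return joined; }

private:
    WorkgroupHandle workgroup;
    pthread_t owner;
   #if defined (__APPLE__)
    os_workgroup_join_token_s token {};
   #endif
    bool joined = false;
};
```

If `isJoined()` returns false the thread simply keeps running outside the workgroup, so a pool can fall back to fewer auxiliary threads rather than fail.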