Okay, this commit changes the behaviour completely. So I guess before this change, all threads (with no defined priority) actually had very low priority on macOS?
Can we get some clear documentation here? We’ve noticed that some users complain about NEXUS suddenly loading a lot slower, needing much longer to scan folders, etc. All of this is work we put into background threads running at priority 3 (lower than normal, but still higher than only-on-idle).
We used priority 3 because, without setting it explicitly, the background tasks interrupted the message thread so frequently that the mouse stuttered. Once we switched to priority 3, everything became butter-smooth, and loading times, folder scans, etc. didn’t really take any longer than before.
Once the M1 Max came out, we suddenly got a few complaints about slow preset-scan times (the browser window stays empty until the scan is complete). Even preset loading is affected, because we decode (decompress) all the individual samples with a thread pool, again at priority 3, and something that normally takes 50-100 ms now sometimes takes 4,000-5,000 ms!
This doesn’t happen on a standard M1 (we bought one on release day in 2020), but apparently only on the newer M1 Max.
This commit could help, but we don’t fully understand what it does. Could you give us a clear description of what each priority now means exactly on macOS and Windows? Or are they consistent now, i.e. can we set the same thread priority on macOS and Windows and have it actually mean the same thing?
It would be nice if someone could confirm that a thread priority < 8 means the thread will never run on a performance core.
This suggests every worker thread created with the default priority (5) would see a distinct performance reduction on Apple Silicon.
While there’s no explicit guide to what the default priority meant, I hadn’t counted on these threads having such a low performance ceiling when the machine is idle.
Hey guys, it would be great to get some input here. We have an app with zillions of Thread instances and need to know whether we need to patch it so it runs on performance cores again, or whether you think this is a JUCE issue…
I know the JUCE team have probably been focusing on the J7 launch, but this also has me worried. We have a lot of threads running, and this could cause problems with file reading, amongst other things, in Tracktion Engine/Waveform. Thanks in advance.
I spent most of today looking at this. Unfortunately, Apple has changed pthread priority characteristics across their devices, breaking our ability to map our 0-10 range to something useful.
I spent the day mapping the performance characteristics with raw pthread calls on my M1. Priorities 0-4 will restrict the thread to the E cores, and 5 and above are balanced across all cores.
It appears to be slightly different on the Pro/Max: a priority in the 0-9 range will restrict the thread to the E cores and potentially run it at a lower clock, resulting in extreme performance drops. I see roughly 4x lower performance on my M1 with 4 E cores; we’re likely to see 8x lower on the Pro/Max with only 2 E cores.
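The experiment above can be reproduced with a probe along these lines. This is a sketch, not the actual test code: the iteration count is arbitrary, and the call to set the priority is a plain POSIX request that some platforms (e.g. Linux with SCHED_OTHER) will simply reject for non-zero values.

```cpp
#include <chrono>
#include <pthread.h>
#include <sched.h>

// Attempt to set a SCHED_OTHER priority on the calling thread, then time
// a fixed chunk of CPU work. On an idle M1, comparing the result at
// priority 0 vs 5+ should expose the E-core/P-core split; on other
// platforms the two timings will be similar.
double timeBusyWorkAtPriority (int priority)
{
    sched_param param {};
    param.sched_priority = priority;
    pthread_setschedparam (pthread_self(), SCHED_OTHER, &param);

    const auto start = std::chrono::steady_clock::now();
    volatile double x = 0.0;

    for (int i = 0; i < 20'000'000; ++i)
        x += i * 1e-9;

    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double> (end - start).count();
}
```

Running `timeBusyWorkAtPriority (0)` and `timeBusyWorkAtPriority (5)` back to back on an otherwise idle machine gives a rough picture of which core tier the scheduler picked.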
sched_get_priority_min/max appear not to return anything useful on M1 platforms, and setting a priority of zero with the SCHED_OTHER policy (as the pthread docs recommend/require) forces that thread into its lowest performance tier.
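For reference, this is the portable POSIX route being described (a sketch; the helpers are mine, not JUCE’s). The reported range is system-dependent, and priority 0 is exactly the value that forces the lowest tier on the Pro/Max:

```cpp
#include <pthread.h>
#include <sched.h>

struct PriorityRange { int min, max; };

// Query the scheduler's advertised priority range for SCHED_OTHER.
// The report above suggests that on M1 platforms these values are not
// meaningful for core placement.
PriorityRange queryOtherPriorityRange()
{
    return { sched_get_priority_min (SCHED_OTHER),
             sched_get_priority_max (SCHED_OTHER) };
}

// Apply a SCHED_OTHER priority to the calling thread. Per POSIX, 0 is
// the one value every implementation must accept for this policy.
bool setCurrentThreadPriority (int priority)
{
    sched_param param {};
    param.sched_priority = priority;
    return pthread_setschedparam (pthread_self(), SCHED_OTHER, &param) == 0;
}
```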
TL;DR: we can no longer rely on POSIX thread priorities on macOS. Will fix.
QoS is the direction Apple and Windows are going for scheduler prioritisation, especially as we see more machines with asymmetrical processor architectures.
We’re going to have to update our Thread models in JUCE.
@Rincewind I did notice that qos.h (where I believe pthread_set_qos_class_self_np lives) has no way of permanently setting the QoS class of another thread, only temporary overrides. So there’s a bit more work needed for things like ThreadPools.
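To illustrate the constraint: a thread can set its own QoS class permanently, so a pool would have to apply the class from inside each worker rather than from the owner. The helper below is hypothetical (not JUCE API) and guarded so it compiles on non-Apple platforms, where it just reports the API as unavailable:

```cpp
#include <pthread.h>

#if defined (__APPLE__)
 #include <pthread/qos.h>
#endif

// Opt the *calling* thread into the USER_INITIATED QoS class.
// Returns 1 if the class was applied, 0 if the API is unavailable.
int requestUserInitiatedQoS()
{
#if defined (__APPLE__)
    // Second argument is the relative priority within the class (0 = default).
    return pthread_set_qos_class_self_np (QOS_CLASS_USER_INITIATED, 0) == 0 ? 1 : 0;
#else
    return 0; // QoS classes are an Apple-specific API
#endif
}
```

Each pool worker would call this once at the top of its run loop; changing the class later from outside the thread is only possible via temporary override objects.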
I thought I would clarify the current state of threading priorities on M1 platforms while we decide what direction we want to go in.
Since @reuk’s recent change, setting the priority on M1 platforms will have no effect. However, it does fix the issue of threads being placed into the lower tier. So you can expect all your threads to run as fast as possible (outside of real-time) regardless of priority, with no more crippled performance on M1+ chips.
The pthread_setschedparam method we currently use effectively exposes two priority levels:
0-4: Threads will be restricted to E Cores only.
5+: Threads will fill available P Cores and spill over to E cores if necessary.
As sched_get_priority_min/max always returns a range well above 4, threads on M1+ are set to the higher performance threshold regardless of the level requested.
These priority levels have nothing to do with the QoS classes that a particular blog post refers to. QoS classes are not mapped to pthread priority levels and can only be accessed via different API calls.
We’re currently thinking about what to do with this new API and how to best make it accessible for everyone.