ThreadPool thread affinities

It looks like ThreadPool just spawns a bunch of threads to be used for running jobs. Couldn’t this be made more (possibly much more) performant by setting the affinity for each thread such that the threads are locked to cores?

This would help in the situation where the OS (or some other app) wants to run something on one core, so the OS boots one of the ThreadPool threads off to another core, which in turn displaces another ThreadPool thread, and so on. That cascade of context switches could be avoided if the threads were simply stuck to cores via affinity masks.

i.e. changing

void ThreadPool::createThreads (int numThreads, size_t threadStackSize)
{
    for (int i = jmax (1, numThreads); --i >= 0;)
        threads.add (new ThreadPoolThread (*this, threadStackSize));

    for (auto* t : threads)
        t->startThread();
}

to (untested)

void ThreadPool::createThreads (int numThreads, size_t threadStackSize)
{
    int numCpus = SystemStats::getNumCpus();
    for (int i = jmax (1, numThreads); --i >= 0;)
    {
        ThreadPoolThread* newThread = new ThreadPoolThread (*this, threadStackSize);
        uint32 affinityMask = 1 << (i % numCpus);   // round-robin across the available cores
        newThread->setAffinityMask (affinityMask);
        threads.add (newThread);
    }

    for (auto* t : threads)
        t->startThread();
}
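
Just to illustrate the mask arithmetic, here's a tiny standalone snippet (plain C++, no JUCE) that prints the round-robin assignment the loop above would produce; the 8 threads / 4 cores figures are made-up example numbers, not anything queried from the OS:

// Prints the per-thread affinity masks the loop above would assign,
// counting down from the last thread the way createThreads does.
#include <cstdio>

int main()
{
    const int numThreads = 8, numCpus = 4;   // example figures only

    for (int i = numThreads; --i >= 0;)
        std::printf ("thread %d -> affinity mask 0x%x\n", i, 1u << (i % numCpus));
}

With those numbers the masks cycle 0x8, 0x4, 0x2, 0x1, 0x8, ... so two pool threads end up sharing each core.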

The theory is that the OS should balance this kind of thing out for you in most cases, and it usually does a better job than if you try to impose rules on it!

For example, if your machine has only, say, 4 threads that are busy at a particular moment, the OS should easily be able to figure out that each of them should get its own core. However, if other apps or threads are also busy, then it might be best for the OS to put all your threads onto fewer cores so that overall system performance is better, but if you've messed with the affinity it doesn't have that option.

If you’re unconcerned about it then I won’t worry about it either.

Perhaps one day I’ll write a benchmark out of curiosity. I would lean toward not trusting OS thread scheduling when trying to squeeze performance out of a thread pool, where a context switch could incur big penalties if the pool is saturated with jobs.
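
For what it's worth, here's a rough standalone sketch of what such a benchmark might look like. It's plain C++ rather than JUCE, it's Linux-only (it pins threads with pthread_setaffinity_np via std::thread::native_handle()), and the helper names, the synthetic job and the job/iteration counts are all placeholders I've made up for the sketch, so treat any numbers it prints with suspicion:

#ifndef _GNU_SOURCE
 #define _GNU_SOURCE
#endif

#include <pthread.h>   // pthread_setaffinity_np (Linux/glibc specific)
#include <sched.h>     // cpu_set_t, CPU_ZERO, CPU_SET
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Pin a running std::thread to a single core (Linux only).
static void pinToCore (std::thread& t, unsigned core)
{
    cpu_set_t set;
    CPU_ZERO (&set);
    CPU_SET (core, &set);
    pthread_setaffinity_np (t.native_handle(), sizeof (set), &set);
}

// Run jobsTotal tiny synthetic jobs across numWorkers threads and return the
// elapsed wall-clock time in milliseconds, with or without core pinning.
static double runBatch (bool pinned, unsigned numWorkers, long jobsTotal)
{
    std::atomic<long> nextJob { 0 };

    auto worker = [&]
    {
        volatile double sink = 0.0;   // keeps the optimiser from deleting the work

        for (;;)
        {
            const long job = nextJob.fetch_add (1);
            if (job >= jobsTotal)
                break;

            for (int i = 0; i < 2000; ++i)   // arbitrary, deliberately tiny job
                sink += i * 0.5;
        }
    };

    const unsigned numCores = std::max (1u, std::thread::hardware_concurrency());
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < numWorkers; ++i)
    {
        workers.emplace_back (worker);
        if (pinned)
            pinToCore (workers.back(), i % numCores);   // same round-robin idea as above
    }

    for (auto& t : workers)
        t.join();

    return std::chrono::duration<double, std::milli> (std::chrono::steady_clock::now() - start).count();
}

int main()
{
    const unsigned workers = std::max (1u, std::thread::hardware_concurrency());
    const long jobs = 200000;

    std::printf ("unpinned: %.1f ms\n", runBatch (false, workers, jobs));
    std::printf ("pinned:   %.1f ms\n", runBatch (true,  workers, jobs));
}

The interesting comparison would be running it on an otherwise idle machine versus one where other processes are busy, since that's exactly where the scheduler's freedom to rebalance (or the loss of it to affinity masks) should show up.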