Multi-threaded voice processing using a third-party lock-free queue

Hey,
I’m making the voice processing of my synth multi-threaded. I’ve been advised before that this has limited benefits in a DAW, but since it’s for iOS and Android this shouldn’t be an issue.
I’m using this multi producer, multi consumer queue.
The threads (one for each core) are set up beforehand like below. voicesRemaining is a std::atomic<int>.

numThreads = SystemStats::getNumCpus();
for (int i = 0; i < numThreads; i++) {
    processThreads.push_back(thread([&]() {
        VoiceProcessTask task;

        while (true) {
            processQueue.wait_dequeue(task);
            task.process();

            voicesRemaining.fetch_add(-1, std::memory_order_release);
        }
    }));
}

Then every call to renderVoices does something like this:

voicesRemaining = voices.size();
for (int i = 0; i < voices.size(); ++i) { // add all voices to the queue
    processQueue.enqueue(
        VoiceProcessTask(voices.getUnchecked(i), &buffer, startSample, numSamples));
}

VoiceProcessTask task;
while (voicesRemaining.load(std::memory_order_acquire) != 0) { // while the threads do their biz, help out with the queue
    if (!processQueue.try_dequeue(task)) {
        continue;
    }

    task.process();

    voicesRemaining.fetch_add(-1, std::memory_order_release);
}

I’m seeing big benefits already in terms of performance, but I’m wondering if anyone’s got any tips to improve it further? One issue is how to destroy the queue while the threads are blocked in wait_dequeue. I never need to destroy them during the lifetime of the app, though, so maybe it’s not so important. I guess they get brutally terminated when the app closes.

Thanks


Could you use a JUCE WaitableEvent instead? Then you not only have a timeout but you could easily wake it up to exit, flush the queue, and you’re done.

Perhaps the wait_dequeue() method can be woken up too?

BTW (although OT for the JUCE forum!) the docs for that queue suggest using ProducerToken and ConsumerToken objects for improved performance.

I could use the non-blocking version of the dequeue (try_dequeue) and implement the waiting with WaitableEvents instead. That might be easier, though I’m not sure about speed: I’ve read that waking up threads is slower than just leaving them spinning, like the wait method does.

You could normally leave the WaitableEvent waiting but only wake it up when you’re shutting down?

Have all your consumer threads check an exit flag right after wait_dequeue() returns. If it is set, they should exit. When your main control thread wants to shut down, simply set all flags and enqueue N dummy operations (where N = number of threads) so that every blocked thread wakes up. They should all exit without delay, so long as the rest of your code works correctly. After the enqueue() calls, you can join() each one of them.
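
A minimal sketch of that shutdown sequence, for illustration only. BlockingQueueStandIn is a mutex-based placeholder for the real queue so the snippet compiles on its own (it only models enqueue() and wait_dequeue()), and WorkerPool, Task and shouldExit_ are hypothetical names, not part of JUCE or the queue library.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Placeholder for the third-party queue (assumption: same enqueue/wait_dequeue
// shape, but implemented with a mutex rather than lock-free internals).
template <typename T>
class BlockingQueueStandIn {
public:
    void enqueue(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }

    // Blocks until an item is available, then moves it into `out`.
    void wait_dequeue(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        out = std::move(items_.front());
        items_.pop_front();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};

using Task = std::function<void()>; // hypothetical task type; an empty Task is a wake-up only

class WorkerPool {
public:
    explicit WorkerPool(unsigned numThreads) {
        assert(numThreads > 0);
        for (unsigned i = 0; i < numThreads; ++i)
            threads_.emplace_back([this] { run(); });
    }

    ~WorkerPool() { shutdown(); }

    void submit(Task task) { queue_.enqueue(std::move(task)); }

    // Sets the exit flag, wakes every worker with one dummy task each, then joins.
    // Returns the number of threads joined (0 if already shut down).
    std::size_t shutdown() {
        shouldExit_.store(true, std::memory_order_release);
        for (std::size_t i = 0; i < threads_.size(); ++i)
            queue_.enqueue(Task{});
        std::size_t joined = 0;
        for (auto& t : threads_) {
            t.join();
            ++joined;
        }
        threads_.clear();
        return joined;
    }

private:
    void run() {
        Task task;
        while (true) {
            queue_.wait_dequeue(task);
            // Check the flag right after waking, before touching the task.
            if (shouldExit_.load(std::memory_order_acquire))
                return;
            if (task)
                task();
        }
    }

    BlockingQueueStandIn<Task> queue_;
    std::atomic<bool> shouldExit_{false};
    std::vector<std::thread> threads_;
};
```

Note that in this sketch any real tasks still queued when the flag is set are dropped rather than processed, which seems acceptable for a shutdown path but is a design choice worth checking against your own code.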

Also: since you’re using C++11, you can use std::thread::hardware_concurrency(), which took things like hyperthreading into account the last time I checked.
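
One thing to watch for: the standard only treats the returned value as a hint, and it may be 0 if the count cannot be determined. A small sketch of guarding against that, where chooseWorkerCount, defaultWorkerCount and the fallback of 2 are made-up names and values for illustration:

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: use the hint if it is non-zero, otherwise the fallback.
inline unsigned chooseWorkerCount(unsigned hint, unsigned fallback = 2) {
    assert(fallback > 0);
    return hint != 0 ? hint : fallback;
}

// Hypothetical helper: worker count based on std::thread::hardware_concurrency(),
// which is allowed to return 0 when the value is not computable.
inline unsigned defaultWorkerCount() {
    return chooseWorkerCount(std::thread::hardware_concurrency());
}
```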

OK thanks, that seems reasonable. Any reason to use std::thread::hardware_concurrency over the JUCE SystemStats::getNumCpus()?