Load samples on background thread

Hi there,
I’m a bit confused about how to load samples on a background thread. Just so we’re on the same page: the first few samples (in my case 4096) are preloaded when the app starts.

But which technique is best for loading the rest? I tried my luck with a thread pool and it actually works pretty well. Even though the calls to addJob can block, I couldn’t notice any problematic performance impact.
But now I caught myself thinking that a lock-free queue would be a better solution. But how exactly would you implement the thread that works the queue? A while loop that sleeps 1 ms when the queue is empty? Only about 5 ms are cached, so 1 ms in Thread::wait could be a bit tough (especially since 1 ms can’t be guaranteed). Would Thread::yield be a better alternative when the queue is empty?

How should the background thread load the samples? I’ve read a lot about the memory-mapped reader, but I’m not quite sure how to use it correctly. Should the audio thread use the reader (via MMReader#read…) while the background thread just calls MMReader#touchSample? If yes, in what range should the background thread touch the samples (1024, 2048, or am I completely off)?
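In my head the pattern would look roughly like this (the window size and the 512-sample stride are pure guesses on my part, not recommendations):

// Guesswork sketch: the background thread pre-touches a window ahead of the
// playhead so the pages are resident by the time the audio thread reads them.
void preloadAhead (juce::MemoryMappedAudioFormatReader& reader,
                   juce::int64 playheadPosition)
{
    constexpr juce::int64 windowSize = 16384;   // how far ahead to touch (guess)
    constexpr juce::int64 stride     = 512;     // touch roughly once per page (guess)

    const auto end = juce::jmin (playheadPosition + windowSize,
                                 reader.lengthInSamples);

    for (auto pos = playheadPosition; pos < end; pos += stride)
        reader.touchSample (pos);   // faults the page in, copies nothing
}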
As I wasn’t sure about the memory mapping, I ended up using only the normal reader. The ThreadPoolJobs read the samples into the audio buffer, and there are no problematic performance issues on this side either.

A little side note: I noticed some “weird” behaviour. Even though I perform a thread-safe check that the background thread has loaded the sample positions I’m accessing on the audio thread, the background thread somehow gets in the audio thread’s way when writing to the audio buffer while the audio thread is reading from it. The background thread is definitely NOT writing the same sample positions being read by the audio thread, but somehow it manages to insert NaNs into the audio buffer. Am I missing something? Shouldn’t I be able to read from and write to a C array (aka an AudioBuffer) concurrently, as long as I make sure I don’t do it at the same position?

I’m not looking for the one correct answer that solves all my problems, just for some advice or experience with loading samples on a background thread. :slight_smile:

Cheers!

I’m just spitballing here, but as far as the queue goes, my first thought would be to make an encapsulated thread object that’s safe to wake/start from the realtime thread, possibly like this:

struct RealtimeStartableThread : public juce::Thread, private juce::AsyncUpdater
{
    RealtimeStartableThread() : juce::Thread ("sample loader")
    {
        // Defer startThread() to the message thread, since this constructor
        // may be running on the realtime thread.
        triggerAsyncUpdate();
    }

    // Note: juce::Thread::run() is still pure virtual here, so a subclass
    // has to implement the actual loading work.

private:
    void handleAsyncUpdate() final
    {
        startThread();
    }
};

the call to triggerAsyncUpdate might be blocking, but you said that you haven’t noticed a performance impact with your current usage of ThreadPool::addJob, so… ¯\_(ツ)_/¯

The idea here would be that you have some sort of struct containing a set of these objects. When you need to load a new sample, you create a new RealtimeStartableThread directly in the call from the realtime thread, and the new thread should, in theory, be started asynchronously by the message thread.

Hope this helps.

The thread pool has the big advantage of splitting the CPU power across all the samples waiting to be loaded. A 15-second sample could otherwise ruin the day for all the other samples started at the same time, because the 15-second sample takes “forever” to load. The thread pool jobs only load a small amount of the sample in runJob() and then return jobNeedsRunningAgain, so the 15-second sample is split up into small pieces.
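A minimal sketch of that chunked-job idea (the 8192-sample chunk size and the pre-allocated destination buffer are just illustrative assumptions):

// Sketch: each call to runJob() loads one chunk, then hands control back to
// the pool so jobs for other samples get their turn in between.
// Assumes `destination` was pre-sized to the reader's full length.
struct SampleLoadJob : public juce::ThreadPoolJob
{
    SampleLoadJob (juce::AudioFormatReader& r, juce::AudioBuffer<float>& dest)
        : juce::ThreadPoolJob ("sample load"), reader (r), destination (dest) {}

    JobStatus runJob() override
    {
        constexpr int chunkSize = 8192;   // samples per slice (illustrative)

        const auto remaining = reader.lengthInSamples - position;
        const auto numToRead = (int) juce::jmin ((juce::int64) chunkSize, remaining);

        reader.read (&destination, (int) position, numToRead, position, true, true);
        position += numToRead;

        return position < reader.lengthInSamples ? jobNeedsRunningAgain
                                                 : jobHasFinished;
    }

    juce::AudioFormatReader& reader;
    juce::AudioBuffer<float>& destination;
    juce::int64 position = 0;
};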

And concerning the blocking of triggerAsyncUpdate: the background thread(s) of the thread pool have nothing to do but load samples. They are not occupied at all, so (I think) the scoped lock almost never actually blocks (no contention whatsoever). The message thread, by contrast, is busy all the time, since all UI requests are piped through the message queue.

I implemented this with a std::condition_variable, which (in conjunction with a std::mutex) suspends a thread until it is signalled by another thread to resume.
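A minimal sketch of what that worker loop can look like (the std::function job type is a placeholder for whatever “load this sample” means in your project; note that addJob locks a mutex, so it’s meant for the message thread, not the audio thread):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class LoaderThread
{
public:
    LoaderThread() : worker ([this] { run(); }) {}

    ~LoaderThread()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            shouldExit = true;
        }
        condition.notify_one();   // wake the worker so it can see shouldExit
        worker.join();
    }

    void addJob (std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            jobs.push (std::move (job));
        }
        condition.notify_one();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock (mutex);

        for (;;)
        {
            // Suspends the thread until notified AND the predicate holds;
            // no busy-looping, no fixed sleep interval.
            condition.wait (lock, [this] { return shouldExit || ! jobs.empty(); });

            if (shouldExit)
                return;

            auto job = std::move (jobs.front());
            jobs.pop();

            lock.unlock();
            job();                // do the actual loading without holding the lock
            lock.lock();
        }
    }

    std::mutex mutex;
    std::condition_variable condition;
    std::queue<std::function<void()>> jobs;
    bool shouldExit = false;
    std::thread worker;           // declared last so everything above exists first
};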

Note: on platforms like Windows you can’t reliably sleep for only 1 ms (the minimum is more like 5 or 10 ms), so that’s not a good option.

Calling triggerAsyncUpdate() from your audio thread isn’t ‘real-time safe’. It allocates and could therefore block, causing your audio to glitch. However, you might often get away with it, especially if it happens only once at startup.

But a better design would probably be to start the loader thread from the main thread directly, e.g. in your plugin’s constructor. Then, as samples finish loading, push them from there onto a FIFO queue to be picked up by the audio thread.
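A minimal sketch of that hand-off, assuming the loader produces pointers to fully loaded buffers and a fixed-capacity juce::AbstractFifo manages the indices (ownership handling omitted for brevity):

#include <array>

struct LoadedSampleFifo
{
    // loader thread
    bool push (juce::AudioBuffer<float>* sample)
    {
        int start1, size1, start2, size2;
        fifo.prepareToWrite (1, start1, size1, start2, size2);

        if (size1 <= 0)
            return false;         // queue full

        slots[(size_t) start1] = sample;
        fifo.finishedWrite (1);
        return true;
    }

    // audio thread (lock-free, no allocation)
    juce::AudioBuffer<float>* pop()
    {
        int start1, size1, start2, size2;
        fifo.prepareToRead (1, start1, size1, start2, size2);

        if (size1 <= 0)
            return nullptr;       // nothing ready yet

        auto* sample = slots[(size_t) start1];
        fifo.finishedRead (1);
        return sample;
    }

    static constexpr int capacity = 64;
    juce::AbstractFifo fifo { capacity };
    std::array<juce::AudioBuffer<float>*, capacity> slots {};
};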

You asked about writing into an AudioBuffer while it’s being read by the audio thread - that doesn’t seem like a good idea at all to me. AudioBuffer isn’t designed for thread-safe use like that. If you absolutely need that kind of functionality, use a proper lock-free FIFO queue.

But if you can avoid that, I would - it’s a famously hard topic. Start by loading the whole sample in one go and sending it to the audio thread when done. Especially if you use a thread pool, loading samples from disk should be very fast. Make sure to load nice big chunks from disk, e.g. 8192 bytes at a time.

Using a mutex (and therefore a condition variable too) is not realtime safe. That’s why I’m trying to get rid of the thread pool. But a thread working a lock-free queue would have to busy-loop. So my question remains: how do you implement the thread(s) working the queue without wait and notify?

The threads don’t write/read the same sample, just the same “C float array” (the AudioBuffer is really just a wrapper), so there shouldn’t be any problems on that front. I actually found the bug I mentioned in my original post, and it had nothing to do with the concept itself of writing and reading the same array at different positions.

I just made sure of that. Quoting from “C++ Concurrency in Action, Second Edition”, chapter 5, on atomics:

“Variables of fundamental types such as int or char occupy exactly one memory location, whatever their size, even if they’re adjacent or part of an array.”

So if you make sure you don’t read AND write the same sample inside an AudioBuffer, you’re good to go.

Does anyone else maybe have experience they want to share? Both problems are still present in my project.

Feel free to crawl around in here

There’s a lock-free thread pool in there. However, in my streaming engine I’m using two buffers per voice and swapping them, so one is being read on the audio thread while the other is being written to.
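A minimal sketch of that per-voice double-buffer idea; the ready-flag handshake here is my own illustration of one way to decide when a swap is safe, not necessarily how the linked engine does it:

#include <atomic>

struct StreamingVoiceBuffers
{
    // background thread: fill the buffer the audio thread is NOT reading
    juce::AudioBuffer<float>& getWriteBuffer()
    {
        return buffers[1 - readIndex.load (std::memory_order_acquire)];
    }

    void markWriteBufferReady()
    {
        writeBufferReady.store (true, std::memory_order_release);
    }

    // audio thread: read from the active buffer...
    juce::AudioBuffer<float>& getReadBuffer()
    {
        return buffers[readIndex.load (std::memory_order_relaxed)];
    }

    // ...and when it is used up, swap only if the other one has been filled
    bool trySwap()
    {
        if (! writeBufferReady.exchange (false, std::memory_order_acquire))
            return false;

        readIndex.store (1 - readIndex.load (std::memory_order_relaxed),
                         std::memory_order_release);
        return true;
    }

    juce::AudioBuffer<float> buffers[2];
    std::atomic<int> readIndex { 0 };
    std::atomic<bool> writeBufferReady { false };
};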


I’d also suggest watching these two lectures from ADC 2017.

They’re a very neat 101 on some techniques and ‘gotchas’.


Thanks for pointing that resource out to me! I’ll definitely have a closer look at this project. And even though this thread “pool” (it’s really only one thread) does a much better job of avoiding locks than the JUCE ThreadPool does, it still makes a system call via Thread#notify that might block. So it’s not really lock-free (which leads me to the second post).

Yes, I watched this talk and it was quite interesting. Sadly, they didn’t cover the one thing that I didn’t “know” or have never really read about: how to code the worker threads. Every post seems to target the way work is handed off to the other threads, but that problem is really already solved by all the libraries and concurrency books. I’m interested in techniques for writing the four additional audio threads that help the realtime thread calculate the graph, or the four thread pool threads that don’t depend on a call to Thread#notify to meet the requirements.

One post by @dave96 mentions that briefly (I’ll go check out his linked resource now).

EDIT: sadly, the GitHub link provided is dead

Here’s the updated link: https://github.com/Tracktion/tracktion_engine/blob/master/modules/tracktion_graph/tracktion_graph/tracktion_graph_NodePlayerThreadPools.cpp


Thank you! The thread pools are finally a working (and good-looking) example. Now I’m curious how the thresholds for the pause cycles were selected. Was that just guesswork, or is there some underlying research?
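For reference, the pattern I mean is a progressive backoff along these lines (the thresholds here are invented for illustration, not the values that file actually uses):

#include <atomic>
#include <chrono>
#include <thread>

// A worker polling a lock-free queue: spin first for the lowest wake-up
// latency, then yield, then back off to sleeping.
template <typename TryPopFn>
void workLoop (TryPopFn tryPop, const std::atomic<bool>& shouldExit)
{
    int failedAttempts = 0;

    while (! shouldExit.load (std::memory_order_relaxed))
    {
        if (tryPop())               // got a job and ran it
        {
            failedAttempts = 0;
            continue;
        }

        ++failedAttempts;

        if (failedAttempts < 1000)
            continue;                                   // busy-spin
        else if (failedAttempts < 1100)
            std::this_thread::yield();                  // give up the time slice
        else
            std::this_thread::sleep_for (std::chrono::milliseconds (1));  // back off fully
    }
}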