I have a few parameters that, when changed, take quite a while to calculate. All 88 keys need to be recalculated, and if the user is twisting the control for one of these parameters, it gets quite laggy.
I’m considering only doing the calculations when the user releases the control. Or maybe some sort of background thread. Maybe break it up into 88 tasks instead of one longer task. Not really sure what’s best.
Depending on how time-consuming the calculations are, there are two approaches we found which both work fine in our products.
One is the “Parameter Change Throttle”. When a parameter change arrives, a flag is set to ignore subsequent changes and a timer is started. When the timer fires, it resets the flag, applies the last known parameter value, and stops itself. This way the timer interval enforces a minimum time between two applied parameter changes, and responsiveness improves significantly. The right interval takes some tweaking and may be application-dependent.
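A minimal sketch of that throttle, using a plain std::mutex so it is self-contained (all names here are hypothetical; in a plugin, onTimerTick() would be driven by something like a juce::Timer):

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Sketch of the "Parameter Change Throttle": setValue() may be called
// rapidly and is cheap; the expensive apply() only fires from the timer
// tick, so at most one recalculation happens per timer interval.
class ParameterThrottle
{
public:
    explicit ParameterThrottle (std::function<void (float)> applyFn)
        : apply (std::move (applyFn)) {}

    // Called on every dial movement; just stores the newest value.
    void setValue (float newValue)
    {
        std::lock_guard<std::mutex> lock (mutex);
        lastValue = newValue;
        dirty = true;
    }

    // Called periodically by a timer.
    void onTimerTick()
    {
        float valueToApply = 0.0f;
        {
            std::lock_guard<std::mutex> lock (mutex);
            if (! dirty)
                return;
            valueToApply = lastValue;
            dirty = false;
        }
        apply (valueToApply); // expensive recalculation, at most once per tick
    }

private:
    std::function<void (float)> apply;
    std::mutex mutex;
    float lastValue = 0.0f;
    bool dirty = false;
};
```

However fast the dial moves, only the last value reaching the tick is ever computed.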
The other one is the background thread approach. All parameter changes are enqueued in some suitable kind of queue – it is important to think about multithreading issues here, as parameter changes might come from different threads. One global thread instance with medium priority then polls all the queues on a regular basis. If a queue contains changes, the thread pops elements until the most up-to-date value has been identified; intermediate values are dropped. All computations are then done on the background thread. We do this for really heavy stuff like synthesis of long impulse responses based on user parameters, and take care that the audio result is continuously crossfaded between the previous and current parameter set. Again, finding the right polling interval and crossfade time is subject to tweaking.
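The queue-draining step might look like this (hypothetical names; a mutex is used for brevity, whereas a real plugin pushing from the audio thread would want a lock-free FIFO such as juce::AbstractFifo):

```cpp
#include <deque>
#include <mutex>
#include <optional>

// Sketch of the background thread's polling step: everything that has
// accumulated since the last poll is popped, and only the newest value
// is kept; intermediate values are dropped without ever being computed.
class LatestValueQueue
{
public:
    // Producer side: called whenever a parameter changes.
    void push (float v)
    {
        std::lock_guard<std::mutex> lock (mutex);
        queue.push_back (v);
    }

    // Consumer side: called by the background thread on each poll.
    std::optional<float> drainToLatest()
    {
        std::lock_guard<std::mutex> lock (mutex);
        if (queue.empty())
            return std::nullopt;
        float latest = queue.back(); // most up-to-date value
        queue.clear();               // drop intermediates
        return latest;
    }

private:
    std::mutex mutex;
    std::deque<float> queue;
};
```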
With both approaches, keep in mind that they will not work as expected in a non-realtime rendering context with automation. Therefore, both implementations need some kind of offline rendering mode which simply does everything synchronously, leading to perfect automation results. This might lead to different audio results if many more intermediate states can now be heard, so you have to find a good way to handle that.
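The mode switch can be sketched roughly like this (all names hypothetical; in JUCE the host's realtime state is usually queried via AudioProcessor::isNonRealtime()):

```cpp
#include <functional>
#include <utility>
#include <vector>

// Sketch of the realtime/offline switch: offline, every change is
// recalculated synchronously for sample-accurate automation; in realtime
// mode changes are queued for the throttle or background thread instead.
class UpdateScheduler
{
public:
    explicit UpdateScheduler (std::function<void (float)> recalcFn)
        : recalc (std::move (recalcFn)) {}

    void setNonRealtime (bool shouldBeOffline) { offline = shouldBeOffline; }

    void parameterChanged (float newValue)
    {
        if (offline)
            recalc (newValue);            // synchronous: perfect automation
        else
            pending.push_back (newValue); // deferred: drained elsewhere
    }

    std::vector<float> pending; // public for illustration only

private:
    std::function<void (float)> recalc;
    bool offline = false;
};
```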
Afaik, the standard library’s parallel algorithms spin up their own threads to do the parallel work and of course implement some locking to ensure that we wait until all threads have finished. The overhead of creating a thread is quite big, so the problem to solve must be a lot more computationally heavy to really get to the point where this creates a benefit. And then we have the problem that parameter changes might come from the audio thread, where you don’t want to do anything like creating threads or waiting on locks – priority inversion problems are not unlikely here.
Despite their elegance code-wise, I think there are few cases in audio plugins where I would consider these algorithms the right choice.
Afaik, the standard library’s parallel algorithms spin up their own threads to do the parallel work
MSVC uses thread pools for parallel algorithms [1], GCC uses TBB as backend which apparently also uses a thread-pool implementation [2], while LLVM does not (yet) have support, but will probably also use TBB as backend [3].
Unfortunately not. I’ve narrowed it down to the smallest chunk of code that needs to get done, it’s still expensive.
It sounds like I need to implement a few things:
Start a thread at startup with nothing to do until needed; this way I don’t have the overhead of creating the background thread every time one of these parameters changes.
Set a flag for my voices to not output until the worker is finished/table is ready.
Don’t run the thread for every parameter change; set up some kind of throttling like @PluginPenguin mentioned so I only recalculate as few times as needed.
Yes, I was planning to use a JUCE thread class of some type, but I see there are a few to choose from. Having never actually used threads before, any thoughts on the best one? I see ThreadPool, Thread, TimeSliceThread, etc… Would it be crazy to have TimeSliceThread create 88 of them?
juce::Thread is a generic one you override. Not the right fit here, as the underlying OS thread is only spawned once you start it.
juce::ThreadPool: some idle threads you can throw a ThreadPoolJob at; each job runs from start to finish
juce::TimeSliceThread: some idle threads that do continuous tasks, like checking if a buffer is filled in BufferingAudioSource etc. Basically tasks that run forever
I was thinking about using a simple Thread…creating it during start up, and calling startThread() when I need it to run, but I’m reading in some older forum threads that startThread() takes some time. So, maybe I want to start it upon creation and just have a flag that run() can see and tell it to dump out unless I need it.
By “some time”, they mean a few microseconds. Nothing you would notice. I would create a ThreadPool and then add your 88 jobs to it when the dial gets changed. I would also delay adding the pool jobs, so they only get created if the dial hasn’t been changed in the last 100-200ms.
Here’s a step-by-step (simplified):
Create a juce::ThreadPool object as a class member
Have your class that listens to the dial inherit from a juce::Timer
If the dial changes, start the timer with a 100-200ms timeout.
If the dial changes, re-start the timer, so it doesn’t process twice (step 3 should do that automatically)
If the timeout is reached, cancel all currently running jobs in the ThreadPool
Create your 88 jobs in a loop and add them to the ThreadPool
Yes, there will be a slight delay, but you won’t have to process everything all the time just because somebody is adjusting a value.
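As a rough sketch of the fan-out in the last step, here split across plain std::thread workers so the example is self-contained; a juce::ThreadPool with one ThreadPoolJob per key (or per chunk of keys) follows the same shape. The per-key formula below is only a placeholder for the expensive work:

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <thread>
#include <vector>

// Sketch: recalculate all 88 keys in parallel. Each worker handles an
// interleaved subset of keys, then everything joins before the result
// table is returned.
std::array<double, 88> recalculateAllKeys (double parameter)
{
    std::array<double, 88> table {};
    const unsigned numWorkers = std::max (1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;

    for (unsigned w = 0; w < numWorkers; ++w)
    {
        workers.emplace_back ([&table, parameter, w, numWorkers]
        {
            for (std::size_t key = w; key < table.size(); key += numWorkers)
            {
                // Placeholder for the expensive per-key calculation:
                // here a simple equal-temperament ratio around key 48.
                table[key] = std::pow (2.0, (double (key) - 48.0) / 12.0) * parameter;
            }
        });
    }

    for (auto& t : workers)
        t.join(); // wait until every key has been computed

    return table;
}
```

With the Timer throttle in front of it, this only runs once the dial has settled.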
Thank you, everyone. Based on all your comments, this is what I came up with. Maybe it will help someone else.
The Timer solution was an elegant way to throttle the stream of parameter changes. For now, I ended up using a single Thread instead of 88 jobs in a ThreadPool, which at this point seemed overly complicated for this purpose. The single Thread got the job done in a reasonable time. I might go back and try a ThreadPool at some point, but this works well for now.
class MyInstrument : public Thread,
                     public Timer,
                     public AudioProcessorValueTreeState::Listener
{
public:
    MyInstrument() : Thread ("Big Update") {}

    void parameterChanged (const String& parmID, float value) override
    {
        if (parmID == "ExpensiveParameter")
            startTimer (200); // or roughly how long a bigUpdate() takes
    }

    void timerCallback() override
    {
        DBG ("-------- time to run bigUpdate -------------");
        stopTimer();
        stopThread (20); // interrupt a still-running update before restarting
        startThread (5);
    }

    void run() override
    {
        {
            // Lock the audio callback while clearing the flag, so no
            // processBlock is in flight when the update begins.
            ScopedLock lock (processor.getCallbackLock());
            updateReady = false;
        }
        double timeSec;
        DBG ("******** RUNNING bigUpdate *****************");
        {
            ScopedTimeMeasurement m (timeSec);
            bigUpdate();
        }
        DBG ("******** FINISHED bigUpdate() in " << String (timeSec, 4) << " seconds *****************");
        updateReady = true;
    }

    void bigUpdate()
    {
        // calculation that takes a long time
    }

    void processBlock (AudioBuffer<float>& buffer, MidiBuffer& midiMessages)
    {
        if (updateReady.load() == false)
            return;
        // normal processing here
    }

    std::atomic<bool> updateReady { false };
};
Just a short tip: by using a ThreadPool you spread the work around all the cores. So it finishes much quicker than using a single thread.
My computer, for example, has ten physical cores, and with HyperThreading enabled (which is the default and sensible), up to 20 of your expensive calculations could run simultaneously, so it could plausibly finish around an order of magnitude faster than your current single-threaded approach.
Since you’re talking about 88 threads, I assume you have some virtual piano or organ with 88 keys?
In this example you also need to be sure to lock the callback when you set the updateReady flag to false, to ensure that no processBlock call is in progress while the update process begins.
… or, to smooth out the feel for the user: measure the average time it takes to complete a single render, set a timer/counter, and ignore any further parameter changes within that window. This way the parameter change will still feel consistent to the user …
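That adaptive window might be sketched like this (hypothetical names; the measured duration of the last render becomes the ignore window for subsequent parameter changes):

```cpp
#include <chrono>

// Sketch of an adaptive throttle window: after each render, the measured
// duration is fed back and used as the window during which further
// parameter changes are ignored.
class AdaptiveThrottle
{
public:
    using Clock = std::chrono::steady_clock;

    // True if enough time has passed since the last render started.
    bool shouldAccept (Clock::time_point now) const
    {
        return now - lastStart >= window;
    }

    void renderStarted (Clock::time_point now) { lastStart = now; }

    // Feed back the measured time of the last render (e.g. from a
    // ScopedTimeMeasurement) as the next ignore window.
    void renderFinished (Clock::duration measuredRenderTime)
    {
        window = measuredRenderTime;
    }

private:
    Clock::time_point lastStart {};
    Clock::duration window {};
};
```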