Some context:
I am working on a synth/sequencer plugin which primarily uses a CameraDevice to control its parameters, by letting the user define a region within the image to use for calculations. As a result, many of these calculations are expensive matrix operations, such as summing all values in a matrix and calculating its average, or summing the values of each row of the matrix to calculate the average value of each row. The results are used to extrapolate new data, such as writing data to a wavetable or generating MIDI events.
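To make the kind of calculation concrete, here is a minimal sketch of per-row averages and an overall average over a rectangular region. It assumes a row-major grayscale image stored in a std::vector<float>; the Region struct and both function names are hypothetical and not part of JUCE or the actual plugin.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the user-defined region (not a JUCE type).
struct Region { std::size_t x, y, width, height; };

// Average of each row of `region` inside a row-major grayscale image.
// The caller must make sure the region lies fully inside the image.
std::vector<float> rowAverages (const std::vector<float>& pixels,
                                std::size_t imageWidth,
                                Region region)
{
    std::vector<float> result;
    result.reserve (region.height);

    for (std::size_t row = 0; row < region.height; ++row)
    {
        const std::size_t start = (region.y + row) * imageWidth + region.x;
        float sum = 0.0f;

        for (std::size_t col = 0; col < region.width; ++col)
            sum += pixels[start + col];

        result.push_back (region.width > 0 ? sum / static_cast<float> (region.width) : 0.0f);
    }

    return result;
}

// Average of the whole region, derived from the row averages
// (valid because every row has the same width).
float regionAverage (const std::vector<float>& rows)
{
    if (rows.empty())
        return 0.0f;

    float sum = 0.0f;
    for (float r : rows)
        sum += r;

    return sum / static_cast<float> (rows.size());
}
```

The cost grows with the area of the region, which is why running this on every audio callback gets expensive quickly.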
Now obviously, performing these calculations once every processBlock()-call is extremely wasteful, since the CameraDevice typically has a frame rate that is significantly lower than audio rate (or even the buffer rate in most cases), but since I’m not sure what other way is the best, this is how I’ve implemented it so far. It seemingly works well in the Release build, but I suspect that it’s not a very scalable approach. Debug build suffers quite a bit from buffer underruns due to spending too much time on those expensive matrix calculations.
Ideally, the matrix calculations would only be performed once for each frame, or whenever the user-defined region has been moved or resized, to avoid wasting performance on redundant calculations.
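The "only recalculate when something changed" idea could be sketched roughly like this. It assumes there is some frame counter that increases with every received camera frame; CachedResult, RegionKey and frameCounter are hypothetical names, and the sketch is single-threaded, so it says nothing yet about which thread should do the work.

```cpp
#include <cstdint>

// Hypothetical description of the user-defined region.
struct RegionKey
{
    int x, y, width, height;

    bool operator== (const RegionKey& other) const
    {
        return x == other.x && y == other.y
            && width == other.width && height == other.height;
    }
};

// Caches one expensive result and recomputes it only when the frame
// or the region differs from the one used for the cached value.
class CachedResult
{
public:
    // Returns true if `compute` was actually run.
    template <typename Fn>
    bool update (std::uint64_t frameCounter, RegionKey region, Fn&& compute)
    {
        if (hasValue && frameCounter == lastFrame && region == lastRegion)
            return false; // nothing changed, keep the cached value

        value = compute();
        lastFrame = frameCounter;
        lastRegion = region;
        hasValue = true;
        return true;
    }

    float get() const { return value; }

private:
    float value = 0.0f;
    std::uint64_t lastFrame = 0;
    RegionKey lastRegion { 0, 0, 0, 0 };
    bool hasValue = false;
};
```

Calling update() repeatedly with the same frame counter and region runs the expensive function only once.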
What would be the best way to increase the performance and avoid underruns in this situation?
I imagine that the best approach would be some form of parallelism where the audio thread and the “video thread” run in parallel, with the video thread providing new data to the audio thread, though I’m not sure if this would only introduce even more issues.
Other threads that have tackled a similar issue of dealing with expensive parameter calculations have suggested throttling the parameter input, though this is a lot harder since I actually need to perform the matrix calculations in order to know which parameters have been changed in the first place, if that makes sense. Other suggestions I’ve seen include using Thread, ThreadPool, TimeSliceThread, AbstractFifo, SIMDRegister. I just find it hard to understand how to apply these solutions to my specific case, since there is seemingly a lot of nuance here.
I have never used CameraDevice before. AFAIK, Thread (or ThreadPool) might be the right choice. It may work as follows (very similar to FFT analyzer):
when the user connects/disconnects the camera, you start/stop the thread
notify the thread when a new camera frame arrives
use an atomic bool isCalculationReady to check whether the matrix calculation is completed (reset it to false when the result is consumed)
and the run function might look as follows:
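while (!threadShouldExit()) {
    // Only recalculate once the previous result has been consumed.
    // (Skipping instead of `continue` avoids busy-spinning while the flag is still true.)
    if (!isCalculationReady.load()) {
        // matrix calculation
        isCalculationReady.store(true);
    }
    // Sleep until notify() is called for the next camera frame.
    const auto flag = wait(-1);
    juce::ignoreUnused(flag);
}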
Getting those resources onto (and off of) the audio thread is going to be a challenge. Using atomic flags as zsliu98 suggested is possible, but in my experience it introduces lots of nuances and edge cases that you didn’t consider and that make the code not really thread safe.
If it were me, I would use Thread or ThreadPool to create a std::shared_ptr around your matrix resources. (If possible, make it std::shared_ptr<const Resources> for extra safety). Then, I would push that shared_ptr onto a lock free fifo, where it can be picked up on the audio thread.
This only half solves the problem though, as the audio thread will be deleting Resources when shared_ptr goes out of scope. To solve this, you would probably need to put the shared_ptr into a second lock free fifo, which can then be regularly checked and emptied by some other Thread. This ensures that Resource is never deallocated on the audio thread.
This might seem a bit convoluted, but it has a strong likelihood of working robustly where other methods will introduce many edge cases. Others on this forum may have better ideas though.
You’d also have to think carefully about the design of those two fifo buffers. For the first one (putting resources onto the audio thread), you’d probably want a way of accessing the last Resource if there is no new Resource on the buffer. For the second one (taking resources off of the audio thread), you’d need to make sure that it doesn’t overflow with frequent calls.
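A rough sketch of the two-FIFO idea, using only the standard library. SpscQueue is a hand-written single-producer/single-consumer ring buffer standing in for whatever lock-free FIFO you actually use (for example one built on juce::AbstractFifo); Resources, Exchange and acquireLatest are hypothetical names. The audio side keeps the last Resource if nothing new has arrived, and only takes a new one when there is room to hand the old one back, so it never drops the last reference itself.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Resources { std::vector<float> averagedValues; }; // hypothetical payload
using ResourcePtr = std::shared_ptr<const Resources>;

// Minimal single-producer/single-consumer ring buffer (holds Capacity - 1 items).
template <typename T, std::size_t Capacity>
class SpscQueue
{
public:
    bool canPush() const // producer thread only
    {
        const auto next = (writePos.load (std::memory_order_relaxed) + 1) % Capacity;
        return next != readPos.load (std::memory_order_acquire);
    }

    bool push (T&& item) // producer thread only; item is untouched on failure
    {
        const auto w = writePos.load (std::memory_order_relaxed);
        const auto next = (w + 1) % Capacity;
        if (next == readPos.load (std::memory_order_acquire))
            return false; // full
        slots[w] = std::move (item); // slot is empty here, so nothing is freed
        writePos.store (next, std::memory_order_release);
        return true;
    }

    bool pop (T& out) // consumer thread only; pass an empty `out`
    {
        const auto r = readPos.load (std::memory_order_relaxed);
        if (r == writePos.load (std::memory_order_acquire))
            return false; // empty
        out = std::move (slots[r]); // leaves the slot empty
        readPos.store ((r + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> slots {};
    std::atomic<std::size_t> writePos { 0 }, readPos { 0 };
};

struct Exchange
{
    SpscQueue<ResourcePtr, 8> toAudio;   // worker thread -> audio thread
    SpscQueue<ResourcePtr, 8> toGarbage; // audio thread -> cleanup thread
};

// Audio thread: take the newest Resources if any, hand the old ones back.
// `current` keeps its previous value when nothing new has arrived.
void acquireLatest (Exchange& ex, ResourcePtr& current)
{
    ResourcePtr incoming; // always empty at the top of each iteration
    while (ex.toGarbage.canPush() && ex.toAudio.pop (incoming))
    {
        std::swap (current, incoming); // incoming now holds the old pointer
        if (incoming != nullptr)
            ex.toGarbage.push (std::move (incoming)); // room was checked above
    }
}
```

The worker thread would build a Resources object, convert it to ResourcePtr and push it onto toAudio; some non-realtime thread would periodically pop from toGarbage and let the popped pointer go out of scope there.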
I would definitely not use std::shared_ptr for this! Any shared_ptr object may deallocate (if it’s the last one to be destroyed), so if we know that we never want the audio thread to deallocate, then just… don’t introduce the possibility by giving it a shared_ptr.
Interesting point. I was suggesting shared_ptr because you can guarantee that it won’t delete a resource on one thread while it’s being accessed by another, which is a real risk otherwise. If you put an assertion in the destructor to check if it’s being deleted on the audio thread, you can pretty quickly figure out if you’re deallocating in the wrong place. But I’m interested to hear other methods. I know that others here have a lot more experience than I do.
By definition, your audio is separate from the video frames; these are two completely unsynchronised processes. So, as already indicated, I would split your incoming audio data and push it into a FIFO to be processed in one or more other threads. The next thing is to get your data when needed. That should be done by your audio thread: it should check whether some data is ready and read it from another FIFO, managed this time by the matrix calculation thread. I wrote such software this way; it was an eye tracker working similarly to what you described.
If you need some sophisticated knowledge about parallel processing, I strongly recommend this book: Anthony Williams “C++ Concurrency In Action”.
But anyway, this kind of processing should be done as simply as possible. Overthinking can lead to solutions which look elegant but have hidden bugs that are really hard to find and debug.
Thanks a lot everyone! I ended up getting it to work (or at least seemingly so, heh) with your help. I can already say that with the implementation I have currently, I’ve already gotten rid of all the buffer underrun issues I had earlier.
I ended up relying mostly on zsliu98’s suggestion of using an atomic bool as a flag for whether the result has been consumed or not, since that approach seemed to make more sense than using an AbstractFifo.
To be more specific about how I tied it together:
When the program starts, the video thread object is initialized with a reference to a juce::Image object that always contains the last received video frame of the juce::CameraDevice.
Video thread is started when the user selects which juce::CameraDevice to open and receive frames from.
The imageReceived() callback of the juce::CameraDevice wakes the video thread every time it’s called.
In the video thread’s run() function, it first checks if the referenced juce::Image is valid by using the isValid() function. If it returns true then it checks if the isCalculationReady flag is false before performing calculations using the bitmap data.
The calculation routine stores all of its results into what are essentially “duplicate” versions of the data that the audio thread wants to access, and sets isCalculationReady to true when it is done. (For example: The audio thread wants to access a vector called “averagedValues” which contains some precalculated data. The video thread writes to a separate vector called “queuedAveragedValues”.)
Next time an audio buffer is received, the audio thread sees that the isCalculationReady flag is true, and consumes the results by either copying or moving from the “queued” variables to its own “live” variables. It then sets the flag to false, which lets the video thread know that all the results have been consumed and it is safe to write more data to the queue.
I actually also have a separate flag called isDataValid that the audio thread now sets to true. When that flag is set to false, the audio thread will refrain from doing anything with its own “live” variables yet, due to the risk of indexing out of bounds on uninitialized vectors or reading other garbage data. Once a single calculation routine has been done we can be sure that the data the audio thread is trying to read will stay valid for the rest of its lifetime.
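For anyone wanting to try the same thing, the handshake described above could be sketched roughly as follows, using only the standard library. ResultHandoff, tryPublish and consumeIfReady are hypothetical names, not my actual code. Two assumptions differ from the description: the audio side swaps the vectors instead of copying, so it neither allocates nor frees, and isDataValid is a plain bool because only the audio thread touches it in this sketch.

```cpp
#include <atomic>
#include <vector>

class ResultHandoff
{
public:
    // Video thread: publish a new result. Returns false (and writes nothing)
    // if the previous result has not been consumed yet.
    bool tryPublish (const std::vector<float>& newValues)
    {
        if (isCalculationReady.load (std::memory_order_acquire))
            return false;

        queuedAveragedValues = newValues; // may allocate, which is fine off the audio thread
        isCalculationReady.store (true, std::memory_order_release);
        return true;
    }

    // Audio thread: pick up a pending result, if there is one.
    // Returns true once the live data has been filled at least once.
    bool consumeIfReady()
    {
        if (isCalculationReady.load (std::memory_order_acquire))
        {
            averagedValues.swap (queuedAveragedValues); // no allocation or deallocation
            isDataValid = true;
            isCalculationReady.store (false, std::memory_order_release);
        }

        return isDataValid;
    }

    // Audio thread only: the "live" data.
    const std::vector<float>& live() const { return averagedValues; }

private:
    std::vector<float> queuedAveragedValues; // written by the video thread while the flag is false
    std::vector<float> averagedValues;       // touched by the audio thread only
    bool isDataValid = false;                // audio thread only
    std::atomic<bool> isCalculationReady { false };
};
```

The release store paired with the acquire load is what makes the contents of the queued vector visible to the other thread before it sees the flag change. The video thread would call tryPublish() after each calculation, and processBlock() would call consumeIfReady() at the top and skip anything that reads live() while it returns false.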
Hopefully this made sense, and perhaps someone else will find this useful. It turns out that this was a pretty easy thing to fix, but it was nice to have someone at least guide me in the right direction before I started diving head first into this.