I have been trying to improve performance by rendering my audio processors on multiple threads, but I’m running into some issues that I hope someone with more experience can help me figure out.
Here is a simplified explanation of the setup:
I have, let’s say, between 4 and 64 AudioProcessors that each need to have their ProcessBlock called.
When not using my “multi thread setup” I would just loop these in the AudioDevice callback and process each of them in turn.
Then I was thinking: what if I create threads (as many as there are CPU cores), divide the processors between them, and run them in parallel?
The way I set it up is: “jobs” are pre-sorted to avoid conflicts/waiting, each thread loops through its jobs, and then goes into wait(x).
The main callback loops over all the threads and polls whether they are done, and once all are done it moves on.
On the next callback it calls notify() on the threads to make them wake up again and re-run their jobs.
No critical sections are involved (from my side), just some atomics to get “job count” and “done count”.
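To make the pattern concrete, here is a minimal sketch of that wake / run / poll cycle. It is not my actual code: `Worker`, `Pool`, `addJob` and the `generation` counter are hypothetical names, and `std::condition_variable` stands in for wait()/notify(), so the mutex it needs is part of the stand-in rather than something the setup described above uses directly.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names throughout; std::condition_variable stands in for wait()/notify().
struct Worker {
    std::vector<std::function<void()>> jobs; // pre-assigned; not shared between workers
    std::mutex m;
    std::condition_variable cv;
    int generation = 0;  // bumped once per audio callback, guarded by m
    bool quit = false;   // guarded by m
    std::thread thread;

    void run(std::atomic<int>& doneCount) {
        int seen = 0;
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(m);
                // Sleep until the callback bumps the generation (or we are asked to quit).
                cv.wait(lock, [&] { return quit || generation != seen; });
                if (quit) return;
                seen = generation;
            }
            for (auto& job : jobs) job();                       // e.g. one ProcessBlock each
            doneCount.fetch_add(1, std::memory_order_release);  // publish "this worker is done"
        }
    }
};

class Pool {
public:
    explicit Pool(int numThreads) {
        for (int i = 0; i < numThreads; ++i)
            workers.push_back(std::unique_ptr<Worker>(new Worker));
    }

    ~Pool() {
        for (auto& w : workers) {
            { std::lock_guard<std::mutex> lock(w->m); w->quit = true; }
            w->cv.notify_one();
            if (w->thread.joinable()) w->thread.join();
        }
    }

    // Call before start(); the job lists are fixed afterwards.
    void addJob(std::size_t worker, std::function<void()> job) {
        assert(worker < workers.size());
        workers[worker]->jobs.push_back(std::move(job));
    }

    void start() {
        for (auto& w : workers) {
            Worker* raw = w.get();
            raw->thread = std::thread([this, raw] { raw->run(doneCount); });
        }
    }

    // Called from the audio callback: wake every worker, then poll until all are done.
    void processBlock() {
        doneCount.store(0, std::memory_order_relaxed);
        for (auto& w : workers) {
            { std::lock_guard<std::mutex> lock(w->m); ++w->generation; }
            w->cv.notify_one();
        }
        while (doneCount.load(std::memory_order_acquire) < static_cast<int>(workers.size()))
            std::this_thread::yield();
    }

private:
    std::vector<std::unique_ptr<Worker>> workers;
    std::atomic<int> doneCount{0};
};
```

In this sketch every callback pays for one wake-up per worker, and the callback cannot finish until the slowest worker has actually been scheduled, which is the part whose cost I am unsure about.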
It is working fine, but when pushing down the buffer size, dropouts occur much sooner than when just running them all in a single loop in the main callback. (There is a big improvement at higher buffer sizes, so there is still something to this.)
Are there any performance issues with wait() / notify()?
I see a lot of implementations don’t use notify() but rather just wait(1) or wait(3) and then keep the run loop going, but that won’t work, as 1 or 3 milliseconds is way too long to be sleeping if the buffer size is 16 / 32 / 64 samples etc.
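To put numbers on that, here is a small sketch of the arithmetic. The helper names `bufferMicros` and `budgetFraction` are made up for illustration, and the 44.1 kHz sample rate used in the figures below is an assumption, not something fixed by my setup.

```cpp
#include <cassert>

// Duration of one audio buffer in microseconds.
inline double bufferMicros(int numSamples, double sampleRate) {
    assert(numSamples > 0 && sampleRate > 0.0);
    return 1.0e6 * numSamples / sampleRate;
}

// Fraction of one buffer's time budget used up by a sleep of waitMillis.
// Anything at or above 1.0 means the sleep alone overruns the buffer.
inline double budgetFraction(double waitMillis, int numSamples, double sampleRate) {
    return (waitMillis * 1000.0) / bufferMicros(numSamples, sampleRate);
}
```

At an assumed 44.1 kHz, a buffer of 16 samples lasts about 363 microsecs, 32 samples about 726 microsecs, and 64 samples about 1451 microsecs. So wait(1) already overruns the whole budget at 16 and 32 samples, and uses roughly 69% of it at 64.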
Are there any other genius techniques I don’t know about?
I have used the PerformanceCounter at various places to see where the time is spent, and it will say something like:
Average = 90 microsecs, minimum = 31 microsecs, maximum = 4804 microsecs, total = 90
I guess it is the few times it gets to 4804 microsecs that I get dropouts.
I also tried running with only one thread in my “pool”, and that seems to work just as well as doing it in the main callback. More threads = more dropouts.
Threads are running with real-time priority of course.
Same issue on OSX and Windows.
Can’t get much info out of the profilers either, as these are just short spikes when the threads take too long, not really a CPU usage issue.
Any ideas are VERY welcome.