I have been working on a mixer for live use. It is open-ended in the number of channels and the number of mixes, and each channel strip is open-ended in the number of EQ filters that can be applied. It also has VST plugin capability. So I am trying to get it to work on more than one core.
I have an example setup for testing, with 32 channels, 6 filters, and 10 stereo mixes. My hardware is Sonic Core Scope running at a 44.1 kHz sample rate with 3 ms latency.
Running this configuration in a single thread (compiled in debug mode on Windows), I see a CPU load of about 50%. This is measured with the performance counters, comparing the time spent in the ASIO callback against the time between callbacks.
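To make the measurement concrete, here is a minimal C++ sketch of that load calculation. It uses `std::chrono::steady_clock` instead of the Windows performance counters, and `LoadMeter`, `loadFraction`, and `bufferPeriodSeconds` are hypothetical names made up for illustration, not code from the mixer.

```cpp
#include <cassert>
#include <chrono>

// Duration of one audio buffer in seconds (frames per buffer / sample rate).
inline double bufferPeriodSeconds(int frames, double sampleRate) {
    assert(frames > 0 && sampleRate > 0.0);
    return frames / sampleRate;
}

// Load = time spent processing / time between callbacks (0 if period unknown).
inline double loadFraction(double busySeconds, double periodSeconds) {
    return periodSeconds > 0.0 ? busySeconds / periodSeconds : 0.0;
}

// Hypothetical meter: call begin() at the top of the audio callback
// and end() just before it returns.
class LoadMeter {
public:
    using Clock = std::chrono::steady_clock;

    void begin() {
        const Clock::time_point now = Clock::now();
        if (started_)  // the period is the gap between two callback starts
            period_ = std::chrono::duration<double>(now - start_).count();
        start_ = now;
        started_ = true;
    }

    void end() {
        busy_ = std::chrono::duration<double>(Clock::now() - start_).count();
    }

    // Returns 0 until two callbacks have been seen.
    double load() const { return loadFraction(busy_, period_); }

private:
    Clock::time_point start_{};
    bool started_ = false;
    double busy_ = 0.0;
    double period_ = 0.0;
};
```

As a rough guide, and assuming a 128-frame buffer (the actual buffer size is not stated above), `bufferPeriodSeconds(128, 44100.0)` is about 2.9 ms, so a 50% load would mean roughly 1.45 ms of processing per callback.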
But I am testing on a 4-core CPU and it will be used on a 6-core system later, so I decided to split the work between threads. This has been an interesting exercise that I have not completely solved yet.
What I did was use cpu-1 threads in total instead of one (so with 4 cores I get 2 extra threads alongside the ASIO callback thread). The ASIO callback thread is the controlling thread. It sets a ThreadLock that the other 2 threads block on. When the ASIO thread is called back, it releases the lock and the 2 waiting threads wake up. I use atomic increments and decrements for the threads to claim tasks. If the ASIO callback thread finishes first, it must wait for the other threads; I am using a spin lock for this. When all tasks are complete, the 2 extra threads block on the mutex again and the ASIO thread returns. (There is some intermediate synchronization involved too.)
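For anyone trying to picture the scheme, here is a rough, portable C++ sketch of it. It is not my actual code: it swaps the Windows primitives for `std::mutex`, `std::condition_variable`, and `std::atomic`, and the class name `BlockWorkers` and its `run` method are hypothetical. The controlling thread wakes the workers, every thread claims task indices with an atomic increment, and the controlling thread spins until every worker has checked in.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class BlockWorkers {
public:
    explicit BlockWorkers(unsigned extraThreads) {
        for (unsigned i = 0; i < extraThreads; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    ~BlockWorkers() {
        { std::lock_guard<std::mutex> lk(m_); quit_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    // Called from the controlling (audio callback) thread. Runs task(i) for
    // i in [0, count) across all threads and returns once every worker is idle.
    void run(std::size_t count, const std::function<void(std::size_t)>& task) {
        assert(static_cast<bool>(task));
        {
            std::lock_guard<std::mutex> lk(m_);
            task_ = &task;
            count_ = count;
            next_.store(0);
            finished_.store(0);
            ++generation_;            // tells the workers a new block is ready
        }
        cv_.notify_all();
        drain();                      // the controlling thread works too
        // Spin until every worker has finished claiming and running tasks.
        while (finished_.load() != workers_.size())
            std::this_thread::yield();
    }

private:
    void drain() {
        for (;;) {
            const std::size_t i = next_.fetch_add(1);  // atomically claim a task
            if (i >= count_) return;
            (*task_)(i);
        }
    }

    void workerLoop() {
        unsigned long seen = 0;
        for (;;) {
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [&] { return quit_ || generation_ != seen; });
                if (quit_) return;
                seen = generation_;
            }
            drain();
            finished_.fetch_add(1);   // check in with the controlling thread
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    bool quit_ = false;
    unsigned long generation_ = 0;    // guarded by m_
    const std::function<void(std::size_t)>* task_ = nullptr;
    std::size_t count_ = 0;
    std::atomic<std::size_t> next_{0};
    std::atomic<std::size_t> finished_{0};
    std::vector<std::thread> workers_;
};
```

The controlling thread waits for every worker to check in, not just for the task count to reach zero, so that a worker that wakes late cannot claim an index from the next block with stale state. The cost is that every worker must be scheduled on every callback, which may be related to the dropouts, though that is only a guess.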
This actually works quite well. I see the load drop to about 20%, which is about what I expected. However, I get occasional dropouts, whereas single-threaded operation is rock solid.
I think my problem is that the threads may not be assigned to different CPUs, depending on current system demand. For multithreading to be really useful I think the threads must be on different cores, or all bets are off as to whether this will work well. Everything I read about thread affinity says "don't do it". Also, I am not sure how much latency is involved with thread locks. I assume they are built on standard Windows messages.
I did not use the JUCE thread classes (I was lazy and used the standard Windows stuff I was familiar with).
Here is the question. If you have many tasks and multiple threads, and the tasks must be done in groups sequentially (so all threads work on channel processing, then wait for each other, then all do mixes and wait, then do VSTs), how should this be done? Any suggestions?
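To show what I mean by groups done in sequence, here is one possible shape for it, sketched in portable C++ with a spinning barrier between phases. This is only an illustration under stated assumptions: `SpinBarrier`, `Phase`, and `runPhases` are hypothetical names, and the helper threads are created inside `runPhases` purely to keep the example self-contained, whereas a real mixer would keep them alive between callbacks.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Reusable barrier: each thread spins (yielding) until all n have arrived.
class SpinBarrier {
public:
    explicit SpinBarrier(unsigned n) : n_(n) {}

    void arriveAndWait() {
        const unsigned gen = generation_.load();
        if (arrived_.fetch_add(1) + 1 == n_) {
            arrived_.store(0);          // reset before releasing the others
            generation_.fetch_add(1);   // release everyone waiting on gen
        } else {
            while (generation_.load() == gen)
                std::this_thread::yield();
        }
    }

private:
    const unsigned n_;
    std::atomic<unsigned> arrived_{0};
    std::atomic<unsigned> generation_{0};
};

// One group of independent tasks, e.g. "all channels" or "all mixes".
struct Phase {
    std::size_t count;                      // number of tasks in the group
    std::function<void(std::size_t)> task;  // task(i) for i in [0, count)
};

// Runs the phases in order on numThreads threads (the caller is one of them).
// No thread starts phase p+1 until every task of phase p has finished.
void runPhases(const std::vector<Phase>& phases, unsigned numThreads) {
    assert(numThreads > 0);
    SpinBarrier barrier(numThreads);
    std::vector<std::atomic<std::size_t>> next(phases.size());
    for (auto& n : next) n.store(0);        // one task counter per phase

    auto body = [&] {
        for (std::size_t p = 0; p < phases.size(); ++p) {
            for (;;) {
                const std::size_t i = next[p].fetch_add(1);  // claim a task
                if (i >= phases[p].count) break;
                phases[p].task(i);
            }
            barrier.arriveAndWait();        // wait for the whole group
        }
    };

    std::vector<std::thread> helpers;       // demo only: spawned per call
    for (unsigned t = 1; t < numThreads; ++t) helpers.emplace_back(body);
    body();
    for (auto& h : helpers) h.join();
}
```

With this shape, the channel, mix, and VST stages would each be one `Phase`. C++20 also provides `std::barrier`, which blocks instead of spinning; whether spinning or blocking behaves better at 3 ms latency is exactly what I am unsure about.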
I know this is a long post, but I was not sure how much info I needed to give to get the question asked properly.