Multicore Support

Hi Jules,

your AudioProcessorGraph really is a great tool. With some 20-50 or more plugins being hosted, I wonder whether it would make sense to distribute the render work of a graph across multiple CPU cores.

As it is implemented now, the entire chain of render ops is executed on a single thread. For a big graph with many plugins, this could get tight, i.e. the whole chain might not complete within the render block time slice.

In theory, the graph could be split into parallel chains, each of which is rendered on a separate thread (thread pool) running on a different CPU core. The final result is then merged from the output buffers of each thread.

As straightforward as this sounds, it is likely a lot more complex. I would expect the hardest part to be the intelligent splitting of the graph, but that could be done manually (programmatically) to some extent. In a typical mixer metaphor, one could split the graph roughly based on channel strips, for example.
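To make the idea concrete, here is a minimal sketch in plain C++ (no JUCE; all names like `renderChainsInParallel` are invented for the example): each independent chain fills its own buffer on a worker thread, and the results are summed at the end, like a mixer's summing bus.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Buffer = std::vector<float>;
using RenderFn = std::function<void(Buffer&)>; // renders one block into a buffer

Buffer renderChainsInParallel(const std::vector<RenderFn>& chains,
                              std::size_t blockSize)
{
    std::vector<Buffer> results(chains.size(), Buffer(blockSize, 0.0f));
    std::vector<std::thread> workers;

    // Fan out: one worker thread per independent chain.
    for (std::size_t i = 0; i < chains.size(); ++i)
        workers.emplace_back([&, i] { chains[i](results[i]); });

    // Wait at the "output node" for all partial results to arrive.
    for (auto& w : workers)
        w.join();

    // Merge: sum the per-chain buffers into the final output block.
    Buffer out(blockSize, 0.0f);
    for (const auto& r : results)
        for (std::size_t s = 0; s < blockSize; ++s)
            out[s] += r[s];

    return out;
}
```

A real implementation would of course reuse a thread pool instead of spawning threads per block, but the fan-out/join/merge shape is the same.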

Questions:

  • Are sub-graphs the best way to implement this, or are there performance penalties that suggest it could be better to implement a specialized graph class?

  • How would I make sure that threads in a pool are equally distributed across CPU cores?

  • Am I completely idiotic, missing the most basic things and dreaming of impossible things?

Any thoughts are welcome!

Andre

It’s a complicated problem, and I’ve never had the chance to think it through very carefully.

The OS will probably do a pretty good job of that, but you could play around with affinity settings if necessary.

No, it’s definitely harder than it sounds, but not impossible!

The simplest solution would be to run every plugin’s processing routine in its own separate thread and connect the audio streams together via FIFO buffers.

Another option is to find an algorithm that creates a plan describing which plugins can run serially or in parallel with others, and then run that plan for every block.
(Somehow I like the first idea because it’s so simple, but maybe it has a little overhead.)

And of course, at junction points the FIFO has to wait until all samples for a given position have arrived before proceeding.
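A rough sketch of that FIFO hand-off in plain C++ (names like `BlockFifo` and `mixJunction` are invented; a production version would be lock-free, but a mutex plus condition variable shows the waiting behaviour at junction points):

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// One FIFO per connection between plugin threads.
struct BlockFifo
{
    void push(std::vector<float> block)
    {
        { std::lock_guard<std::mutex> lock(m); q.push_back(std::move(block)); }
        cv.notify_one();
    }

    std::vector<float> pop() // blocks until a whole block has arrived
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !q.empty(); });
        auto b = std::move(q.front());
        q.pop_front();
        return b;
    }

    std::mutex m;
    std::condition_variable cv;
    std::deque<std::vector<float>> q;
};

// A junction node mixing two upstream FIFOs: it has to wait until both
// inputs have delivered their samples for the current position.
std::vector<float> mixJunction(BlockFifo& a, BlockFifo& b)
{
    auto x = a.pop();
    auto y = b.pop();
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] += y[i];
    return x;
}
```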

Each AudioProcessor on its own thread is certainly simple but not optimal, because the communication and synchronisation overhead would be huge.

The graph is a pull model. It should not be too difficult to come up with an algorithm that divides a graph into subgraphs that may run in parallel. I have a mixer metaphor in mind, so my thinking might be too simple, but it is definitely a solvable problem. One might run into issues with a lot of side chains and such.

It boils down to “find the maximum number of distinct paths (partial, the longer the better) backwards from the output that do not meet”. Render everything else, starting from the graph input, on the main render thread first, then fan out to the other threads and wait at the output for all results to arrive - done.
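The "do not meet" test can be sketched as a backwards reachability check (plain C++, invented names, not a JUCE API): from each direct input of the output node, collect the whole upstream cone; two cones that share no node are distinct paths that could render on separate threads.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Reverse adjacency: node -> its inputs.
using RevGraph = std::map<std::string, std::vector<std::string>>;

// All nodes reachable by walking backwards from `node` (inclusive).
std::set<std::string> upstreamCone(const RevGraph& g, const std::string& node)
{
    std::set<std::string> seen;
    std::vector<std::string> stack { node };
    while (!stack.empty())
    {
        auto n = stack.back(); stack.pop_back();
        if (!seen.insert(n).second)
            continue; // already visited
        auto it = g.find(n);
        if (it != g.end())
            for (const auto& input : it->second)
                stack.push_back(input);
    }
    return seen;
}

// True if the two branches never meet (no shared upstream node),
// i.e. they may be rendered in parallel.
bool branchesAreDisjoint(const RevGraph& g,
                         const std::string& a, const std::string& b)
{
    auto ca = upstreamCone(g, a);
    for (const auto& n : upstreamCone(g, b))
        if (ca.count(n))
            return false;
    return true;
}
```

A shared side-chain source is exactly the case where two channel strips stop being disjoint, which matches the concern above.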

Of course, this may result in a network that costs more performance than a straight render sequence, so there must also be a validation function that estimates the total cost of each suggested solution. If there is any solution that is better than the straight render sequence, it is taken.

A pretty straightforward generate-and-test decision-making algorithm.
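The generate-and-test step might look something like this sketch (plain C++; the names and the fixed synchronisation-overhead constant are assumptions, and real per-chain costs would come from measured per-plugin CPU load): a candidate plan's parallel cost is its slowest chain plus a synchronisation penalty, and a plan is only taken if it beats the straight render sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// A "plan" is a list of per-chain cost estimates (e.g. milliseconds).
double serialCost(const std::vector<double>& plan)
{
    return std::accumulate(plan.begin(), plan.end(), 0.0);
}

double parallelCost(const std::vector<double>& plan, double syncOverhead)
{
    // The block is done when the slowest chain finishes.
    return *std::max_element(plan.begin(), plan.end()) + syncOverhead;
}

// Returns the index of the cheapest candidate plan, or -1 if the straight
// serial render sequence is still the best option.
int pickBestPlan(const std::vector<std::vector<double>>& candidates,
                 double straightCost, double syncOverhead)
{
    int best = -1;
    double bestCost = straightCost;
    for (int i = 0; i < (int) candidates.size(); ++i)
    {
        double c = parallelCost(candidates[i], syncOverhead);
        if (c < bestCost) { bestCost = c; best = i; }
    }
    return best;
}
```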

He he, sounds tempting, doesn’t it? :twisted:

Ah, btw: The audio buffers are so small (512 samples, typically), they can be passed between threads in one piece. I would not use sample queues.

Yes, or just switch the pointer, though that won’t work if you do something like sample-accurate automation.
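The pointer-switching hand-off could be as simple as this sketch (plain C++; `BlockMailbox` is an invented name): the producer publishes a filled block with an atomic pointer exchange, and the consumer takes the whole 512-sample block in one piece instead of queueing individual samples.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Single-slot mailbox for passing one audio block between two threads.
struct BlockMailbox
{
    std::atomic<std::vector<float>*> slot { nullptr };

    void publish(std::vector<float>* block)
    {
        slot.store(block, std::memory_order_release);
    }

    std::vector<float>* take() // spins until the producer has published
    {
        std::vector<float>* b = nullptr;
        while ((b = slot.exchange(nullptr, std::memory_order_acquire)) == nullptr)
            std::this_thread::yield();
        return b;
    }
};
```

In a real render callback you would block-wait on a semaphore rather than spin, but the point stands: only a pointer changes hands, never the samples themselves.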

Another thing: if we use one thread per plugin, serial plug-in chains would also benefit from multi-core processing :wink:

It might be instructive to consider how Reaper implements anticipatory FX processing. I haven’t been able to track down much info on it other than this quote from SoS magazine:

[quote=“ans”]Each AudioProcessor on its own thread is certainly simple but not optimal, because the communication and synchronisation overhead would be huge.
[/quote]

I can confirm that too many threads would be counterproductive. A pool approach is best, especially if you fine-tune the number of simultaneous threads to the hardware architecture: ideally one thread per core, keeping one core available for synchronisation work.
Thread affinity is most efficient if you have a NUMA (http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access) compliant machine. I think OSes like Solaris support that by default; Linux needs libnuma. I don’t know about the other systems, but seeing OSX’s Grand Central Dispatch, I’d say you’d better let the OS decide.
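The pool-sizing rule above can be sketched in a few lines of standard C++ (function names invented): one worker per core, minus cores reserved for the audio callback and synchronisation, with a conservative fallback because `std::thread::hardware_concurrency()` is allowed to return 0 on some platforms.

```cpp
#include <cassert>
#include <thread>

// Workers to use given a reported core count and how many cores we
// want to keep free (audio callback, message thread, ...).
unsigned workerCountFor(unsigned reportedCores, unsigned reservedCores)
{
    // Unknown topology, or fewer cores than we want to reserve:
    // fall back to a single worker rather than underflowing.
    if (reportedCores <= reservedCores)
        return 1;
    return reportedCores - reservedCores;
}

unsigned workerCount(unsigned reservedCores)
{
    return workerCountFor(std::thread::hardware_concurrency(), reservedCores);
}
```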

I like this idea.
What if the paths meet? Would you process everything on the main thread, then only split once you passed the meeting node?

What if graph topology changes during playback? I guess the heuristic computation shouldn’t be triggered in such cases?

And if different threads work on different buffers, we wouldn’t even need any locking, would we?

I like such crazy ideas! :twisted:

That would for sure be a must-have for everybody developing hosts (as opposed to developing plugins).
I suppose Jules doesn’t have time at all to design something like this, but there are quite a few host developers on this forum. What if we tried to collaborate, under the supervision of our fearless leader (just to make sure we aren’t changing code he is also changing), and came up with a working solution?

We could try to design, code and test a MultithreadedAudioProcessorGraph and submit it to Jules for inclusion in Juce, maybe also addressing other issues in the AudioProcessorGraph, for example (correct me if I’m wrong) the fact that it doesn’t take plugin latencies into account?

We would have an AudioProcessorGraph on steroids then, multi-core ready, and that’d be a F**** good audio engine!
What do you guys think?

I totally agree with you about the pool approach, but I’m not sure about the number of threads in the pool. There is the high-priority audio thread, but also the message thread, so that leaves us numberOfCores-2 threads for plugin processing. Starting from 4 cores, that would already be interesting!

[quote]We would have an AudioProcessorGraph on steroids then, multi-core ready, and that’d be a F**** good audio engine!
What do you guys think?[/quote]

I think it’s a great idea!

The main problem I see is that, if we want it to be integrated in Juce someday, we can’t use Boost or TBB or any 3rd party lib …
It should be doable in 100% juce though. Who’s in? (The vinn: hint :wink: )

Unfortunately my time is very limited, but I’d love to participate in the discussion for sure. Basically the problem is dividable into three areas:

  1. Dividing a graph into parallel subgraphs, both algorithmic and manual. IMO it is very important to offer a manual API, if only for testing.
  2. An efficient mechanism for passing buffers between threads (lock-free, allocation-free, whatnot…)
  3. Ensuring that all threads really stay separate and do not get in each other’s way by using shared data that spoils the parallelism.
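For point 1, the manual API could have roughly this shape (a speculative sketch in plain C++; none of these names exist in JUCE): the host explicitly assigns each node to a "lane", nodes sharing a lane render serially on one thread, and the partitioner just groups them, which is trivially testable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A manual assignment of one graph node to a rendering lane.
struct NodeAssignment
{
    std::string nodeName;
    int lane; // nodes sharing a lane render serially on one thread
};

// Groups node names by lane; assumes 0 <= lane < numLanes.
std::vector<std::vector<std::string>>
partitionByLane(const std::vector<NodeAssignment>& assignments, int numLanes)
{
    std::vector<std::vector<std::string>> lanes((std::size_t) numLanes);
    for (const auto& a : assignments)
        lanes[(std::size_t) a.lane].push_back(a.nodeName);
    return lanes;
}
```

An algorithmic partitioner could then produce the same `NodeAssignment` list automatically, so the manual and automatic paths share one render-side implementation.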

I would also not expect too much of a performance gain except for very large graphs and 4 or more CPUs, but I may be totally wrong. My 8-core Mac would certainly be very happy at least :wink:

In the short term, however, I’d be more interested in delay compensation. Timing issues are more obvious, because they already occur with a minimal number of plugins and can be quite confusing for the end user.

Maybe this thread should be merged with that one: http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=7344&p=41380&hilit=multicore#p41380 and (the end of) that one: http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=7020&hilit=PDC&start=75 ?

Apparently, plugin delay compensation is actually already implemented!

It seems that nobody really has time on their hands to deal with this at the moment.

I really suck at multithreading, but if someone can provide a simple design, I’ll be happy to write the code (the stupid part, which I’m good at :wink: )

+1. Sorry guys, but I’m a musician and I’m facing multithreading for the first time just now :roll:

Wonderful. I didn’t notice until yesterday. That’s a great achievement.

Have you guys seen that? http://www.boost.org/doc/libs/1_41_0/libs/graph_parallel/doc/html/index.html

It would probably be possible to build a MultithreadedAudioProcessorGraph on top of that. I suppose the dependency on Boost makes it unsuitable for inclusion in Juce, but we could add it to “Useful Tools and Components”. What do you think?

I doubt that a general graph library like this would be very helpful here. Porting a generic graph algorithm to Juce is not a problem either. The real problem is knowing exactly which portions of an audio graph make sense to be separated from each other. A collection of general graph tools cannot answer this question. Only we can, knowing all the details about the inner workings of an audio graph.

You’re absolutely right, BUT, I see it the other way round: with this library, the only remaining problem is knowing exactly which portions of an audio graph make sense to be separated from each other, which we can do, because we know all the details about the inner workings of an audio graph.

Starting from AudioProcessorGraph would make us deal with all the common multithreading issues ourselves, and we wouldn’t be sure it always works. In Boost, the work has been done and it’s probably very reliable :slight_smile: . Well, IMHO…