Audio Processor Graph Bottleneck

Hey! I use a forked version of the JUCE APG, but I like to give the current one a shot whenever I see an update land. The most recent refactor of the APG looks great imo, with a much clearer API and sync vs async management, but it still has some really high CPU usage for my use cases, which means I need to stay on my fork.

One way I use the APG is to create many inputs on nodes that handle modulation and various control signals, which I convert and run as audio so they can be dynamically routed around. This means some APG nodes end up with upwards of several thousand inputs, depending on the number of parameters and modulators the node exposes. The recent refactor doesn’t handle this use case well due to this code:

You can see that the majority of the time spent in the graph is actually figuring out which buffer to send into the node, rather than calling process on the node itself.

Not expecting a quick patch for this, just wanted to share. I haven’t looked too closely at the changes around this or at how audioChannelsToUse comes into the context, but I suspect there is a way to create this structure during the render sequence stage instead of in the realtime block, since I believe these buffer assignments shouldn’t change after the sequence is constructed.
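To illustrate what I mean (this is purely a hypothetical sketch, not the actual JUCE internals, and all of the names are made up): resolve the pooled-buffer indices once when the render sequence is built, and only index into the pool from the audio thread.

```cpp
// Hypothetical illustration of the idea: do the expensive resolution at
// render-sequence construction time, so the realtime callback only has to
// read a precomputed table.
#include <cstddef>
#include <vector>

struct NodeOp
{
    // Filled in once, when the render sequence is built.
    std::vector<size_t> inputBufferIndices;
};

// Build stage: expensive connection lookups happen here, once per rebuild.
NodeOp resolveInputs (const std::vector<size_t>& resolvedBufferIndices)
{
    return { resolvedBufferIndices };
}

// Realtime stage: no searching, just gather the preallocated buffers by index.
void processOp (const NodeOp& op, std::vector<std::vector<float>>& bufferPool)
{
    for (auto index : op.inputBufferIndices)
    {
        auto& input = bufferPool[index];
        // ... copy/mix `input` into the node's processing buffer,
        //     then call processBlock on the node ...
        (void) input;
    }
}
```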

J

Isn’t the belief that buffer sizes shouldn’t change a bit misguided, though? Your users’ performance is going to vary quite a bit between instances. The dynamic buffer assessment is there to make sure you don’t suffer dropouts during whatever else is happening in the environment you are executing in…

Hmm, not sure what you mean. There’s a preallocated pool of buffers to be used in the graph, and the final buffer to use from this pool is calculated in each process block for each process context, but I don’t believe those indices change after the render sequence is constructed. The block size is still dynamic.

Please could you try profiling this patch and check whether it improves things for you? It seems to be a very minor improvement for me, but maybe my test case scales differently than your program.

graph.patch (12.9 KB)

Thanks Reuk! Yep – I’ll give it a shot.

I think the easiest way to replicate our use case would be to add a node with a few thousand inputs to the unit tests and include some timing reports. I’ll report back here when I’m able to give this patch a shot.

Update:

Nice work! That totally solved it and brought the CPU way, way down:

Thanks so much for looking at that so fast! There’s another bottleneck in the graph creation; I’ll track it down and post it here as well.

So the other bottleneck appears to be here, where the CPU jumps to 100% and the UI hangs while the sequence is processed:

(it shows 50% here; the other 50% is spent building the double-precision copy of the graph)

I think the issue at this spot in the graph is a linear lookup to find the connections for a node; this has always been a problem in the graph. In our fork we added a connection cache to the node classes, so each node has a handle on all of its relevant connections, which removes this lookup during the builder stage.
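A minimal sketch of that kind of per-node cache (simplified, not our actual fork code, and the names are hypothetical):

```cpp
// Per-node connection cache: instead of scanning the graph's full connection
// list for every node during a rebuild, each node keeps its own incoming
// connections, grouped by destination channel, and updates the cache
// incrementally whenever a connection is added or removed.
#include <unordered_map>
#include <vector>

struct SourceRef { int nodeID; int channel; };

struct CachedNodeConnections
{
    std::unordered_map<int, std::vector<SourceRef>> inputsByChannel;

    void addInput (int destChannel, SourceRef source)
    {
        inputsByChannel[destChannel].push_back (source);
    }

    // O(1)-ish lookup during the builder stage instead of a linear search.
    const std::vector<SourceRef>* getInputs (int destChannel) const
    {
        auto it = inputsByChannel.find (destChannel);
        return it != inputsByChannel.end() ? &it->second : nullptr;
    }
};
```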

Some other possible improvements:

  • Don’t build the double-precision sequence if the user has overridden supportsDoublePrecisionProcessing to return false (a free 50% reduction).

  • When a connection is added or removed, add an option to skip rebuilding the graph, so that when lots of edits are being made the graph doesn’t constantly rebuild, and the user can manually trigger a “rebuild graph” once the edits are done.

^ I think doing these two things would significantly reduce the time to create the render sequence. We debated building on a background thread instead of the message thread to avoid hanging the UI, but after these changes the graph went from taking ~5 seconds to build to being instant.

Let me know if there’s anything we can do to help! I’d love to help make the graph more scalable and see if some of these patches could improve JUCE.

The channel-caching change has been added here:

I’ll try to take a look at your other ideas when I get a chance, as they sound sensible.

Lots of the graph modifier functions now take an UpdateKind parameter which may be either sync or async. The graph will be rebuilt immediately after a sync change, and will be deferred after an async change. It should be possible to avoid needless rebuilds by making all but the final change async.
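For example, assuming addConnection has an overload taking an UpdateKind as described above, a batch of edits might look like this, with only the final sync change triggering a rebuild:

```cpp
// Sketch of batching graph edits: defer the rebuild for every intermediate
// edit, and let the last (sync) edit rebuild the graph exactly once.
#include <juce_audio_processors/juce_audio_processors.h>

void addModulationConnections (juce::AudioProcessorGraph& graph,
                               juce::AudioProcessorGraph::NodeID source,
                               juce::AudioProcessorGraph::NodeID dest,
                               int numChannels)
{
    using Graph = juce::AudioProcessorGraph;
    jassert (numChannels > 0);

    // Intermediate edits: rebuild is deferred.
    for (int ch = 0; ch < numChannels - 1; ++ch)
        graph.addConnection ({ { source, ch }, { dest, ch } }, Graph::UpdateKind::async);

    // Final edit made synchronously, so the graph rebuilds once, right here.
    graph.addConnection ({ { source, numChannels - 1 }, { dest, numChannels - 1 } },
                         Graph::UpdateKind::sync);
}
```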

Awesome! I’m happy to keep testing with a larger project – will give this one a shot when I have a chance this week.

This is kind of true: it may or may not trigger extra graph generations, depending on when that async callback fires. In our app those rebuilds aren’t cheap, so we either need explicit control over when the graph is constructed, or the construction needs to be moved off the message thread.

What about UpdateKind::None – and another method triggerUpdate(UpdateKind)

It would default to async for most general use cases, but when needed the user would have the option of sync as well as none. That seems really nice: it would work for 95% of people, while giving the other 5% granular control.

There are some more improvements to the AudioProcessorGraph on develop.

We’ve added the ability to avoid rebuilding the graph when making modifications, and to trigger a rebuild manually:
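As a usage sketch (assuming the new enum value is called UpdateKind::none and the manual trigger is a rebuild() method; check the actual API on develop), making a batch of edits without any intermediate rebuilds might look like this:

```cpp
// Apply a set of connection edits with no rebuild per edit, then rebuild once.
#include <juce_audio_processors/juce_audio_processors.h>
#include <vector>

void applyConnections (juce::AudioProcessorGraph& graph,
                       const std::vector<juce::AudioProcessorGraph::Connection>& connections)
{
    using Graph = juce::AudioProcessorGraph;

    for (auto& connection : connections)
        graph.addConnection (connection, Graph::UpdateKind::none); // no rebuild here

    graph.rebuild(); // single rebuild once all edits are in place
}
```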

We only build a render sequence for the precision that’s actually needed. This uses the setProcessingPrecision API from AudioProcessor. If you want to process the graph in double precision, you must call setProcessingPrecision before prepareToPlay.
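For example, opting the graph into double precision before preparing it (setProcessingPrecision and prepareToPlay are the standard AudioProcessor calls; the helper function here is just for illustration):

```cpp
// Request double precision before preparing the graph, so that the
// double-precision render sequence is actually built; single-precision
// graphs skip it entirely.
#include <juce_audio_processors/juce_audio_processors.h>

void prepareGraph (juce::AudioProcessorGraph& graph, double sampleRate, int blockSize)
{
    // Only needed if you intend to call processBlock with an AudioBuffer<double>.
    graph.setProcessingPrecision (juce::AudioProcessor::doublePrecision);

    graph.prepareToPlay (sampleRate, blockSize);
}
```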

Finally, we’ve tried to speed up the search for empty channels:


Hey Reuk!

Pleased to report that I finally got a chance to try this, and on a quick review it has solved everything. It’s a night and day difference, from unusable to completely smooth.

I’m going to spend a bit more time diagnosing and refactoring our code to the new API. If all goes well I plan to throw my fork into the trash when this makes its way into master.

Thanks so much!!


That’s great, glad it’s working well for you!
