Optimisations for RenderingOpsSequenceCalculator

Oh, for release you may want to change line 380 to:

    MapNode* foundMapNode = findMapNode (node->nodeId, index);
    jassert (foundMapNode == nullptr);  // cannot have duplicate nodeIds.
    (void) foundMapNode;                // silences the unused-variable warning in release builds

Otherwise in Release the jassert line doesn’t exist and we get a warning that foundMapNode is an unused variable.

Cheers,

Rail

Thanks to Rail for finding a nasty bug with the ChannelBufferInfos being overwritten, which would only show itself in certain environments.

This latest version fixes that and includes a bit of clean-up. jrlanglois, the bug is very similar to what you were seeing, so this may fix that for you.

 

I’m going to give this a bump… since y’all are working on JUCE 4.0, I can vouch that the changes in this are working great here with a really, really complicated graph. You may want to consider adding these changes to JUCE 4.0.

Cheers,

Rail

I thought I'd mention that I rewrote AudioProcessorGraph a couple of weeks ago, mainly to speed up graph building but also to add a few other things I wanted.  Unfortunately this is for a commercial product so I can't make it available, but you reminded me that I should write down a few of the things I found before I forget them completely.

 

Graph build performance

With the contrived test case I was using (tens of thousands of nodes and connections), the graph build was well in excess of a minute.  I managed to get the same test case down to around 15ms, with a more realistic graph (for me anyway, a few hundred nodes/connections) taking well under 1ms.  I could have gone even further, but there was no point once I was sub-1ms.

I see that a few people had been there before and fixed up some stuff, but it really needed a few fundamental changes.  One bottleneck was the addition of ordered nodes at the start, which was O(n^2).  This can be made O(num_nodes + num_connections) with a topological sort - I used Tarjan's algorithm, but there are others.
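
To make the idea concrete, here's a minimal sketch of a linear-time ordering (this one uses Kahn's algorithm rather than Tarjan's, and Connection/nodeId are made-up stand-ins, not the real AudioProcessorGraph types):

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Made-up, minimal stand-in for a graph connection.
    struct Connection { uint32_t sourceNodeId, destNodeId; };

    // Kahn's algorithm: O(num_nodes + num_connections) topological ordering.
    // Returns node ids ordered so that every source comes before its destinations.
    std::vector<uint32_t> topologicalOrder (const std::vector<uint32_t>& nodeIds,
                                            const std::vector<Connection>& connections)
    {
        std::unordered_map<uint32_t, std::vector<uint32_t>> outgoing;
        std::unordered_map<uint32_t, int> incomingCount;

        for (auto id : nodeIds)
            incomingCount[id] = 0;

        for (const auto& c : connections)
        {
            outgoing[c.sourceNodeId].push_back (c.destNodeId);
            ++incomingCount[c.destNodeId];
        }

        std::vector<uint32_t> ready, ordered;

        for (auto id : nodeIds)
            if (incomingCount[id] == 0)
                ready.push_back (id);          // nodes with no inputs can be rendered first

        while (! ready.empty())
        {
            const auto id = ready.back();
            ready.pop_back();
            ordered.push_back (id);

            for (auto dest : outgoing[id])
                if (--incomingCount[dest] == 0)
                    ready.push_back (dest);    // all of dest's inputs are now ordered
        }

        return ordered;   // if this is shorter than nodeIds, the graph contains a cycle
    }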

The next main bottleneck was isBufferNeededLater() - probably worse than O(n^2).  I solved this by making the render op calculator work out how many times a buffer will be needed in future at the point it allocates it - essentially, setting up a refcount.  Every time the buffer is 'used' (i.e. fed into the input of a later node), the refcount is decremented.  Once it reaches zero, that buffer is put back on the free list again.
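
The refcounting approach might look roughly like this (a sketch only; BufferPool and its method names are invented, and the real code would hand out actual buffers rather than bare indices):

    #include <cstddef>
    #include <vector>

    // Illustrative buffer pool: each allocated buffer carries a use count equal
    // to the number of later render ops that will read it, computed while the
    // render sequence is being built, so no scan of the remaining ops is needed.
    class BufferPool
    {
    public:
        int allocate (int numFutureUses)
        {
            int index;

            if (! freeList.empty())                     // reuse a released buffer
            {
                index = freeList.back();
                freeList.pop_back();
            }
            else
            {
                index = (int) useCounts.size();         // grow the pool
                useCounts.push_back (0);
            }

            useCounts[(size_t) index] = numFutureUses;
            return index;
        }

        // Called each time a later node consumes the buffer as one of its inputs.
        void release (int index)
        {
            if (--useCounts[(size_t) index] == 0)
                freeList.push_back (index);             // nobody needs it any more
        }

    private:
        std::vector<int> useCounts;
        std::vector<int> freeList;
    };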

I also found it very useful to keep the nodes/connections in a proper graph (ie. the nodes keep track of their own incoming and outgoing connections), since the current implementation just keeps them in 'global' linear lists.  I originally just used binary chop/hashmaps on these global lists, but found it very useful to be able to just get the connections straight from the node (still kept the 'global' lists too though).
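
Something along these lines, with made-up names (GraphNode/GraphConnection are not the real classes), keeping the 'global' connection list but also giving each node its own edge indices:

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct GraphConnection
    {
        uint32_t sourceNodeId, destNodeId;
        int sourceChannel, destChannel;
    };

    struct GraphNode
    {
        std::vector<int> incoming;   // indices into Graph::connections feeding this node
        std::vector<int> outgoing;   // indices of connections this node feeds
    };

    struct Graph
    {
        std::vector<GraphConnection> connections;           // the 'global' list is still there...
        std::unordered_map<uint32_t, GraphNode> nodesById;  // ...but each node also knows its own edges

        void addConnection (const GraphConnection& c)
        {
            const int index = (int) connections.size();
            connections.push_back (c);
            nodesById[c.sourceNodeId].outgoing.push_back (index);
            nodesById[c.destNodeId].incoming.push_back (index);
        }
    };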

 

Run time performance

The original ordered node sort seemed to produce more of a breadth-first ordering.  By this, I mean that if you have a node summing point with multiple inputs (e.g. 32 mixing desk strips being mixed to 1 final output), it would tend to generate each input in turn and only sum them at the end, requiring 32 intermediate buffers.  I changed this so it would process 1, mix 1, process 1, mix 1, etc., so that it doesn't require nearly as many buffers.  I suspect buffer cache hits might be better too, but that's half speculation.
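
In code, the difference is roughly this (an illustrative sketch only; renderInput stands in for whatever actually produces each strip's block): rendering every strip first needs one buffer per strip, whereas rendering and accumulating one at a time needs only the mix bus plus one reusable scratch buffer.

    #include <functional>
    #include <vector>

    // 'renderInput' is a stand-in for whatever renders one input strip's block.
    void mixInputsInterleaved (int numInputs, int blockSize,
                               const std::function<void (int, std::vector<float>&)>& renderInput,
                               std::vector<float>& mixBus)
    {
        std::vector<float> scratch ((size_t) blockSize);
        mixBus.assign ((size_t) blockSize, 0.0f);

        for (int i = 0; i < numInputs; ++i)
        {
            renderInput (i, scratch);                          // process 1...

            for (int s = 0; s < blockSize; ++s)
                mixBus[(size_t) s] += scratch[(size_t) s];     // ...mix 1, then reuse the scratch buffer
        }
    }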

A second problem was in a similar area, but to do with latency compensation.  A common scenario I had was to have 32 channels to sum, only one or two of which had effects with non-zero latency.  The original implementation would add 30 delays to the 30 or so other channels which didn't have any latency.  I changed this so that it would mix these 30 together, then apply one delay to that submix.  I made this more general so that it would cascade summing/delay points, e.g. 5 inputs with latencies of 0ms, 0ms, 5ms, 5ms and 25ms now does this (see the sketch after this list):

1. Sum the 0ms inputs

2. Apply a 5ms delay

3. Sum with the 5ms inputs

4. Apply a 20ms delay

5. Sum with the 25ms input

6. Feed to the next node's input
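
Here's a rough sketch of that cascading logic (invented types, with a string 'op' standing in for the real render ops): inputs are sorted by latency, equal-latency groups are summed, and the running submix is delayed only by the difference each time the latency steps up.  For the 0/0/5/5/25ms example this reproduces the sum, 5ms delay, sum, 20ms delay, sum sequence above.

    #include <algorithm>
    #include <string>
    #include <vector>

    struct LatencyInput { int latencySamples; int bufferIndex; };

    // Very rough stand-in for a render op: just a description string here.
    using RenderOp = std::string;

    // Sum inputs in ascending latency order, delaying the running submix by the
    // latency difference each time it steps up, rather than delaying every input.
    std::vector<RenderOp> buildCascadedSum (std::vector<LatencyInput> inputs, int mixBuffer)
    {
        std::vector<RenderOp> ops;

        std::sort (inputs.begin(), inputs.end(),
                   [] (const LatencyInput& a, const LatencyInput& b)
                   { return a.latencySamples < b.latencySamples; });

        int currentLatency = inputs.empty() ? 0 : inputs.front().latencySamples;

        for (const auto& in : inputs)
        {
            if (in.latencySamples > currentLatency)
            {
                // One delay applied to the whole submix so far, not one per input.
                ops.push_back ("delay buffer " + std::to_string (mixBuffer) + " by "
                                + std::to_string (in.latencySamples - currentLatency) + " samples");
                currentLatency = in.latencySamples;
            }

            ops.push_back ("add buffer " + std::to_string (in.bufferIndex)
                            + " into buffer " + std::to_string (mixBuffer));
        }

        return ops;
    }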

 

Improvements

Multiple MIDI connections.  My version of the AudioProcessorGraph handles multiple MIDI buffers, which I find useful from time to time.  This actually made the render op calculation code simpler, because they could be treated almost identically to the audio buffers.
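
One way to picture that (purely illustrative, not the poster's design): if every buffer reference carries a type tag, the render op calculator can allocate, refcount and free MIDI buffers through the same code path it uses for audio buffers.

    #include <vector>

    enum class BufferType { audio, midi };

    struct BufferRef
    {
        BufferType type;
        int index;          // index into the audio pool or the MIDI pool
    };

    struct NodeIO
    {
        std::vector<BufferRef> inputs;   // may mix audio buffers and any number of MIDI buffers
        std::vector<BufferRef> outputs;
    };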

MIDI connection break.  I made my MIDI connections keep track of the MIDI keyboard state.  When a connection is removed, it 'injects' the necessary note off messages to the downstream nodes to automatically prevent hanging notes.
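
A minimal sketch of the idea, assuming a hypothetical per-connection state class (the juce::MidiMessage/MidiBuffer calls are real JUCE API; everything else here is made up):

    #include <set>
    #include <utility>
    #include <juce_audio_basics/juce_audio_basics.h>

    // Track which notes have passed through a MIDI connection, and when the
    // connection is removed, inject matching note-offs downstream so that no
    // notes are left hanging.
    class MidiConnectionState
    {
    public:
        // Call for every message flowing through the connection.
        void processMessage (const juce::MidiMessage& m)
        {
            const auto key = std::make_pair (m.getChannel(), m.getNoteNumber());

            if (m.isNoteOn())        heldNotes.insert (key);
            else if (m.isNoteOff())  heldNotes.erase (key);
        }

        // Call when the connection is being removed; appends a note-off for every
        // note still sounding into the buffer fed to the downstream node.
        void injectNoteOffs (juce::MidiBuffer& destination, int samplePosition = 0)
        {
            for (const auto& [channel, note] : heldNotes)
                destination.addEvent (juce::MidiMessage::noteOff (channel, note), samplePosition);

            heldNotes.clear();
        }

    private:
        std::set<std::pair<int, int>> heldNotes;   // (channel, note) pairs currently held
    };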

Latency delay re-allocation.  I didn't like the way that every time the graph was rebuilt, it re-allocated and cleared out all the latency delay compensation buffers, resulting in audio nastiness for even trivial changes.  I modified the delay buffer allocation so that each buffer has a unique fingerprint based upon the node, input channel index etc.  When it rebuilds the graph, it tries to look up a delay buffer with the same fingerprint and continues using it if it can.
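
Roughly like this (a sketch with invented DelayFingerprint/DelayBufferCache names; the real fingerprint presumably includes more fields than just the node id and channel, and a real version would also prune entries that no longer exist):

    #include <cstdint>
    #include <map>
    #include <tuple>
    #include <vector>

    struct DelayFingerprint
    {
        uint32_t destNodeId;
        int inputChannel;

        bool operator< (const DelayFingerprint& other) const
        {
            return std::tie (destNodeId, inputChannel)
                 < std::tie (other.destNodeId, other.inputChannel);
        }
    };

    struct DelayLine { std::vector<float> samples; int writePos = 0; };

    class DelayBufferCache
    {
    public:
        // Reuse an existing delay line when the fingerprint matches and the length
        // is unchanged (keeping its stored samples); otherwise create a cleared one.
        DelayLine& getOrCreate (const DelayFingerprint& key, int lengthSamples)
        {
            auto& line = delays[key];

            if ((int) line.samples.size() != lengthSamples)
                line.samples.assign ((size_t) lengthSamples, 0.0f);

            return line;
        }

    private:
        std::map<DelayFingerprint, DelayLine> delays;
    };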

 

There was loads more, but this is what I remember at the moment.  I'd recommend that the library version gets a full rewrite at some point - I was excited to find it initially, but that kind of turned to disappointment when I realised how slowly the graph rebuild was working on a phone, and how much logic I'd have to untangle to understand/rewrite it!

Thanks guys - we'll certainly look at this when we get a chance, quite possibly before V4, but no promises. Anything that you want to share to stop us re-inventing the wheel would be appreciated!

Sweet.  I forgot to mention feedback loops, which I've only had a chance to half think about so far, and which I don't know if you'd be interested in supporting.  I wanted to allow for them since, although my current application is for DAW routing (where you could argue you never want them), I can see they might be useful in a future FX application.

It seems it's quite easy to detect which connections are involved in feedback loops (in graph terms, you can find them from the strongly connected components).  It's also fairly easy to solve them (by breaking the connection and inserting a one-block delay).  The tricky bit is how to efficiently work out the minimum set of connections which need to be broken (and have delays added) to make the graph acyclic, as this is NP-complete.

In the scenario where you're adding a connection, you know that if the graph didn't have feedback before but it does after adding the connection, then it's the connection you just added which is causing the problem.  However, when you remove an arbitrary connection (which may or may not have been a 'feedback connection'), I can't currently see any alternative other than to do this (a rough sketch follows the list):

1. Find all the feedback loops in the graph, assuming no cyclic connections have been 'broken'

2. If there are no feedback loops, we're done

3. If not, choose one candidate connection for removal using some heuristic and 'break' it

4. Goto 1
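
A rough sketch of that loop, using made-up Edge/node-id types: findCycleConnection() does a DFS and returns one connection that closes a cycle (ignoring connections already broken), and the outer loop breaks one at a time until the graph is acyclic.  The 'heuristic' here is simply 'take the first back edge found'; steps 1 and 3 above imply finding whole strongly connected components and choosing more cleverly.

    #include <cstdint>
    #include <functional>
    #include <optional>
    #include <set>
    #include <unordered_map>
    #include <vector>

    struct Edge { uint32_t source, dest; };

    // DFS for a connection that closes a cycle (a 'back edge'), skipping broken ones.
    static std::optional<size_t> findCycleConnection (const std::vector<uint32_t>& nodes,
                                                      const std::vector<Edge>& edges,
                                                      const std::set<size_t>& broken)
    {
        std::unordered_map<uint32_t, std::vector<size_t>> outgoing;

        for (size_t i = 0; i < edges.size(); ++i)
            if (broken.count (i) == 0)
                outgoing[edges[i].source].push_back (i);

        enum class State { unvisited, inProgress, done };
        std::unordered_map<uint32_t, State> state;
        std::optional<size_t> found;

        std::function<void (uint32_t)> visit = [&] (uint32_t n)
        {
            state[n] = State::inProgress;

            for (size_t e : outgoing[n])
            {
                if (found) return;
                const auto destState = state.count (edges[e].dest) ? state[edges[e].dest] : State::unvisited;
                if (destState == State::inProgress)   { found = e; return; }   // edge into an in-progress node closes a cycle
                if (destState == State::unvisited)    visit (edges[e].dest);
            }

            state[n] = State::done;
        };

        for (auto n : nodes)
            if (! found && (state.count (n) == 0 || state[n] == State::unvisited))
                visit (n);

        return found;
    }

    // Keep breaking one cycle-closing connection at a time until the graph is
    // acyclic; each broken connection would then get a one-block delay inserted.
    std::set<size_t> chooseConnectionsToBreak (const std::vector<uint32_t>& nodes,
                                               const std::vector<Edge>& edges)
    {
        std::set<size_t> broken;

        while (auto e = findCycleConnection (nodes, edges, broken))
            broken.insert (*e);

        return broken;
    }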

 

Anyway I'll leave all that in your capable hands - I'm sure you'll do a good job as usual :)