Chaining AudioProcessorGraphs

It must be due to differences in our graph designs then.  All your time is in isAnInputToRecursive (as you mentioned before) whereas all my time was in getConnectionBetween.  Hard to say then.  It will be interesting to see if you can get some benefits from the std::vector.  That said, if all your time is in isAnInputToRecursive then it's looping over Entries in the lookup table, not Connections, so you may want to try it on those first.  (Perhaps that's what you meant.)

My test app simply adds new instances of the same test processor class to the graph sequentially….

void MainComponent::addGraphNode()
{
    int iIndex = m_pTestProcManager->addProcessor();
    
    CTestProcessorItem* pCurrentItem = m_pTestProcManager->getProcessorAtIndex (iIndex);
    
    if (m_pTestProcManager->getNumberOfProcessors() > 1 && iIndex > 0)
    {
        CTestProcessorItem* pPreviousItem = m_pTestProcManager->getProcessorAtIndex (iIndex - 1);

        addStereoConnection (m_pGraph, pPreviousItem->getNode(), pCurrentItem->getNode());
    }
}

m_pTestProcManager->addProcessor();

takes care of creating a new test processor object and its node.

The test above ran this method 1,000 times.

Cheers,

Rail

Oh well, that's a huge number of inputs, which explains why that recursive input method is going so crazy.  I was testing 100 audio tracks with 2 aux tracks and a master, but each chain of nodes in my graph might have about 10 or so nodes in it.

Right, the test is currently adding 1,000 connections… but the issue is that the response is really slow when the number approaches 250.

Each of my Mixer’s tracks has the potential of having around 200 connections with about 30 nodes. The actual number of connections and nodes changes dynamically. Once the connections are made the graph works perfectly – the issue is the connection time.

Cheers,

Rail

Making ConnectionLookupTable::entries a vector didn’t help much… but I have narrowed the bottleneck down to ConnectionLookupTable::findEntry()… which is using 80+ percent of the CPU time.

Rail

My understanding of the vector stuff is that you'd need to store the entries directly in the vector, which I think would mean the Entries would have to be copyable, which they currently aren't.  Mind you, although I understand the concepts well enough I get a bit confused with the non-copyable stuff sometimes.  But if the Entries are stored as pointers then I don't believe it will provide the gains.  I'll have to read up some more on it.  I think you could probably achieve similar results using the Juce Array class and preallocating the storage.

That aside, findEntry should be quite efficient already, and the fact that it's taking so much time could be more down to it being called so many times.  I'm leaning towards thinking there may need to be a better means of keeping track of which nodes are inputs to other nodes, as that's all that whole thing is trying to do.  Perhaps in the graph's addConnection there could be a means to associate connections with nodes so that a node could quickly go up or down its connection paths.  Or perhaps nodes could be ordered as connections are added instead of waiting until buildRenderingSequence is called.  I wonder if there would be some efficiencies with that, given the likelihood that a person would be connecting their nodes in some logical order.  Hmm...
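One way to picture the idea of associating connections with nodes is sketched here. It is only an illustration: NodeLinks, NodeId and the method names are made up for this example and are not part of JUCE. Each node keeps lists of its direct sources and destinations, filled in as connections are added, and the upstream check uses a visited set so that no node is expanded twice.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using NodeId = std::uint32_t;

// Hypothetical helper (not a JUCE class): per-node source/destination lists.
class NodeLinks
{
public:
    void addConnection (NodeId source, NodeId dest)
    {
        assert (source != dest);   // assumption: a node never feeds itself directly
        sources[dest].push_back (source);
        destinations[source].push_back (dest);
    }

    // Direct destinations of a node (empty if it feeds nothing).
    std::vector<NodeId> directDestinationsOf (NodeId node) const
    {
        auto it = destinations.find (node);
        if (it == destinations.end())
            return {};
        return it->second;
    }

    // True if 'possibleInput' feeds 'node' directly or through other nodes.
    bool isAnInputTo (NodeId possibleInput, NodeId node) const
    {
        std::unordered_set<NodeId> visited;
        std::vector<NodeId> pending { node };

        while (! pending.empty())
        {
            const NodeId current = pending.back();
            pending.pop_back();

            auto it = sources.find (current);
            if (it == sources.end())
                continue;   // no inputs: nothing further upstream

            for (NodeId src : it->second)
            {
                if (src == possibleInput)
                    return true;

                if (visited.insert (src).second)   // expand each node only once
                    pending.push_back (src);
            }
        }

        return false;
    }

private:
    std::unordered_map<NodeId, std::vector<NodeId>> sources, destinations;
};
```

With per-node lists, a single upstream query visits each node at most once. Whether that is enough on its own depends on how many queries the ordering step issues.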

Unfortunately I won't have much time to look at it today but I'll be sure to post if I come up with something.

Hi Graeme,

Yes, I had to make Entry copyable and not save as pointers in the std::vector… The code is working, I just wasn’t seeing any gain. The binary search should be fast, but each recursion is still spending over 80 percent of the time in there…

I also tried a linear search BTW
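For a sense of why the kind of search may matter less than the number of searches, here is a rough sketch. Entry, LookupTable and countOrderingLookups are hypothetical stand-ins, not the JUCE code; the sketch only mimics the general shape discussed in this thread: a sorted table searched with a binary search, a recursive input check, and an insertion-style ordering pass, run over a simple chain of nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a lookup-table entry: one destination node and
// the ids of the nodes that feed it directly.
struct Entry
{
    std::uint32_t destNodeId = 0;
    std::vector<std::uint32_t> srcNodes;
};

struct LookupTable
{
    std::vector<Entry> entries;    // must be kept sorted by destNodeId
    mutable long findCalls = 0;    // instrumentation only

    const Entry* findEntry (std::uint32_t destNodeId) const
    {
        ++findCalls;
        auto it = std::lower_bound (entries.begin(), entries.end(), destNodeId,
                                    [] (const Entry& e, std::uint32_t id) { return e.destNodeId < id; });
        return (it != entries.end() && it->destNodeId == destNodeId) ? &*it : nullptr;
    }

    // Naive recursive walk up the sources of 'dest', with a depth limit.
    bool isAnInputTo (std::uint32_t src, std::uint32_t dest, int depthLeft) const
    {
        const Entry* e = findEntry (dest);
        if (e == nullptr)
            return false;

        if (std::find (e->srcNodes.begin(), e->srcNodes.end(), src) != e->srcNodes.end())
            return true;

        if (--depthLeft >= 0)
            for (std::uint32_t s : e->srcNodes)
                if (isAnInputTo (src, s, depthLeft))
                    return true;

        return false;
    }
};

// Builds the chain 1 -> 2 -> ... -> numNodes, runs an insertion-style
// ordering pass over it and returns how many lookups that took.
long countOrderingLookups (std::uint32_t numNodes)
{
    assert (numNodes > 0);
    LookupTable table;

    for (std::uint32_t dest = 2; dest <= numNodes; ++dest)
    {
        Entry e;
        e.destNodeId = dest;
        e.srcNodes.push_back (dest - 1);
        table.entries.push_back (e);   // ascending ids, so the table stays sorted
    }

    std::vector<std::uint32_t> ordered;

    for (std::uint32_t id = 1; id <= numNodes; ++id)
    {
        std::size_t j = 0;
        for (; j < ordered.size(); ++j)
            if (table.isAnInputTo (id, ordered[j], static_cast<int> (table.entries.size())))
                break;

        ordered.insert (ordered.begin() + static_cast<std::ptrdiff_t> (j), id);
    }

    return table.findCalls;
}
```

For a chain of n nodes this sketch performs (n - 1) n (n + 1) / 6 lookups, so roughly 2.6 million for 250 nodes and about 167 million for 1,000. Each lookup is cheap, but the count grows with the cube of the chain length, which would fit with a fast search still dominating the profile.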

I’ll do some more testing and I agree that perhaps the whole concept needs a revisit.

Would love to hear a comment from Jules regarding this BTW.

Thanks,

Rail

Okay, I had some time today to get back to this and since I’m adding the nodes in sequential order I can remove the sorting in buildRenderingSequence()

void AudioProcessorGraph::buildRenderingSequence()
{
    Array<void*> newRenderingOps;
    int numRenderingBuffersNeeded = 2;
    int numMidiBuffersNeeded = 1;

    {
        MessageManagerLock mml;

        Array<void*> orderedNodes;
    
        const GraphRenderingOps::ConnectionLookupTable table (connections);

        {
            for (int i = 0; i < nodes.size(); ++i)
            {
                Node* const node = nodes.getUnchecked(i);

                node->prepare (getSampleRate(), getBlockSize(), this);
            
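                // Ordering pass disabled for this test: the nodes are added in
                // sequential order, so the isAnInputTo() insertion sort is skipped.
                /*
                int j = 0;
                for (; j < orderedNodes.size(); ++j)
                    if (table.isAnInputTo (node->nodeId, ((Node*) orderedNodes.getUnchecked(j))->nodeId))
                        break;

                orderedNodes.insert (j, node);
                */

                orderedNodes.add (node);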
            }
        }

        GraphRenderingOps::RenderingOpSequenceCalculator calculator (*this, table, orderedNodes, newRenderingOps);

        numRenderingBuffersNeeded = calculator.getNumBuffersNeeded();
        numMidiBuffersNeeded = calculator.getNumMidiBuffersNeeded();
    }

    {
        // swap over to the new rendering sequence..
        const ScopedLock sl (getCallbackLock());

        renderingBuffers.setSize (numRenderingBuffersNeeded, getBlockSize());
        renderingBuffers.clear();

        for (int i = midiBuffers.size(); --i >= 0;)
            midiBuffers.getUnchecked(i)->clear();

        while (midiBuffers.size() < numMidiBuffersNeeded)
            midiBuffers.add (new MidiBuffer());

        renderingOps.swapWith (newRenderingOps);
    }

    // delete the old ones..
    deleteRenderOpArray (newRenderingOps);
}

and once again this confirms that the bottleneck is in findEntry() – this was profiled while adding and connecting 6,000 nodes with stereo connections.

Jules agrees in the older thread that this needs a better method (for node traversal)… so I guess it’ll be up to us to figure one out…
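As one candidate for a different traversal, a standard topological sort (Kahn's algorithm) orders the nodes in a single pass over the connections instead of asking "is A an input to B" for pairs of nodes. This is a generic sketch, not something proposed in the thread or taken from JUCE; orderNodes, NodeId and Connection are names invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

using NodeId = std::uint32_t;
using Connection = std::pair<NodeId, NodeId>;   // (source, destination)

// Returns the nodes ordered so that every source comes before its destinations.
// If the result is shorter than 'nodes', the connections contain a feedback loop.
std::vector<NodeId> orderNodes (const std::vector<NodeId>& nodes,
                                const std::vector<Connection>& connections)
{
    std::unordered_map<NodeId, int> pendingInputs;
    std::unordered_map<NodeId, std::vector<NodeId>> destinations;

    for (NodeId n : nodes)
        pendingInputs[n] = 0;

    for (const auto& c : connections)
    {
        // assumption: both ends of every connection are listed in 'nodes'
        assert (pendingInputs.count (c.first) == 1 && pendingInputs.count (c.second) == 1);
        destinations[c.first].push_back (c.second);
        ++pendingInputs[c.second];   // duplicate (e.g. stereo) connections count twice
    }

    std::queue<NodeId> ready;
    for (NodeId n : nodes)           // walk 'nodes' so the output order is deterministic
        if (pendingInputs[n] == 0)
            ready.push (n);

    std::vector<NodeId> ordered;
    ordered.reserve (nodes.size());

    while (! ready.empty())
    {
        const NodeId n = ready.front();
        ready.pop();
        ordered.push_back (n);

        auto it = destinations.find (n);
        if (it == destinations.end())
            continue;

        for (NodeId d : it->second)
            if (--pendingInputs[d] == 0)   // all inputs of d are now placed
                ready.push (d);
    }

    return ordered;
}
```

The work here is proportional to the number of nodes plus the number of connections. It only produces an ordering, though; it does not answer arbitrary "is this an input to that" queries, which other parts of the rendering code may still need.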

Cheers,

Rail

Hi Rail, I did some more work on this and ended up essentially in the same place.  I had reworked a lot of code to link up nodes with their source and destination nodes and the performance is much better than the stock graph, but my method, which ended up being the equivalent of findEntry, still ended up as the bottleneck.  Indeed though, it was due to it being called ~450 million times (or something like that) with only ~2300 nodes in the graph.

I'm now looking into an approach that gathers up all nodes which have no destination connections (should account for all end points in the graph) and then works backwards to build "Node Path" objects.  As the graph is traversed, each time a node is visited it is given a reference to whatever the current path being created is (nodes can be in multiple paths due to branches in the flow).  The idea then would be that we could quickly tell when a node is an input to another by checking if they both contain a reference to a shared path object and then checking the index of each in that path.
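A simplified reading of that idea might look like the following. This is a guess at the general shape only, not the code attached later in the thread; NodePathMap, SourceMap and the rest are invented names. It assumes the graph has no feedback loops and that each source is listed once per destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using NodeId = std::uint32_t;
using SourceMap = std::unordered_map<NodeId, std::vector<NodeId>>;   // node -> direct sources

// Hypothetical sketch: record, for each node, its index in every path that
// runs from an end point back to a start point.
class NodePathMap
{
public:
    NodePathMap (const std::vector<NodeId>& nodes, const SourceMap& sources)
        : maxPathLength (nodes.size())
    {
        std::unordered_set<NodeId> feedsSomething;
        for (const auto& entry : sources)
            for (NodeId source : entry.second)
                feedsSomething.insert (source);

        std::vector<NodeId> current;
        for (NodeId node : nodes)
            if (feedsSomething.count (node) == 0)   // end point: no destinations
                walkBack (node, sources, current);
    }

    // True if both nodes share a path and 'possibleInput' sits further upstream.
    bool isAnInputTo (NodeId possibleInput, NodeId node) const
    {
        auto in = positions.find (possibleInput);
        auto out = positions.find (node);
        if (in == positions.end() || out == positions.end())
            return false;

        for (const auto& p : in->second)            // p = (path id, index in path)
        {
            auto shared = out->second.find (p.first);
            if (shared != out->second.end() && p.second > shared->second)
                return true;
        }
        return false;
    }

private:
    void walkBack (NodeId node, const SourceMap& sources, std::vector<NodeId>& current)
    {
        current.push_back (node);
        assert (current.size() <= maxPathLength);   // longer would mean a loop

        auto it = sources.find (node);
        if (it == sources.end() || it->second.empty())
        {
            // Reached a start point: 'current' is one complete path.
            const std::size_t pathId = numPaths++;
            for (std::size_t i = 0; i < current.size(); ++i)
                positions[current[i]][pathId] = i;  // index 0 is the end point
        }
        else
        {
            for (NodeId source : it->second)        // each branch forks a new path
                walkBack (source, sources, current);
        }
        current.pop_back();
    }

    std::size_t maxPathLength;
    std::size_t numPaths = 0;
    std::unordered_map<NodeId, std::unordered_map<std::size_t, std::size_t>> positions;
};
```

One caveat with this naive version: the number of distinct paths can grow very quickly in graphs with many branches and merges, so a real implementation would probably need to share path segments rather than enumerate every full path.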

I have numerous other priorities at the moment but hopefully I'll have an update for this approach within 2 or 3 days.

Hi Graeme,

Thanks!

I also have some other irons in the fire… but this is a high priority issue for me… since my whole PI relies on my getting this to work or needs a major redesign… which would severely limit functionality…

I look forward to seeing what you come up with – I’ll be trying a few different ideas over here too.

Cheers,

Rail

Hi Rail.  I've attached a modified version of the stock juce graph if you'd like to give it a test. (Note I had to append ".txt" to be able to attach it.)  This code includes the graph map that I was talking about.

Note:
-This is proof of concept and the implementation should be cleaned up.  I plan to straighten it up but there are some aspects with the buffer reuse code that I haven't been able to get a mental picture of so I just wedged the new code into the existing routines.  Once I get that figured out I'll rework things.  Note that I don't really have a timeline for that, perhaps in a week or so.

-Your test with 6000 sequential nodes connected in stereo takes 125ms on my machine.  That's running on a desktop with an i7-3770k processor.  I don't recall the numbers for my standard graph layout but it was a substantial increase in performance to the point where it was difficult to get the graph code to show up in the profiler.

-I'm thinking there could be further performance improvements that would take advantage of the map but it's hard to justify putting much more effort into it.  The test numbers are well beyond reasonable usage expectations for my application.

Anyway, give it a run and let me know how it goes!

Btw, I left a timing check in buildRenderingSequence.  You may wish to remove that.

Hi Graeme,

That is dramatically faster!!

Here’s a screen capture of the profiling:

And here is a copy of the profile trace output.

GraemeCodeChange.trace.zip

Every blue flag indicates where I add 1,000 nodes and make stereo connections between them.

I think it would be good for you to have a copy of my test app, so PM me your contact info.

I haven’t had time to check out your code changes yet – will do so after I get going today…

Cheers,

Rail

A graph dump shows all the connections being made, but when I use your changes in my PI the graph doesn’t render… I will do more tests later when I get some time.

Thanks,

Rail

Sorry about that Rail.  It was just a copy/paste bug due to a difference in my graph vs the juce graph.  Specifically the instantiation of the calculator should have had "newRenderingOps" passed in instead of "renderingOps".  I verified the bug and fix in the Plugin Host Demo and attached the updated version.

Better!! All the processBlocks of the processors in the graph are working… the only issue I have is the synth proc isn’t playing (it is receiving the MIDI note though)…

I’ve sent you an invite to the test app repo and will let you know if I can locate the synth proc issue.

Thanks,

Rail

Hmmm... MIDI seems to be working fine with the Plugin Host Demo.  Any chance it's something on your end?  Unfortunately I'm terribly busy with other things so it may take a few days before I can take a good look at it.

Yeah, the JUCE demo does indeed work in the host demo… but my PI works fine with the original JUCE AudioProcessorGraph code, and when I replace it with yours the synth doesn’t play… I’ve been busy with some other projects but will try and get back to this next week. I rechecked my code and it all looks good… the graph is pretty involved though.

Thanks,

Rail

Okay,

I’ve traced it to the graph thinking there’s a feedback loop when there isn’t… so it stops rendering… so there’s something wrong in the new code where it’s checking for feedback loops… will keep checking…

Cheers,

Rail

I've been able to reproduce what is perhaps the problematic scheme in the Plugin Host.  It occurs when 2 MIDI nodes are connected to one subsequent node. (I've attached an image.)  It works with the stock graph but not with the modified one, which thinks there's a feedback loop happening.  I'll poke around a bit today but should have more time tomorrow if need be.
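For reference, two nodes feeding one subsequent node is a fan-in, not a feedback loop. A loop check that treats any already-visited node as a loop will misreport fan-ins and diamonds; the usual fix is to flag a loop only when a node is reached again while it is still on the current traversal path. Whether that is what goes wrong in the modified graph is not established here. The sketch below is generic, and hasFeedbackLoop, DestinationMap and Visit are names invented for it.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using NodeId = std::uint32_t;
using DestinationMap = std::unordered_map<NodeId, std::vector<NodeId>>;   // node -> direct destinations

enum class Visit { unvisited, inProgress, done };

// Depth-first walk downstream from 'node'. A loop exists only if we reach a
// node that is still on the current path (inProgress), not merely one seen before.
static bool reachesItself (NodeId node, const DestinationMap& destinations,
                           std::unordered_map<NodeId, Visit>& state)
{
    state[node] = Visit::inProgress;

    auto it = destinations.find (node);
    if (it != destinations.end())
    {
        for (NodeId d : it->second)
        {
            const Visit v = state[d];   // missing entries default to unvisited

            if (v == Visit::inProgress)
                return true;            // back edge: genuine feedback loop

            if (v == Visit::unvisited && reachesItself (d, destinations, state))
                return true;

            // v == done: reached again via another branch (fan-in), which is fine
        }
    }

    assert (state[node] == Visit::inProgress);
    state[node] = Visit::done;
    return false;
}

bool hasFeedbackLoop (const DestinationMap& destinations)
{
    std::unordered_map<NodeId, Visit> state;

    for (const auto& entry : destinations)
        if (state[entry.first] == Visit::unvisited
             && reachesItself (entry.first, destinations, state))
            return true;

    return false;
}
```

In this sketch, the layout described above (two nodes connected to one subsequent node) is reported as loop-free, while a pair of nodes connected to each other is reported as a loop.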