AudioProcessorGraph slow manipulations

Well the profile doesn’t lie…getConnectionBetween is the hot spot and it’s doing a linear search. I think if you put a sort on that Connection array and do a binary search it will get his run time down to 100ms

Okay this will improve it by 800%:

Declare this struct right before getConnectionBetween()

struct ConnectionCompare
{
  static int compareElements (AudioProcessorGraph::Connection const* first, AudioProcessorGraph::Connection const* second)
  {
    if (first->sourceNodeId < second->sourceNodeId)
      return -1;
    else if (first->sourceNodeId > second->sourceNodeId)
      return 1;
    else if (first->destNodeId < second->destNodeId)
      return -1;
    else if (first->destNodeId > second->destNodeId)
      return 1;
    else if (first->sourceChannelIndex < second->sourceChannelIndex)
      return -1;
    else if (first->sourceChannelIndex > second->sourceChannelIndex)
      return 1;
    else if (first->destChannelIndex < second->destChannelIndex)
      return -1;
    else if (first->destChannelIndex > second->destChannelIndex)
      return 1;
    else
      return 0;
  }
};

Replace getConnectionBetween() with this:

const AudioProcessorGraph::Connection* AudioProcessorGraph::getConnectionBetween (const uint32 sourceNodeId,
                                                                                  const int sourceChannelIndex,
                                                                                  const uint32 destNodeId,
                                                                                  const int destChannelIndex) const
{
  ConnectionCompare comp;
  Connection c;
  c.sourceNodeId = sourceNodeId;
  c.destNodeId = destNodeId;
  c.sourceChannelIndex = sourceChannelIndex;
  c.destChannelIndex = destChannelIndex;
  int index = connections.indexOfSorted (comp, &c);
  if (index != -1)
    return connections.getUnchecked (index);
  return 0;
}

Change connections.add © in addConnection() to these lines:

    ConnectionCompare comp;
    connections.addSorted (comp, c);

Wow, this definitely does speed it up further! Awesome! I gotta run out for a bit, but I’ll run some more tests with it later. Thanks!

Nice one, thanks Vinn, I’ll see if I can incorporate that too…

indexOfSorted() is about 60% and isBufferNeededLater() is about 40% of the run-time now so further improvements are likely only possible through algorithmic changes in isBufferNeededLater as you pointed out.

I have implemented plugin delay compensation in an adapted version of CalculateRenderingSequence… let me know if you want to incorporate that too (can post code as well but it’s a bit lenghty).

Sounds interesting… I’ve checked in an updated version with some of these optimisations in it, so if you have a version based on that, I’d certainly like to see what you’ve done!

I think a generalized version of findEntry() in juce::Array or juce_ElementComparator.h would be handy

Ok, I’ll try to merge the two versions…

Okay, there were basically no changes in CalcRenderingSequence so not much merging to do…

First of all, we need a DelayChannelOp:

[code]class DelayChannelOp : public AudioGraphRenderingOp
{
public:
DelayChannelOp(const int ChannelNum_, const int Delay_) : ChannelNum(ChannelNum_), Delay(Delay_)
{
int size = Delay+5000; //safety margin for audio buffer size
DelayBuffer = new float[size];
memset(DelayBuffer,0,size*sizeof(float));
current=begin=&DelayBuffer[0];
end=&DelayBuffer[size-1];
delay=&DelayBuffer[Delay];
}
DelayChannelOp() : ChannelNum(0), Delay(0) {DelayBuffer=0;};

~DelayChannelOp() 
{
	if (DelayBuffer) delete DelayBuffer;
}

void perform (AudioSampleBuffer& sharedBufferChans, const OwnedArray <MidiBuffer>&, const int numSamples)
{
	float *in=sharedBufferChans.getSampleData(ChannelNum,0);
	int numSamples_ = numSamples;
	while (--numSamples_ >= 0)
	{
		*delay++ = *in;
		*in++ = *current++;
		if (delay>end) delay=begin;
		if (current>end) current=begin;
	}
}

private:
const int ChannelNum,Delay;
float *DelayBuffer,*begin,*end,*current,*delay;
};[/code]

Then, in CalcRenderingSequence, for each input to each node, we check which one causes the greatest amount of latency. All other inputs are delayed appropriately. Latency is passed down the rendering chain… I’ve marked all changes I did with a //CS comment.

class RenderingOpSequenceCalculator
{
public:

	RenderingOpSequenceCalculator (AudioProcessorGraph& graph_,
								   const Array<void*>& orderedNodes_,
								   Array<void*>& renderingOps)
		: graph (graph_),
		  orderedNodes (orderedNodes_)
	{
		nodeIds.add ((uint32) zeroNodeID); // first buffer is read-only zeros
		channels.add (0);

		midiNodeIds.add ((uint32) zeroNodeID);
		output_delay=0; //CS
		for (int i = 0; i < orderedNodes.size(); ++i)
		{
			createRenderingOpsForNode ((AudioProcessorGraph::Node*) orderedNodes.getUnchecked(i),
									   renderingOps, i);

			markAnyUnusedBuffersAsFree (i);
		}
		graph.setLatencySamples(output_delay); //CS
	}

	int getNumBuffersNeeded() const	 { return nodeIds.size(); }
	int getNumMidiBuffersNeeded() const	 { return midiNodeIds.size(); }

private:

	AudioProcessorGraph& graph;
	const Array<void*>& orderedNodes;
	Array <int> channels;
	Array <uint32> nodeIds, midiNodeIds;
	NamedValueSet NodeDelays; //CS
	int node_delay; //CS
	int max_delay,output_delay; //CS

	enum { freeNodeID = 0xffffffff, zeroNodeID = 0xfffffffe };

	static bool isNodeBusy (uint32 nodeID) throw()  { return nodeID != freeNodeID && nodeID != zeroNodeID; }

	void createRenderingOpsForNode (AudioProcessorGraph::Node* const node,
									Array<void*>& renderingOps,
									const int ourRenderingIndex)
	{
		const int numIns = node->getProcessor()->getNumInputChannels();
		const int numOuts = node->getProcessor()->getNumOutputChannels();
		const int totalChans = jmax (numIns, numOuts);

		Array <int> audioChannelsToUse;
		int midiBufferToUse = -1;

		max_delay=0; //CS
		float sendAmount = node->properties.getWithDefault("sendAmount",1);
		// get a list of all the inputs to this node
		Array <int> sourceNodes, sourceOutputChans;

		max_delay=0; //CS: first go over all input nodes regardless of channel to get max_delay of all inputs
		//output_delay=0;
		for (int i = graph.getNumConnections(); --i >= 0;)
		{
			const AudioProcessorGraph::Connection* const c = graph.getConnection (i);
			if (c->destNodeId == node->nodeId ) sourceNodes.add (c->sourceNodeId);
		}
		for (int i=0;i<sourceNodes.size();i++)	
		{
			node_delay = NodeDelays[String(sourceNodes.getUnchecked(i))];
			if (node_delay>max_delay) 
					max_delay = node_delay;
		}

		sourceNodes.clear();
		//CS: end of initial max_delay calculation
		for (int inputChan = 0; inputChan < numIns; ++inputChan)
		{
			/*CS: now defined outside of channel loop
			// get a list of all the inputs to this node
			Array <int> sourceNodes, sourceOutputChans;
			*/

			for (int i = graph.getNumConnections(); --i >= 0;)
			{
				const AudioProcessorGraph::Connection* const c = graph.getConnection (i);

				if (c->destNodeId == node->nodeId && c->destChannelIndex == inputChan)
				{
					sourceNodes.add (c->sourceNodeId);
					sourceOutputChans.add (c->sourceChannelIndex);
				}
			}

			int bufIndex = -1;
			

			if (sourceNodes.size() == 0)
			{
				// unconnected input channel

				if (inputChan >= numOuts)
				{
					bufIndex = getReadOnlyEmptyBuffer();
					jassert (bufIndex >= 0);
				}
				else
				{
					bufIndex = getFreeBuffer (false);
					renderingOps.add (new ClearChannelOp (bufIndex));
				}
			}
			else if (sourceNodes.size() == 1)
			{
				// channel with a straightforward single input..
				const int srcNode = sourceNodes.getUnchecked(0);
				const int srcChan = sourceOutputChans.getUnchecked(0);

				bufIndex = getBufferContaining (srcNode, srcChan);

				if (bufIndex < 0)
				{
					// if not found, this is probably a feedback loop
					bufIndex = getReadOnlyEmptyBuffer();
					jassert (bufIndex >= 0);
				}

				if (inputChan < numOuts
					 && isBufferNeededLater (ourRenderingIndex,
											 inputChan,
											 srcNode, srcChan))
				{
					// can't mess up this channel because it's needed later by another node, so we
					// need to use a copy of it..
					const int newFreeBuffer = getFreeBuffer (false);

					renderingOps.add (new CopyChannelOp (bufIndex, newFreeBuffer));

					bufIndex = newFreeBuffer;
				}
				node_delay = NodeDelays[String(srcNode)]; //CS
				
				if (node_delay<max_delay) //CS: need to add extra delay to keep everything in sync
					renderingOps.add ( new DelayChannelOp(bufIndex,max_delay-node_delay));
				
			}
			else
			{
				// channel with a mix of several inputs..

				// try to find a re-usable channel from our inputs..
				int reusableInputIndex = -1;

				for (int i = 0; i < sourceNodes.size(); ++i)
				{
					const int sourceBufIndex = getBufferContaining (sourceNodes.getUnchecked(i),
																	sourceOutputChans.getUnchecked(i));

					if (sourceBufIndex >= 0
						&& ! isBufferNeededLater (ourRenderingIndex,
												  inputChan,
												  sourceNodes.getUnchecked(i),
												  sourceOutputChans.getUnchecked(i)))
					{
						// we've found one of our input chans that can be re-used..
						reusableInputIndex = i;
						bufIndex = sourceBufIndex;
						node_delay = NodeDelays[String(sourceNodes.getUnchecked(i))]; //CS
						if (node_delay<max_delay) //CS
						{
							renderingOps.add(new DelayChannelOp
												(sourceBufIndex,max_delay-node_delay));
						}
						break;
					}
				}

				if (reusableInputIndex < 0)
				{
					// can't re-use any of our input chans, so get a new one and copy everything into it..
					bufIndex = getFreeBuffer (false);
					jassert (bufIndex != 0);

					const int srcIndex = getBufferContaining (sourceNodes.getUnchecked (0),
															  sourceOutputChans.getUnchecked (0));
					if (srcIndex < 0)
					{
						// if not found, this is probably a feedback loop
						renderingOps.add (new ClearChannelOp (bufIndex));
					}
					else
					{
						renderingOps.add (new CopyChannelOp (srcIndex, bufIndex));
					}

					reusableInputIndex = 0;
					node_delay = NodeDelays[String(sourceNodes.getUnchecked(0))]; //CS
					if (node_delay<max_delay) //CS
					{
						renderingOps.add(new DelayChannelOp
										(bufIndex,max_delay-node_delay)); 
					}
				}

				for (int j = 0; j < sourceNodes.size(); ++j)
				{
					if (j != reusableInputIndex)
					{
						const int srcIndex = getBufferContaining (sourceNodes.getUnchecked(j),
																  sourceOutputChans.getUnchecked(j));
						if (srcIndex >= 0) //CS: changed whole block
						{
							node_delay = NodeDelays[String(sourceNodes.getUnchecked(j))];
							if (node_delay<max_delay) //CS
							{
								if (! isBufferNeededLater (ourRenderingIndex,
								inputChan,
								sourceNodes.getUnchecked(j),
								sourceOutputChans.getUnchecked(j))) //CS
								{	
									renderingOps.add(new DelayChannelOp(srcIndex,         
										max_delay-node_delay)); //CS
								}
								else //CS: buffer is reused elsewhere, can't be delayed
								{	
									int buffer_to_delay = getFreeBuffer(false);
									renderingOps.add (new CopyChannelOp (srcIndex, buffer_to_delay));
									renderingOps.add(new DelayChannelOp(buffer_to_delay,         
										max_delay-node_delay)); //CS
									renderingOps.add (new AddChannelOp (buffer_to_delay, bufIndex));
								}
							}
							else renderingOps.add (new AddChannelOp (srcIndex, bufIndex));
						}	
					}
				}
			}
			
			jassert (bufIndex >= 0);
			audioChannelsToUse.add (bufIndex);

			if (inputChan < numOuts)
				markBufferAsContaining (bufIndex, node->nodeId, inputChan);
			sourceNodes.clear(); //CS
			sourceOutputChans.clear(); //CS
		}

		for (int outputChan = numIns; outputChan < numOuts; ++outputChan)
		{
			const int bufIndex = getFreeBuffer (false);
			jassert (bufIndex != 0);
			audioChannelsToUse.add (bufIndex);

			markBufferAsContaining (bufIndex, node->nodeId, outputChan);
		}

		// Now the same thing for midi..
		Array <int> midiSourceNodes;

		for (int i = graph.getNumConnections(); --i >= 0;)
		{
			const AudioProcessorGraph::Connection* const c = graph.getConnection (i);

			if (c->destNodeId == node->nodeId && c->destChannelIndex == AudioProcessorGraph::midiChannelIndex)
				midiSourceNodes.add (c->sourceNodeId);
		}

		if (midiSourceNodes.size() == 0)
		{
			// No midi inputs..
			midiBufferToUse = getFreeBuffer (true); // need to pick a buffer even if the processor doesn't use midi

			if (node->getProcessor()->acceptsMidi() || node->getProcessor()->producesMidi())
				renderingOps.add (new ClearMidiBufferOp (midiBufferToUse));
		}
		else if (midiSourceNodes.size() == 1)
		{
			// One midi input..
			midiBufferToUse = getBufferContaining (midiSourceNodes.getUnchecked(0),
												   AudioProcessorGraph::midiChannelIndex);

			if (midiBufferToUse >= 0)
			{
				if (isBufferNeededLater (ourRenderingIndex,
										 AudioProcessorGraph::midiChannelIndex,
										 midiSourceNodes.getUnchecked(0),
										 AudioProcessorGraph::midiChannelIndex))
				{
					// can't mess up this channel because it's needed later by another node, so we
					// need to use a copy of it..
					const int newFreeBuffer = getFreeBuffer (true);
					renderingOps.add (new CopyMidiBufferOp (midiBufferToUse, newFreeBuffer));
					midiBufferToUse = newFreeBuffer;
				}
			}
			else
			{
				 // probably a feedback loop, so just use an empty one..
				 midiBufferToUse = getFreeBuffer (true); // need to pick a buffer even if the processor doesn't use midi
			}
		}
		else
		{
			// More than one midi input being mixed..
			int reusableInputIndex = -1;

			for (int i = 0; i < midiSourceNodes.size(); ++i)
			{
				const int sourceBufIndex = getBufferContaining (midiSourceNodes.getUnchecked(i),
																AudioProcessorGraph::midiChannelIndex);

				if (sourceBufIndex >= 0
					 && ! isBufferNeededLater (ourRenderingIndex,
											   AudioProcessorGraph::midiChannelIndex,
											   midiSourceNodes.getUnchecked(i),
											   AudioProcessorGraph::midiChannelIndex))
				{
					// we've found one of our input buffers that can be re-used..
					reusableInputIndex = i;
					midiBufferToUse = sourceBufIndex;
					break;
				}
			}

			if (reusableInputIndex < 0)
			{
				// can't re-use any of our input buffers, so get a new one and copy everything into it..
				midiBufferToUse = getFreeBuffer (true);
				jassert (midiBufferToUse >= 0);

				const int srcIndex = getBufferContaining (midiSourceNodes.getUnchecked(0),
														  AudioProcessorGraph::midiChannelIndex);
				if (srcIndex >= 0)
					renderingOps.add (new CopyMidiBufferOp (srcIndex, midiBufferToUse));
				else
					renderingOps.add (new ClearMidiBufferOp (midiBufferToUse));

				reusableInputIndex = 0;
			}

			for (int j = 0; j < midiSourceNodes.size(); ++j)
			{
				if (j != reusableInputIndex)
				{
					const int srcIndex = getBufferContaining (midiSourceNodes.getUnchecked(j),
															  AudioProcessorGraph::midiChannelIndex);
					if (srcIndex >= 0)
						renderingOps.add (new AddMidiBufferOp (srcIndex, midiBufferToUse));
				}
			}
		}

		if (node->getProcessor()->producesMidi())
			markBufferAsContaining (midiBufferToUse, node->nodeId,
									AudioProcessorGraph::midiChannelIndex);
		NodeDelays.set(String(node->id),max_delay + node->getProcessor()->getLatencySamples()); //CS
		
		if (numOuts == 0) output_delay = max_delay; //CS: assume this is the audio output
		renderingOps.add (new ProcessBufferOp (node, audioChannelsToUse,
											   totalChans, midiBufferToUse));
	}

I’ve done this in a plugin context so there’s a variable output_delay which contains the latency at the audio output. This variable doesn’t play a role for a host application; however, I should mention that in it’s calculation I’ve assumed that there is only one output node. Otherwise there’d be an extra loop over all output nodes, with added DelayChannelOps for those that have less than maximum latency. Also, I have omitted the MIDI part, as I don’t use that. Should be pretty much identical though…
And feedback loops don’t work, but I don’t think a plugin with a non-zero latency in a feedback loop can be handled properly. They don’t work even without PDC, but that’s a different story…

I haven’t looked too far into it yet but I just want to mention that the AudioProcessorGraph in the lastest Juce tip doesn’t seem to work correctly. Some connection scenarios work while others don’t. I have A-B tested against Vinn’s patch which seems to be fine.

I would investigate findEntry() since that’s the only significant difference

Thanks Vinn, I replaced the code in findEntry with the following basic for-loop and it fixed the problem. Unfortunately I don’t have anymore time today to figure out exactly why the current implementation isn’t working, but with 32 tracks, 4 auxs, and 10 nodes per, I get 650ms vs 620ms with Jules’ version. All pretty speedy. :slight_smile: I may try tomorrow to spot where the actual error is.

[code] Entry* result = nullptr;

	int firstElement = 0;
	int entrySize = entries.size();
	for ( int i = 0; i < entrySize; i++) {
		Entry* const firstEntry = entries.getUnchecked (i);
        if (destNodeId == firstEntry->destNodeId) {
			insertIndex	= i;
			result = firstEntry;
			break;
			
		}
	}
	
	return result;[/code]

For what it’s worth here is a screenshot of a configuration (Plugin Host Example) that results in corrupted sound. The graph works until you connect the node on the left, at which point the system’s audio corrupts and a restart is required (Snow Leopard).

[attachment=0]badgraph.jpg[/attachment]

That’s not really a long-term solution because although it works well in your case, if there are a lot of “destination” node Ids, it will get slow again.

But now that Jules knows there is a confirmed bug in findEntry() in the tip, it will be easy to locate and fix it.

haha sorry, I can see how it may have sounded that way but I wasn’t meaning to suggest it as an official solution. I figured I’d start by making sure that this was indeed the problem code and the simple for loop was the quickest ticket there. The idea that it performed decently was just a nice bonus.

I’m not sure who has more destination ids than me right now, considering I seemed to be the first to complain about it being impossible. But if you have concerns for a higher number I can tell you that I bumped it to 64 tracks, 16 aux tracks, 1 master track, all with 10 nodes. Works out to 810 nodes (which virtually all would end up being destination nodes) and 3666 connections. Optimized does it in ~7 seconds, basic for loop does ~11.5 seconds.

Sorry regarding the destination Ids, I did read that you said long-term, but somewhere lost track of it. It’s too late right now. :wink:

How about this (I’ve just copied the binary search algorithm from SortedSet, but haven’t tested it):

[code] Entry* findEntry (const uint32 destNodeId, int& insertIndex) const noexcept
{
Entry* result = nullptr;

    int start = 0;
    int end = entries.size();

    for (;;)
    {
        if (start >= end)
        {
            break;
        }
        else if (destNodeId == entries.getUnchecked (start)->destNodeId)
        {
            result = entries.getUnchecked (start);
            break;
        }
        else
        {
            const int halfway = (start + end) / 2;

            if (halfway == start)
            {
                if (destNodeId >= entries.getUnchecked (halfway)->destNodeId)
                    ++start;

                break;
            }
            else if (destNodeId >= entries.getUnchecked (halfway)->destNodeId)
                start = halfway;
            else
                end = halfway;
        }
    }

    insertIndex = start;
    return result;
}

[/code]

Yep, this one works! :smiley:

I stared at your original findEntry () and I couldn’t see the problem…any idea what it was? It looked right…

Nice work by the way, that was exactly what was needed.