AudioProcessorGraph slow manipulations

TheVinn · April 12, 2011, 5:49pm

Well the profile doesn’t lie…getConnectionBetween is the hot spot and it’s doing a linear search. I think if you put a sort on that Connection array and do a binary search it will get his run time down to 100ms

TheVinn · April 12, 2011, 6:02pm

Okay this will improve it by 800%:

Declare this struct right before getConnectionBetween()

struct ConnectionCompare
{
  static int compareElements (AudioProcessorGraph::Connection const* first, AudioProcessorGraph::Connection const* second)
  {
    if (first->sourceNodeId < second->sourceNodeId)
      return -1;
    else if (first->sourceNodeId > second->sourceNodeId)
      return 1;
    else if (first->destNodeId < second->destNodeId)
      return -1;
    else if (first->destNodeId > second->destNodeId)
      return 1;
    else if (first->sourceChannelIndex < second->sourceChannelIndex)
      return -1;
    else if (first->sourceChannelIndex > second->sourceChannelIndex)
      return 1;
    else if (first->destChannelIndex < second->destChannelIndex)
      return -1;
    else if (first->destChannelIndex > second->destChannelIndex)
      return 1;
    else
      return 0;
  }
};

Replace getConnectionBetween() with this:

const AudioProcessorGraph::Connection* AudioProcessorGraph::getConnectionBetween (const uint32 sourceNodeId,
                                                                                  const int sourceChannelIndex,
                                                                                  const uint32 destNodeId,
                                                                                  const int destChannelIndex) const
{
  ConnectionCompare comp;
  Connection c;
  c.sourceNodeId = sourceNodeId;
  c.destNodeId = destNodeId;
  c.sourceChannelIndex = sourceChannelIndex;
  c.destChannelIndex = destChannelIndex;
  int index = connections.indexOfSorted (comp, &c);
  if (index != -1)
    return connections.getUnchecked (index);
  return 0;
}

Change connections.add © in addConnection() to these lines:

    ConnectionCompare comp;
    connections.addSorted (comp, c);

graeme-2 · April 12, 2011, 6:46pm

Wow, this definitely does speed it up further! Awesome! I gotta run out for a bit, but I’ll run some more tests with it later. Thanks!

jules · April 12, 2011, 7:37pm

Nice one, thanks Vinn, I’ll see if I can incorporate that too…

TheVinn · April 12, 2011, 7:44pm

indexOfSorted() is about 60% and isBufferNeededLater() is about 40% of the run-time now so further improvements are likely only possible through algorithmic changes in isBufferNeededLater as you pointed out.

siedschl · April 13, 2011, 11:02am

I have implemented plugin delay compensation in an adapted version of CalculateRenderingSequence… let me know if you want to incorporate that too (can post code as well but it’s a bit lenghty).

jules · April 13, 2011, 11:07am

Sounds interesting… I’ve checked in an updated version with some of these optimisations in it, so if you have a version based on that, I’d certainly like to see what you’ve done!

TheVinn · April 13, 2011, 1:34pm

I think a generalized version of findEntry() in juce::Array or juce_ElementComparator.h would be handy

siedschl · April 13, 2011, 7:08pm

Ok, I’ll try to merge the two versions…

siedschl · April 15, 2011, 5:03pm

Okay, there were basically no changes in CalcRenderingSequence so not much merging to do…

First of all, we need a DelayChannelOp:

[code]class DelayChannelOp : public AudioGraphRenderingOp
{
public:
DelayChannelOp(const int ChannelNum_, const int Delay_) : ChannelNum(ChannelNum_), Delay(Delay_)
{
int size = Delay+5000; //safety margin for audio buffer size
DelayBuffer = new float[size];
memset(DelayBuffer,0,size*sizeof(float));
current=begin=&DelayBuffer[0];
end=&DelayBuffer[size-1];
delay=&DelayBuffer[Delay];
}
DelayChannelOp() : ChannelNum(0), Delay(0) {DelayBuffer=0;};

~DelayChannelOp() 
{
	if (DelayBuffer) delete DelayBuffer;
}

void perform (AudioSampleBuffer& sharedBufferChans, const OwnedArray <MidiBuffer>&, const int numSamples)
{
	float *in=sharedBufferChans.getSampleData(ChannelNum,0);
	int numSamples_ = numSamples;
	while (--numSamples_ >= 0)
	{
		*delay++ = *in;
		*in++ = *current++;
		if (delay>end) delay=begin;
		if (current>end) current=begin;
	}
}

private:
const int ChannelNum,Delay;
float *DelayBuffer,*begin,*end,*current,*delay;
};[/code]

Then, in CalcRenderingSequence, for each input to each node, we check which one causes the greatest amount of latency. All other inputs are delayed appropriately. Latency is passed down the rendering chain… I’ve marked all changes I did with a //CS comment.

class RenderingOpSequenceCalculator
{
public:

	RenderingOpSequenceCalculator (AudioProcessorGraph& graph_,
								   const Array<void*>& orderedNodes_,
								   Array<void*>& renderingOps)
		: graph (graph_),
		  orderedNodes (orderedNodes_)
	{
		nodeIds.add ((uint32) zeroNodeID); // first buffer is read-only zeros
		channels.add (0);

		midiNodeIds.add ((uint32) zeroNodeID);
		output_delay=0; //CS
		for (int i = 0; i < orderedNodes.size(); ++i)
		{
			createRenderingOpsForNode ((AudioProcessorGraph::Node*) orderedNodes.getUnchecked(i),
									   renderingOps, i);

			markAnyUnusedBuffersAsFree (i);
		}
		graph.setLatencySamples(output_delay); //CS
	}

	int getNumBuffersNeeded() const	 { return nodeIds.size(); }
	int getNumMidiBuffersNeeded() const	 { return midiNodeIds.size(); }

private:

	AudioProcessorGraph& graph;
	const Array<void*>& orderedNodes;
	Array <int> channels;
	Array <uint32> nodeIds, midiNodeIds;
	NamedValueSet NodeDelays; //CS
	int node_delay; //CS
	int max_delay,output_delay; //CS

	enum { freeNodeID = 0xffffffff, zeroNodeID = 0xfffffffe };

	static bool isNodeBusy (uint32 nodeID) throw()  { return nodeID != freeNodeID && nodeID != zeroNodeID; }

	void createRenderingOpsForNode (AudioProcessorGraph::Node* const node,
									Array<void*>& renderingOps,
									const int ourRenderingIndex)
	{
		const int numIns = node->getProcessor()->getNumInputChannels();
		const int numOuts = node->getProcessor()->getNumOutputChannels();
		const int totalChans = jmax (numIns, numOuts);

		Array <int> audioChannelsToUse;
		int midiBufferToUse = -1;

		max_delay=0; //CS
		float sendAmount = node->properties.getWithDefault("sendAmount",1);
		// get a list of all the inputs to this node
		Array <int> sourceNodes, sourceOutputChans;

		max_delay=0; //CS: first go over all input nodes regardless of channel to get max_delay of all inputs
		//output_delay=0;
		for (int i = graph.getNumConnections(); --i >= 0;)
		{
			const AudioProcessorGraph::Connection* const c = graph.getConnection (i);
			if (c->destNodeId == node->nodeId ) sourceNodes.add (c->sourceNodeId);
		}
		for (int i=0;i<sourceNodes.size();i++)	
		{
			node_delay = NodeDelays[String(sourceNodes.getUnchecked(i))];
			if (node_delay>max_delay) 
					max_delay = node_delay;
		}

		sourceNodes.clear();
		//CS: end of initial max_delay calculation
		for (int inputChan = 0; inputChan < numIns; ++inputChan)
		{
			/*CS: now defined outside of channel loop
			// get a list of all the inputs to this node
			Array <int> sourceNodes, sourceOutputChans;
			*/

			for (int i = graph.getNumConnections(); --i >= 0;)
			{
				const AudioProcessorGraph::Connection* const c = graph.getConnection (i);

				if (c->destNodeId == node->nodeId && c->destChannelIndex == inputChan)
				{
					sourceNodes.add (c->sourceNodeId);
					sourceOutputChans.add (c->sourceChannelIndex);
				}
			}

			int bufIndex = -1;
			

			if (sourceNodes.size() == 0)
			{
				// unconnected input channel

				if (inputChan >= numOuts)
				{
					bufIndex = getReadOnlyEmptyBuffer();
					jassert (bufIndex >= 0);
				}
				else
				{
					bufIndex = getFreeBuffer (false);
					renderingOps.add (new ClearChannelOp (bufIndex));
				}
			}
			else if (sourceNodes.size() == 1)
			{
				// channel with a straightforward single input..
				const int srcNode = sourceNodes.getUnchecked(0);
				const int srcChan = sourceOutputChans.getUnchecked(0);

				bufIndex = getBufferContaining (srcNode, srcChan);

				if (bufIndex < 0)
				{
					// if not found, this is probably a feedback loop
					bufIndex = getReadOnlyEmptyBuffer();
					jassert (bufIndex >= 0);
				}

				if (inputChan < numOuts
					 && isBufferNeededLater (ourRenderingIndex,
											 inputChan,
											 srcNode, srcChan))
				{
					// can't mess up this channel because it's needed later by another node, so we
					// need to use a copy of it..
					const int newFreeBuffer = getFreeBuffer (false);

					renderingOps.add (new CopyChannelOp (bufIndex, newFreeBuffer));

					bufIndex = newFreeBuffer;
				}
				node_delay = NodeDelays[String(srcNode)]; //CS
				
				if (node_delay<max_delay) //CS: need to add extra delay to keep everything in sync
					renderingOps.add ( new DelayChannelOp(bufIndex,max_delay-node_delay));
				
			}
			else
			{
				// channel with a mix of several inputs..

				// try to find a re-usable channel from our inputs..
				int reusableInputIndex = -1;

				for (int i = 0; i < sourceNodes.size(); ++i)
				{
					const int sourceBufIndex = getBufferContaining (sourceNodes.getUnchecked(i),
																	sourceOutputChans.getUnchecked(i));

					if (sourceBufIndex >= 0
						&& ! isBufferNeededLater (ourRenderingIndex,
												  inputChan,
												  sourceNodes.getUnchecked(i),
												  sourceOutputChans.getUnchecked(i)))
					{
						// we've found one of our input chans that can be re-used..
						reusableInputIndex = i;
						bufIndex = sourceBufIndex;
						node_delay = NodeDelays[String(sourceNodes.getUnchecked(i))]; //CS
						if (node_delay<max_delay) //CS
						{
							renderingOps.add(new DelayChannelOp
												(sourceBufIndex,max_delay-node_delay));
						}
						break;
					}
				}

				if (reusableInputIndex < 0)
				{
					// can't re-use any of our input chans, so get a new one and copy everything into it..
					bufIndex = getFreeBuffer (false);
					jassert (bufIndex != 0);

					const int srcIndex = getBufferContaining (sourceNodes.getUnchecked (0),
															  sourceOutputChans.getUnchecked (0));
					if (srcIndex < 0)
					{
						// if not found, this is probably a feedback loop
						renderingOps.add (new ClearChannelOp (bufIndex));
					}
					else
					{
						renderingOps.add (new CopyChannelOp (srcIndex, bufIndex));
					}

					reusableInputIndex = 0;
					node_delay = NodeDelays[String(sourceNodes.getUnchecked(0))]; //CS
					if (node_delay<max_delay) //CS
					{
						renderingOps.add(new DelayChannelOp
										(bufIndex,max_delay-node_delay)); 
					}
				}

				for (int j = 0; j < sourceNodes.size(); ++j)
				{
					if (j != reusableInputIndex)
					{
						const int srcIndex = getBufferContaining (sourceNodes.getUnchecked(j),
																  sourceOutputChans.getUnchecked(j));
						if (srcIndex >= 0) //CS: changed whole block
						{
							node_delay = NodeDelays[String(sourceNodes.getUnchecked(j))];
							if (node_delay<max_delay) //CS
							{
								if (! isBufferNeededLater (ourRenderingIndex,
								inputChan,
								sourceNodes.getUnchecked(j),
								sourceOutputChans.getUnchecked(j))) //CS
								{	
									renderingOps.add(new DelayChannelOp(srcIndex,         
										max_delay-node_delay)); //CS
								}
								else //CS: buffer is reused elsewhere, can't be delayed
								{	
									int buffer_to_delay = getFreeBuffer(false);
									renderingOps.add (new CopyChannelOp (srcIndex, buffer_to_delay));
									renderingOps.add(new DelayChannelOp(buffer_to_delay,         
										max_delay-node_delay)); //CS
									renderingOps.add (new AddChannelOp (buffer_to_delay, bufIndex));
								}
							}
							else renderingOps.add (new AddChannelOp (srcIndex, bufIndex));
						}	
					}
				}
			}
			
			jassert (bufIndex >= 0);
			audioChannelsToUse.add (bufIndex);

			if (inputChan < numOuts)
				markBufferAsContaining (bufIndex, node->nodeId, inputChan);
			sourceNodes.clear(); //CS
			sourceOutputChans.clear(); //CS
		}

		for (int outputChan = numIns; outputChan < numOuts; ++outputChan)
		{
			const int bufIndex = getFreeBuffer (false);
			jassert (bufIndex != 0);
			audioChannelsToUse.add (bufIndex);

			markBufferAsContaining (bufIndex, node->nodeId, outputChan);
		}

		// Now the same thing for midi..
		Array <int> midiSourceNodes;

		for (int i = graph.getNumConnections(); --i >= 0;)
		{
			const AudioProcessorGraph::Connection* const c = graph.getConnection (i);

			if (c->destNodeId == node->nodeId && c->destChannelIndex == AudioProcessorGraph::midiChannelIndex)
				midiSourceNodes.add (c->sourceNodeId);
		}

		if (midiSourceNodes.size() == 0)
		{
			// No midi inputs..
			midiBufferToUse = getFreeBuffer (true); // need to pick a buffer even if the processor doesn't use midi

			if (node->getProcessor()->acceptsMidi() || node->getProcessor()->producesMidi())
				renderingOps.add (new ClearMidiBufferOp (midiBufferToUse));
		}
		else if (midiSourceNodes.size() == 1)
		{
			// One midi input..
			midiBufferToUse = getBufferContaining (midiSourceNodes.getUnchecked(0),
												   AudioProcessorGraph::midiChannelIndex);

			if (midiBufferToUse >= 0)
			{
				if (isBufferNeededLater (ourRenderingIndex,
										 AudioProcessorGraph::midiChannelIndex,
										 midiSourceNodes.getUnchecked(0),
										 AudioProcessorGraph::midiChannelIndex))
				{
					// can't mess up this channel because it's needed later by another node, so we
					// need to use a copy of it..
					const int newFreeBuffer = getFreeBuffer (true);
					renderingOps.add (new CopyMidiBufferOp (midiBufferToUse, newFreeBuffer));
					midiBufferToUse = newFreeBuffer;
				}
			}
			else
			{
				 // probably a feedback loop, so just use an empty one..
				 midiBufferToUse = getFreeBuffer (true); // need to pick a buffer even if the processor doesn't use midi
			}
		}
		else
		{
			// More than one midi input being mixed..
			int reusableInputIndex = -1;

			for (int i = 0; i < midiSourceNodes.size(); ++i)
			{
				const int sourceBufIndex = getBufferContaining (midiSourceNodes.getUnchecked(i),
																AudioProcessorGraph::midiChannelIndex);

				if (sourceBufIndex >= 0
					 && ! isBufferNeededLater (ourRenderingIndex,
											   AudioProcessorGraph::midiChannelIndex,
											   midiSourceNodes.getUnchecked(i),
											   AudioProcessorGraph::midiChannelIndex))
				{
					// we've found one of our input buffers that can be re-used..
					reusableInputIndex = i;
					midiBufferToUse = sourceBufIndex;
					break;
				}
			}

			if (reusableInputIndex < 0)
			{
				// can't re-use any of our input buffers, so get a new one and copy everything into it..
				midiBufferToUse = getFreeBuffer (true);
				jassert (midiBufferToUse >= 0);

				const int srcIndex = getBufferContaining (midiSourceNodes.getUnchecked(0),
														  AudioProcessorGraph::midiChannelIndex);
				if (srcIndex >= 0)
					renderingOps.add (new CopyMidiBufferOp (srcIndex, midiBufferToUse));
				else
					renderingOps.add (new ClearMidiBufferOp (midiBufferToUse));

				reusableInputIndex = 0;
			}

			for (int j = 0; j < midiSourceNodes.size(); ++j)
			{
				if (j != reusableInputIndex)
				{
					const int srcIndex = getBufferContaining (midiSourceNodes.getUnchecked(j),
															  AudioProcessorGraph::midiChannelIndex);
					if (srcIndex >= 0)
						renderingOps.add (new AddMidiBufferOp (srcIndex, midiBufferToUse));
				}
			}
		}

		if (node->getProcessor()->producesMidi())
			markBufferAsContaining (midiBufferToUse, node->nodeId,
									AudioProcessorGraph::midiChannelIndex);
		NodeDelays.set(String(node->id),max_delay + node->getProcessor()->getLatencySamples()); //CS
		
		if (numOuts == 0) output_delay = max_delay; //CS: assume this is the audio output
		renderingOps.add (new ProcessBufferOp (node, audioChannelsToUse,
											   totalChans, midiBufferToUse));
	}

I’ve done this in a plugin context so there’s a variable output_delay which contains the latency at the audio output. This variable doesn’t play a role for a host application; however, I should mention that in it’s calculation I’ve assumed that there is only one output node. Otherwise there’d be an extra loop over all output nodes, with added DelayChannelOps for those that have less than maximum latency. Also, I have omitted the MIDI part, as I don’t use that. Should be pretty much identical though…
And feedback loops don’t work, but I don’t think a plugin with a non-zero latency in a feedback loop can be handled properly. They don’t work even without PDC, but that’s a different story…

graeme-2 · April 15, 2011, 7:24pm

I haven’t looked too far into it yet but I just want to mention that the AudioProcessorGraph in the lastest Juce tip doesn’t seem to work correctly. Some connection scenarios work while others don’t. I have A-B tested against Vinn’s patch which seems to be fine.

TheVinn · April 15, 2011, 7:42pm

I would investigate findEntry() since that’s the only significant difference

graeme-2 · April 15, 2011, 11:45pm

Thanks Vinn, I replaced the code in findEntry with the following basic for-loop and it fixed the problem. Unfortunately I don’t have anymore time today to figure out exactly why the current implementation isn’t working, but with 32 tracks, 4 auxs, and 10 nodes per, I get 650ms vs 620ms with Jules’ version. All pretty speedy. I may try tomorrow to spot where the actual error is.

[code] Entry* result = nullptr;

	int firstElement = 0;
	int entrySize = entries.size();
	for ( int i = 0; i < entrySize; i++) {
		Entry* const firstEntry = entries.getUnchecked (i);
        if (destNodeId == firstEntry->destNodeId) {
			insertIndex	= i;
			result = firstEntry;
			break;
			
		}
	}
	
	return result;[/code]

For what it’s worth here is a screenshot of a configuration (Plugin Host Example) that results in corrupted sound. The graph works until you connect the node on the left, at which point the system’s audio corrupts and a restart is required (Snow Leopard).

[attachment=0]badgraph.jpg[/attachment]

TheVinn · April 15, 2011, 11:57pm

That’s not really a long-term solution because although it works well in your case, if there are a lot of “destination” node Ids, it will get slow again.

But now that Jules knows there is a confirmed bug in findEntry() in the tip, it will be easy to locate and fix it.

graeme-2 · April 16, 2011, 6:28am

haha sorry, I can see how it may have sounded that way but I wasn’t meaning to suggest it as an official solution. I figured I’d start by making sure that this was indeed the problem code and the simple for loop was the quickest ticket there. The idea that it performed decently was just a nice bonus.

I’m not sure who has more destination ids than me right now, considering I seemed to be the first to complain about it being impossible. But if you have concerns for a higher number I can tell you that I bumped it to 64 tracks, 16 aux tracks, 1 master track, all with 10 nodes. Works out to 810 nodes (which virtually all would end up being destination nodes) and 3666 connections. Optimized does it in ~7 seconds, basic for loop does ~11.5 seconds.

graeme-2 · April 16, 2011, 6:36am

Sorry regarding the destination Ids, I did read that you said long-term, but somewhere lost track of it. It’s too late right now.

jules · April 16, 2011, 11:15am

How about this (I’ve just copied the binary search algorithm from SortedSet, but haven’t tested it):

[code] Entry* findEntry (const uint32 destNodeId, int& insertIndex) const noexcept
{
Entry* result = nullptr;

    int start = 0;
    int end = entries.size();

    for (;;)
    {
        if (start >= end)
        {
            break;
        }
        else if (destNodeId == entries.getUnchecked (start)->destNodeId)
        {
            result = entries.getUnchecked (start);
            break;
        }
        else
        {
            const int halfway = (start + end) / 2;

            if (halfway == start)
            {
                if (destNodeId >= entries.getUnchecked (halfway)->destNodeId)
                    ++start;

                break;
            }
            else if (destNodeId >= entries.getUnchecked (halfway)->destNodeId)
                start = halfway;
            else
                end = halfway;
        }
    }

    insertIndex = start;
    return result;
}

[/code]

graeme-2 · April 16, 2011, 2:05pm

Yep, this one works!

TheVinn · April 17, 2011, 2:39am

I stared at your original findEntry () and I couldn’t see the problem…any idea what it was? It looked right…

TheVinn · April 17, 2011, 2:41am

Nice work by the way, that was exactly what was needed.

Topic		Replies	Views
Chaining AudioProcessorGraphs General JUCE discussion	41	4158	May 6, 2014
Optimisations for RenderingOpsSequenceCalculator General JUCE discussion	25	1843	August 19, 2015
AudioProcessorGraph && ChangeBroadcaster General JUCE discussion	38	4308	December 1, 2017
[FR] AudioProcessorGraph addConnections/removeConnections Feature Requests	6	1458	June 26, 2018
Multi core support General JUCE discussion	18	1611	December 20, 2012

AudioProcessorGraph slow manipulations

Purchase

Discover

Learn

Support

About

Events

AudioProcessorGraph slow manipulations

Related topics

Purchase

Discover

Learn

Support

About

Events