AudioProcessorGraph slow manipulations

Hi Jules,

I have found that the AudioProcessorGraph quickly slows down to an unusable point when doing graph manipulations with more than just a few nodes and connections. It happens in buildRenderingSequence and I think isAnInputTo is a big part of it. It appears that the manner in which the nodes are connected can have a major impact on it. I have verified this behaviour with the Plugin Host example and have attached a couple examples.

This first image shows a graph that takes just over 2 seconds to create or remove a single connection. It has 23 nodes and 42 connections.

[attachment=1]23nodes42connections2332ms.jpg[/attachment]

In this second one, I have added just 2 more nodes (on the left) and the time required to add a single connection between them has jumped to over 8 seconds.

[attachment=0]25nodes43connections8050.jpg[/attachment]

In my own application when using 100 nodes and 190 connections, arming a track is currently taking 25 seconds, so I am very interested to hear what you think about the situation.

Thanks,
Graeme

Hmm, Iā€™ve never really tried using that class with more than a few effects, so not hugely surprised that it might not scale linearly! The algorithm it uses to sort the nodes into the correct rendering order certainly looks like itā€™d scale badly, and would need some extreme cleverness to convert it from its current naive implementation to a more efficient algorithm. Interesting problem!

Damn was gonna use this for a similar type application.

It would be interesting to see how PureData manages its graph. I think I have a paper explaining it somewhere.

Thereā€™s only one weak spot in it, where it has to create a sorted list of all the nodes in the graph, and Iā€™ve implemented it in a really simple way - Iā€™m sure there are plenty of better ways to do that operation. Might involve some reading-up about graph traversal algorithms.

Can you post a simple test app that creates a graph with a lot of connections? I would love to hit it with Intel vTune Amplifier :slight_smile:

Just a single .cpp inside a ā€˜[ā€˜codeā€™]ā€™ tag.

Thanks for taking a look at this Jules. Iā€™m dead in the water until a fix or workaround is found so please keep me posted. Iā€™ll try to look at it more myself but my current lack of expertise gets in the way of being more helpful.

@TheVinn, sorry I donā€™t think that would be so simple for me to do. You could fire up the Plugin Host example and load up it up like in the examples? Only takes a few minutes and you can save the configuration so it can be loaded up again. :slight_smile:

I think Jules was giving advice on how someone else might tune the class rather than doing it himself Graeme.

Vincent, just load the plugin host and you can load any number of VSTs or other plugins and connect them up. You can save it then so you donā€™t have to do it every time.

The problem is known though. If you look at the graph class, you can see that the logic for sorting the list will be slow if the number of nodes is high. If there was a way to incrementally modify the graph state whenever you needed to add or remove a node, that would fix it but I dunno if thats possible.

I guess a topological sort or some variant is in the right direction? http://en.wikipedia.org/wiki/Topological_sorting

Well canā€™t you just write a simple test program? I need something that doesnā€™t require user interaction that is repeatable.

Make a small Juce app that creates a bunch of nodes and connects them randomly (using juce::Random with a known seed so the results are repeatable). Hereā€™s an example of a test app:

#include "juce.h"

namespace detail {

// Ideas based on boost

class NonCopyable
{
protected:
  NonCopyable() {}
  ~NonCopyable() {}
private:
  NonCopyable (const NonCopyable&);
  const NonCopyable& operator= (const NonCopyable&);
};

}

typedef detail::NonCopyable NonCopyable;
template <class OwnerClass>
class LeakChecked : private LeakedObjectDetector <LeakChecked <OwnerClass> >
{
private:
  friend class LeakedObjectDetector <LeakChecked>;
  static const char* getLeakedObjectClassName() throw()
    { return typeid (OwnerClass).name (); }
};

class ScopedAudioSampleBuffer : LeakChecked <ScopedAudioSampleBuffer>, NonCopyable
{
  JUCE_DECLARE_NON_COPYABLE_WITH_LEAK_DETECTOR (ScopedAudioSampleBuffer);

public:
  ScopedAudioSampleBuffer (int numChannels, int numSamples)
      : m_buffer (numChannels, numSamples)
  {
  }
  
  ~ScopedAudioSampleBuffer ()
  {
  }

  AudioSampleBuffer* getBuffer ()
  {
    return &m_buffer;
  }

  AudioSampleBuffer m_buffer;
};

struct Job : ThreadPoolJob
{
  Random m_rand;

  Job() : ThreadPoolJob ("Job")
        , m_rand (reinterpret_cast <unsigned long> (this))
  {
  }

  JobStatus runJob ()
  {
    for (int i = 0; i < 25; ++i)
    {
      for (int j = 0; j < 20; ++j)
      {
        ScopedAudioSampleBuffer buffer1 (2, 16 + m_rand.nextInt (256));

        {
          ScopedAudioSampleBuffer buffer2 (2, 16 + m_rand.nextInt (256));
          buffer2.getBuffer()->clear();
        }

        buffer1.getBuffer()->clear();
      }

      if (m_rand.nextBool())
        Thread::sleep (1);
    }

    return jobNeedsRunningAgain;
  }
};

struct ContentComponent : Component
{
  ThreadPool m_pool;
  OwnedArray <Job> m_jobs;

  static const int nThread = 8;

  ContentComponent() : m_pool (nThread, false, -1)
  {
    setSize (512, 384);

    for (int i = 0; i < nThread; ++i)
    {
      Job* job = new Job;
      m_jobs.add (job);
      m_pool.addJob (job);
    }
  }

  ~ContentComponent ()
  {
    m_pool.removeAllJobs (true, 100000, false);
  }

  void paint (Graphics& g)
  {
    const Rectangle <int> bounds = getLocalBounds ();
    g.setColour (Colours::white);
    g.fillRect (bounds);
  }
};

struct MainWindow : DocumentWindow
{
  MainWindow ()
    : DocumentWindow (JUCE_T("Test")
      , Colours::black
      , DocumentWindow::allButtons
      , true )
  {
    ContentComponent* p = new ContentComponent;
    setResizable (true, false);
    setContentOwned (p, true);
    centreWithSize (getWidth(), getHeight());
    setVisible( true );
  }

  ~MainWindow()
  {
  }

  void closeButtonPressed()
  {
    JUCEApplication::quit();
  }
};

struct MainApp : JUCEApplication
{
  MainApp() : mainWindow(0) { s_app=this; }
  ~MainApp() { s_app=0; }
  static MainApp& GetInstance() { return *s_app; }
  const String getApplicationName() { return JUCE_T("JuceTest"); }
  const String getApplicationVersion() { return JUCE_T("0.1.0"); }
  bool moreThanOneInstanceAllowed() { return true; }
  void anotherInstanceStarted (const String& commandLine) {}

  void initialise (const String& commandLine)
  {
    mainWindow = new MainWindow;
  }

  void shutdown()
  {
    delete mainWindow;
  }

  static MainApp* s_app;
  MainWindow* mainWindow;
};

MainApp* MainApp::s_app = 0;

START_JUCE_APPLICATION (MainApp)

If you will provide a test app, I will run it through vTune and post the results.

Vinn, I do appreciate the offer to help look at this but Iā€™m not even sure how to load a plugin without specifying which one to load. So which plugins do you have? What platform are you on? Is it possible to load an AudioProcessor without an actual plugin instance?

[quote=ā€œAnimaā€]
I think Jules was giving advice on how someone else might tune the class rather than doing it himself Graeme. [/quote]
Thanks Anima. I trust that Jules is capable of articulating his own position on things and as such feel that my response was and still is appropriate. I did mention that I would try to look further into it, but Iā€™d rather let the people who really know what theyā€™re doing take care of what they are best at. What good is it for me to start blasting a bunch of ignorant guesses around? You can be rest assured that I am following along, learning as much as I can as I go, and that I feel inclined to be as helpful as I can, whenever possible.

Thanks!

Okay, you said that the AudioProcessGraph operations are slow, so Iā€™m assuming you are referring to the scaffolding code (i.e. the maintenance of connection) and not the actual sample processing.

If this is the case, then you can make a simple test program that creates an AudioProcessorGraph, and add some ā€œdummyā€ nodes to it. A dummy node is a subclass of AudioProcessor that basically does nothing, and is not tied to any plugin. You can just provide simple implementations for the necessary pure virtual functions (for example, a processBlock() that does nothing).

The resulting test program would compile and work on any platform. It doesnā€™t actually process any audio data. With this program we could establish a benchmark of performance, identify where the hotspots are (vTune amplifier) and then speak in concrete terms on what needs to be improved.

Ok heres some code that should do what you asked but if not, well most of the donkey work should be done still.

Havenā€™t tested it but I think it should be good to go.

class BlankProcessor  : public AudioProcessor
{
public:
    //==============================================================================
	BlankProcessor() {}
	~BlankProcessor() {}

    //==============================================================================
	void prepareToPlay (double sampleRate, int samplesPerBlock) {}
	void releaseResources() {}

	void processBlock (AudioSampleBuffer& buffer, MidiBuffer& midiMessages) 
	{
		buffer.clear();
	}

    //==============================================================================
	AudioProcessorEditor* createEditor() { return nullptr; }
	bool hasEditor() const { return false; }

    //==============================================================================
	const String getName() const { return String::empty; }

	int getNumParameters() { return 0; } 

	float getParameter (int index) { return 0.0f; }
	void setParameter (int index, float newValue) {}

	const String getParameterName (int index) { return String::empty; }
	const String getParameterText (int index) { return String::empty; }

	const String getInputChannelName (int channelIndex) const { return String::empty; }
	const String getOutputChannelName (int channelIndex) const { return String::empty; }
	bool isInputChannelStereoPair (int index) const { return true; }
	bool isOutputChannelStereoPair (int index) const { return true; }

	bool acceptsMidi() const { return true; }
	bool producesMidi() const { return true; }

    //==============================================================================
	int getNumPrograms() { return 1; }
	int getCurrentProgram() { return 0; }
	void setCurrentProgram (int index) {}
	const String getProgramName (int index) { return String::empty; }
	void changeProgramName (int index, const String& newName) {}

    //==============================================================================
	void getStateInformation (MemoryBlock& destData) {}
	void setStateInformation (const void* data, int sizeInBytes) {}
};

class AudioGraphTest
{
public:
	AudioGraphTest()
		:	graph(	new AudioProcessorGraph )
	{
                graph->setPlayConfigDetails( 2, 
						           2, 
						          44100, 1024 );
		graph->prepareToPlay( 44100, 1024 );
	}

	~AudioGraphTest() {}

	void populateGraph( int numNodes, int numConnections, int64 seed ) 
	{
		Random rand( seed );
		int src = 0, dst = 0;

		/** Generate nodes */
		++numNodes;
		for( int i = numNodes; --i >= 1; )
		{
			graph->addNode( new BlankProcessor, i );
		}

		/** Add some connections */
		for( int i = numConnections; --i >= 0; )
		{
			src = rand.nextInt( numNodes );
			dst = rand.nextInt( numNodes );

			/** Check that we don't add a connection to the same node */
			while( src == dst )
				dst == rand.nextInt( numNodes );

			graph->addConnection( src, 0, dst, 0 );
			graph->addConnection( src, 1, dst, 1 );
		}

		/** Make sure all connections are valid */
		graph->removeIllegalConnections();
	}

	void reset()
	{
		graph->clear();
	}

private:
	ScopedPointer<AudioProcessorGraph> graph;
};
1 Like

Yes this is about handling connections.

Thanks for posting the code Anima. :slight_smile:

Alright, I tweaked your example code so that it calls prepareToPlay, which builds the rendering sequence (I think this is the time consuming part). Here is the modified code, it is stand alone that runs the test a few times and shows the total time:

#include "juce.h"

class BlankProcessor  : public AudioProcessor
{
public:
  //==============================================================================
  BlankProcessor() {}
  ~BlankProcessor() {}

  //==============================================================================
  void prepareToPlay (double sampleRate, int samplesPerBlock) {}
  void releaseResources() {}

  void processBlock (AudioSampleBuffer& buffer, MidiBuffer& midiMessages)
  {
    buffer.clear();
  }

  //==============================================================================
  AudioProcessorEditor* createEditor() { return 0; }
  bool hasEditor() const { return false; }

  //==============================================================================
  const String getName() const { return String::empty; }

  int getNumParameters() { return 0; }

  float getParameter (int index) { return 0.0f; }
  void setParameter (int index, float newValue) {}

  const String getParameterName (int index) { return String::empty; }
  const String getParameterText (int index) { return String::empty; }

  const String getInputChannelName (int channelIndex) const { return String::empty; }
  const String getOutputChannelName (int channelIndex) const { return String::empty; }
  bool isInputChannelStereoPair (int index) const { return true; }
  bool isOutputChannelStereoPair (int index) const { return true; }

  bool acceptsMidi() const { return true; }
  bool producesMidi() const { return true; }

  //==============================================================================
  int getNumPrograms() { return 1; }
  int getCurrentProgram() { return 0; }
  void setCurrentProgram (int index) {}
  const String getProgramName (int index) { return String::empty; }
  void changeProgramName (int index, const String& newName) {}

  //==============================================================================
  void getStateInformation (MemoryBlock& destData) {}
  void setStateInformation (const void* data, int sizeInBytes) {}
};

class AudioGraphTest
{
public:
  AudioGraphTest()
    :   graph(   new AudioProcessorGraph )
  {
    graph->setPlayConfigDetails( 2,
      2,
      44100, 1024 );
    graph->prepareToPlay( 44100, 1024 );
  }

  ~AudioGraphTest() {}

  void populateGraph( int numNodes, int numConnections, int64 seed )
  {
    Random rand( seed );
    int src = 0, dst = 0;

    /** Generate nodes */
    ++numNodes;
    for( int i = numNodes; --i >= 1; )
    {
      graph->addNode( new BlankProcessor, i );
    }

    /** Add some connections */
    for( int i = numConnections; --i >= 0; )
    {
      src = rand.nextInt( numNodes );
      dst = rand.nextInt( numNodes );

      /** Check that we don't add a connection to the same node */
      while( src == dst )
        dst = rand.nextInt( numNodes );

      graph->addConnection( src, 0, dst, 0 );
      graph->addConnection( src, 1, dst, 1 );
    }

    /** Make sure all connections are valid */
    graph->removeIllegalConnections();
  }

  void reset()
  {
    graph->clear();
  }

  void prepareToPlay ()
  {
    graph->prepareToPlay (44100, 1024);
  }

private:
  ScopedPointer<AudioProcessorGraph> graph;
};

//------------------------------------------------------------------------------

template <class T>
struct Stats
{
  Stats (int count, const T* data)
  {
    n = count;

    lo = data[0];
    hi = data[0];
    sum = 0;

    for (int i = 0; i < count; ++i)
    {
      T v = data[i];
      lo = std::min (lo, v);
      hi = std::max (lo, v);
      sum += v;
    }

    mean = double (sum) / count;

    double totvarsquared = 0;
    for (int i = 0; i < count; ++i)
    {
      T v = data[i];
      const double var = v - mean;
      totvarsquared += var * var;
    }

    dev = sqrt (totvarsquared / (n - 1));
  }

  int n;
  double dev;
  double mean;
  T lo;
  T hi;
  T sum;
};

//------------------------------------------------------------------------------

static void print (const String& s)
{
  std::wcout << s << std::endl;
}

static void run (void (*test)(), const char* name)
{
  const int ntrial = 5;

  int64 trial[ntrial];
  int64 total;

  Time startTime = Time::getCurrentTime();
  for (int i = 0; i < ntrial; ++i)
  {
    Time startTime = Time::getCurrentTime();
    test ();
    RelativeTime elapsedTime = Time::getCurrentTime() - startTime;
    trial [i] = elapsedTime.inMilliseconds ();

    int64 ms = elapsedTime.inMilliseconds ();
    String s;
    s << name << " #" << String (i+1) << " " <<
      String (ms) << " ms";
    print (s);
  }
  RelativeTime elapsedTime = Time::getCurrentTime() - startTime;

  total = elapsedTime.inMilliseconds ();

  Stats<int64> stats (ntrial, trial);

  String s;
  int64 ms = elapsedTime.inMilliseconds ();
  s << name << "   " <<
    String (stats.sum) << " total, " <<
    String (stats.mean, 0) << " avg, " <<
    String (stats.dev, 3) << " dev";
  print (s);
}

//------------------------------------------------------------------------------

void test1 ()
{
  AudioGraphTest agt;
  agt.populateGraph (4096, 65536, 1);
  agt.prepareToPlay ();
}

int main (int, char**)
{
  ScopedJuceInitialiser_GUI jucelib;

  run (test1, "AudioGraphTest");

  return 0;
}

vTune result

[attachment=0]audiograph.png[/attachment]

Honestly for the amount of nodes and connections I have (which is HUGE) this does not seem slow.

An obvious optimization would be a constant-time implementation for getNodeForId(), either with a hash map or by changing nodeIds to be small integers that can be used as array indexes (although this might not be possible for client-provided nodeIds). Such a change would almost double the speed.

I think you should check the state of the graph before you run the test Vinn, would be good to know how many connections and nodes are definitely there. Might need to tweak the ā€œadding connections codeā€ if most of them are not getting added correctly.

Interesting with the getNodeForID method, definitely room for tweaking.

Exactly the amount passed to the populate function (4096 nodes and 65536 connections in my case).

removeIllegalConnections() returns false. As in, the sample code you provided never produces illegal connections.

Unless the test program is not indicative of your use-case, it is clear that the problem is not what you think it is (slow graph algorithm).

If you look at my first post you will see that I provided a configuration with 23 nodes and 43 connections and under this configuration it takes 8 seconds to to manipulate 1 connection.

sorry, thatā€™s 25 nodes and 43 connections

That makes absolutely no sense, even an O(N^4) algorithm should execute in far less time. Something is different with your app.

Is there any way you can modify the test class so that it sets up the connections identically to your scenario?

I meanā€¦ I have 4096 nodes and 65536 connections and it takes less than 4 secondsā€¦

are you running debug build?

the screen shot is from the Plugin Host Example application included with Juce. You are welcome to try it out yourself. It is a debug build.