AU plugin : CPU usage high in Logic Pro in record mode

Hi Jules,

I am currently working on improving the performance of my plugins.

I discovered with AAX plugins that the block size value (which I get from getBlockSize()) didn't always match the block size parameter of the last call to prepareToPlay. The value returned by getBlockSize() was larger than needed.
As a result, my FFT arrays were too large compared to the buffer size, and that was wasting CPU.

I think I have found a fix. In the process function of the AAX wrapper, instead of calling prepareToPlay:

if (lastBufferSize != bufferSize)
{
    lastBufferSize = bufferSize;
    pluginInstance->prepareToPlay (sampleRate, bufferSize);
}

I call preparePlugin(), which first calls setPlayConfigDetails (which resets the block size) and then prepareToPlay:

if (lastBufferSize != bufferSize)
{
    lastBufferSize = bufferSize;
    preparePlugin();
}

and it works much better! In my case, the CPU usage is divided by two.


However I have another problem with Logic now.

In recording mode, the process buffer size can drop by a factor of 2 or 4 (depending on the Logic settings), but prepareToPlay is not called first to announce the change in buffer size.

Therefore, my FFT size (set in prepareToPlay) stays large and the CPU usage goes up...


What do you think about the above fix?

Could something similar be done in the AU wrapper to improve performance?





Seems like there's quite a lot of overhead in the preparePlugin function if it's called during the process callback, but how about this?

                if (lastBufferSize != bufferSize)
                {
                    lastBufferSize = bufferSize;
                    pluginInstance->setPlayConfigDetails (pluginInstance->getNumInputChannels(),
                                                          pluginInstance->getNumOutputChannels(),
                                                          sampleRate, bufferSize);
                    pluginInstance->prepareToPlay (sampleRate, bufferSize);
                }

Indeed. No need for more.


What about the AU CPU overload?

Let's take an example in Logic Pro X.

My audio settings are: 96 kHz, an I/O buffer size of 128 samples, and the Process Buffer Range set to Medium.

With these parameters, my buffer size in prepareToPlay() is 1024.  My audio process uses FFT algorithms, so this is where I allocate my FFT buffers.

The buffer size passed to processBlock() is 1024 when playing and 128 when recording.


When I play, there is no problem: the FFT size matches the number of samples processed. When recording, however, the FFT buffers are far too big, and this makes my CPU usage go very high.


Any idea how to solve it?

Has anyone run into the same problem?

Is this a problem with the AU wrapper or does it come from my code?


You may have a good reason for it, but surely the size of your FFT should be unrelated to the size of the buffers involved?

Yes that's the point.

But prepareToPlay() is not called again when I switch to recording.

So maybe a call to prepareToPlay() with the recording buffer size as its parameter is missing?

Not sure I understand. My point is that you should allocate your FFT at whatever size is best, not at a size that depends on the block size, since that could change at any time. So I'm unclear about why it would make any difference. Surely the incoming data goes into a fifo and you perform your FFT when there's enough data, so it shouldn't matter what size of blocks the data arrives in?

My incoming data is used for real-time convolution in the frequency domain.

If I use a fifo as you say, I'll have to wait until my buffer is full to process it.

The processBlock() buffer size is the one set in the Logic settings, so I have to return the data immediately.

So in my opinion, this solution isn't suitable.

Ok, but what if you get 93 samples? You can't change your algorithm to fit the block size, because it's unpredictable and may change at any time. The way people normally do that kind of thing is to use a fifo and publish your latency.
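For readers following along, the fifo approach described above might look roughly like the sketch below. FixedBlockAdapter and its members are hypothetical names, not part of JUCE, and the callback stands in for whatever fixed-size FFT processing the plugin does. It assumes the caller picks a latency large enough that the output queue never runs dry; if it does run dry, this sketch simply emits silence.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not a JUCE class): accepts host blocks of any size and
// hands the wrapped processor blocks of exactly fixedBlockSize samples.
class FixedBlockAdapter
{
public:
    using BlockFn = std::function<void (float* block, int numSamples)>;

    // latencySamples zeros are queued up front; that is the delay the plugin
    // would have to report to the host.
    FixedBlockAdapter (int fixedBlockSize, int latencySamples, BlockFn blockFn)
        : blockSize (fixedBlockSize),
          processFixedBlock (std::move (blockFn)),
          input (static_cast<std::size_t> (fixedBlockSize), 0.0f),
          output (static_cast<std::size_t> (latencySamples), 0.0f)
    {
    }

    // in and out may point to the same buffer.
    void process (const float* in, float* out, int numSamples)
    {
        while (numSamples > 0)
        {
            // Take as much input as fits into the current fixed-size block.
            const int chunk = std::min (numSamples, blockSize - inputFill);
            std::copy (in, in + chunk, input.begin() + inputFill);
            inputFill += chunk;

            if (inputFill == blockSize)
            {
                // A full block is available: process it and queue the result.
                processFixedBlock (input.data(), blockSize);
                output.insert (output.end(), input.begin(), input.end());
                inputFill = 0;
            }

            // Hand back as many samples as were just consumed.
            for (int i = 0; i < chunk; ++i)
            {
                if (output.empty())
                {
                    out[i] = 0.0f;   // latency was too small: emit silence
                    continue;
                }

                out[i] = output.front();
                output.pop_front();
            }

            in += chunk;
            out += chunk;
            numSamples -= chunk;
        }
    }

private:
    int blockSize;
    BlockFn processFixedBlock;
    std::vector<float> input;    // collects samples until a block is full
    std::deque<float> output;    // processed samples waiting to be returned
    int inputFill = 0;
};
```

std::deque can allocate while it grows, so a real plugin would probably swap it for a ring buffer preallocated in prepareToPlay(). The point of the sketch is only that the processing size stays fixed whatever block size the host sends.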

Yes, I agree with that. When I get fewer samples than expected in processBlock(), I have to put them into a fifo.

But if I take the previous example:

My latency is set to 128 samples, which is equal to the block size in processBlock().

If I do as you say and keep a 1024-sample FFT size, my buffer will only be ready to be processed once it is full.

So that means I have to set my plugin latency to 896 (1024-128) samples, don't I?

Yes, that's true, but AFAIK it's unavoidable if your algorithm needs to work in blocks of a particular size.
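As a worked version of the 1024-128 arithmetic, here is a small sketch. requiredLatency is a hypothetical helper, not a JUCE function. It assumes a fifo that is drained by one host block after every callback, and a host that always delivers the same block size. Under those assumptions the largest backlog of unprocessed samples is the FFT size minus the greatest common divisor of the two sizes.

```cpp
// Hypothetical helper (not a JUCE function). Assumes the host always delivers
// hostBlockSize samples per callback and that the fifo returns that many
// samples after every callback.
int requiredLatency (int fftSize, int hostBlockSize)
{
    // Euclid's algorithm for gcd (fftSize, hostBlockSize).
    int a = fftSize;
    int b = hostBlockSize;

    while (b != 0)
    {
        const int t = a % b;
        a = b;
        b = t;
    }

    // The input backlog at the end of a callback is always a multiple of the
    // gcd, and the largest multiple below fftSize is fftSize - gcd.
    return fftSize - a;
}
```

For a 1024-sample FFT this gives 896 with 128-sample blocks and 1023 with 93-sample blocks. If the block size can change from one callback to the next, fftSize - 1 is the safe value. In a JUCE plugin the result would normally be passed to setLatencySamples().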

Ok, thanks.

Whether you need to report latency depends on the goal of your FFT. If you use it to display a frequency spectrum, you don't need to report any latency. Only report it when your FFT is used for processing or changing your input signal.
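To illustrate the display-only case, here is a minimal sketch of a tap that copies samples aside for a spectrum FFT while the audio passes through untouched. AnalysisTap is a hypothetical name, not a JUCE class, and the sketch leaves out the thread-safe handover to the GUI that a real plugin would need.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical analysis tap (not a JUCE class): it only reads the audio, so
// the signal is not delayed and there is no latency to report.
class AnalysisTap
{
public:
    explicit AnalysisTap (int fftSize)
        : window (static_cast<std::size_t> (fftSize), 0.0f),
          latest (static_cast<std::size_t> (fftSize), 0.0f)
    {
    }

    // Call with each block as it passes through the plugin. Returns how many
    // complete FFT-sized windows became available during this call.
    int push (const float* samples, int numSamples)
    {
        const int size = static_cast<int> (window.size());
        int completed = 0;

        while (numSamples > 0)
        {
            const int chunk = std::min (numSamples, size - fill);
            std::copy (samples, samples + chunk, window.begin() + fill);
            fill += chunk;
            samples += chunk;
            numSamples -= chunk;

            if (fill == size)
            {
                latest = window;   // snapshot for the display code
                fill = 0;
                ++completed;
            }
        }

        return completed;
    }

    // Most recent complete window, ready to be transformed and drawn.
    const std::vector<float>& latestWindow() const { return latest; }

private:
    std::vector<float> window;   // window currently being filled
    std::vector<float> latest;   // last complete window
    int fill = 0;
};
```

With a 1024-sample window and 128-sample blocks, push() returns 0 for seven calls and 1 on the eighth. The display lags the audio slightly, but the audio itself is never held back.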