Memory allocation in JUCE FFT using MKL & FFTW

While running some tests for memory allocations I started getting warnings that juce::dsp::FFT allocates memory when the FFT length is long enough (4096 samples or longer). Does this mean we should not be activating MKL if we’re planning on taking FFTs of longer buffers?

Some minimal code for testing is below. Basically, any order value >= 12 will end up flagging a memory allocation.

I tested with IPP and didn’t hit any memory allocations. This makes sense because IPP takes a workspace buffer.

Anyone else running into this? Am I getting a false positive somehow when using _CrtSetAllocHook() and really everything is fine? The same thing happens when using other MKL functions for linear algebra when the matrices are large enough - no allocation for small problems, allocation for larger ones.

Is this actually a problem in practice? Intel MKL’s FFT is very fast and I have never noticed a problem, but it does break a cardinal rule of real-time coding that makes me nervous about using it…

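#include <JuceHeader.h>
#include <crtdbg.h>    // _CrtSetAllocHook, _CRT_BLOCK (MSVC debug runtime)
#include <iostream>
#include <vector>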

bool flagAllocs = false;

int __cdecl MyAllocHook(int nAllocType, void* pvData, size_t nSize, int nBlockUse, long lRequest, const unsigned char* szFileName, int nLine)
{
    if (!flagAllocs)
        return(TRUE);

    if (nBlockUse == _CRT_BLOCK)   // Ignore internal C runtime library allocations
        return(TRUE);

    std::cout << "Unwanted malloc/free/realloc!" << std::endl;

    return(TRUE);         // Allow the memory operation to proceed
}

//==============================================================================
int main (int argc, char* argv[])
{
    _CrtSetAllocHook(MyAllocHook);   // Debug CRT only: these calls are compiled out when _DEBUG is not defined

    int order = 12;
    juce::dsp::FFT fft(order);

    std::vector<float> inout(2 * fft.getSize());

    flagAllocs = true;
    fft.performRealOnlyForwardTransform(inout.data());

    _CrtSetAllocHook(NULL);
}

EDIT: Just tested, and statically linking FFTW via the Projucer dsp module settings also flags a memory allocation for order >= 12…


More investigation:

The allocation only seems to happen the first time the MKL functions are called per thread, so as long as the host only ever runs the plugin on one thread there is not really a big problem. However, if the first sound out of your plugin is a glitch that’s not great either.
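To make the "once per thread" behaviour concrete, here is a minimal sketch of one pattern that would produce exactly this symptom: a routine that lazily sizes a thread-local scratch buffer on its first call. This is only a model. I have not confirmed that MKL works this way internally, and fakeTransform, scratchAllocations and allocationsOnNewThread are hypothetical names made up for the illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counts how many times the stand-in routine had to grow its scratch buffer.
std::atomic<int> scratchAllocations { 0 };

// Hypothetical stand-in for a library routine that keeps a lazily sized,
// per-thread workspace. This is NOT MKL code, only a model of the symptom.
void fakeTransform (float* data, std::size_t size)
{
    assert (data != nullptr);

    thread_local std::vector<float> scratch;

    if (scratch.size() < size)
    {
        scratch.resize (size);    // heap allocation: first call on this thread
        ++scratchAllocations;
    }

    // Dummy processing through the scratch buffer (halves every sample).
    for (std::size_t i = 0; i < size; ++i)
        scratch[i] = data[i] * 0.5f;

    for (std::size_t i = 0; i < size; ++i)
        data[i] = scratch[i];
}

// Runs `calls` transforms on a freshly created thread and returns how many
// of those calls had to allocate.
int allocationsOnNewThread (int calls, std::size_t size)
{
    const int before = scratchAllocations.load();

    std::thread worker ([calls, size]
    {
        std::vector<float> buffer (size, 1.0f);

        for (int i = 0; i < calls; ++i)
            fakeTransform (buffer.data(), size);
    });

    worker.join();
    return scratchAllocations.load() - before;
}
```

In this model, allocationsOnNewThread (10, 4096) reports a single allocation no matter how many calls are made, but every new thread pays that cost again. If the real library behaves like this, a warm-up call only helps when it runs on the same thread that later does the processing.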

The bigger problem comes if the host keeps switching the thread that processBlock() gets called on. In Reaper, this means allocation is constantly happening on the audio thread, at least according to _CrtSetAllocHook.

Something similar is happening on Mac. vDSP_vsmul is allocating on its first call as well for order >= 8. I haven’t tested yet if it happens on each thread change.

I’m guessing that, with how widely used Accelerate is in plugins, this is probably not a real problem. Am I being too cautious?
