ConvolutionEngine implementation question/improvement?

I’m currently studying the JUCE Convolution class, at the moment only the internal ConvolutionEngine struct.

The algorithm has a special adaptation for using bigger chunks of impulse data (relative to the input data) when used with small buffer sizes (blockSize <= 128). In that case it uses FFT blocks which are 4x the block size instead of 2x.

In this case more overlap data is created, which is preserved for later operations.

I see a theoretical benefit from it, but only if the impulse response is at least four times bigger than the block size. (In my simplified model I assume that the FFT calculation time is proportional to the FFT size, and that the engine is fed with optimally aligned input blocks.)

fftSize = blockSize * 2
processingTime = roundUp(irSize / blockSize) * fftSize

vs. the adapted scheme:

fftSize = blockSize * 4
processingTime = roundUp(irSize / (fftSize - blockSize)) * fftSize
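To make the comparison concrete, here is a minimal sketch of that cost model in C++. The helper and function names are mine, not JUCE’s, and the model only assumes (as above) that processing time per segment is proportional to the FFT size:

```cpp
#include <cstddef>

// Hypothetical helper, not part of JUCE: integer division rounding up.
static std::size_t roundUpDiv (std::size_t a, std::size_t b)
{
    return (a + b - 1) / b;
}

// Standard scheme: fftSize = blockSize * 2, one segment per blockSize
// samples of the impulse response.
static std::size_t costStandard (std::size_t irSize, std::size_t blockSize)
{
    const std::size_t fftSize = blockSize * 2;
    return roundUpDiv (irSize, blockSize) * fftSize;
}

// Adapted scheme: fftSize = blockSize * 4, each segment covers
// (fftSize - blockSize) samples of the impulse response.
static std::size_t costAdapted (std::size_t irSize, std::size_t blockSize)
{
    const std::size_t fftSize = blockSize * 4;
    return roundUpDiv (irSize, fftSize - blockSize) * fftSize;
}
```

For example, with blockSize = 64 and an 8192-sample IR the model gives 16384 cost units for the standard scheme and 11008 for the adapted one, so the bigger FFT pays off once the IR is long enough relative to the block size.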

Imho the condition for this adaptation should not depend on the size of the block (blockSize <= 128); instead it should be "irSamples > blockSize * 4".

These are theoretical thoughts, probably only @reuk and @fr810 can say something about it, because they wrote the engine :smiley:

struct ConvolutionEngine
{
    ConvolutionEngine (const float* samples,
                       size_t numSamples,
                       size_t maxBlockSize)
        : blockSize ((size_t) nextPowerOfTwo ((int) maxBlockSize)),
          fftSize (blockSize > 128 ? 2 * blockSize : 4 * blockSize),
          fftObject (std::make_unique<FFT> (roundToInt (std::log2 (fftSize)))),
          numSegments (numSamples / (fftSize - blockSize) + 1u),
          numInputSegments ((blockSize > 128 ? numSegments : 3 * numSegments)),
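As a sanity check on the arithmetic in that initializer list, here is a small standalone sketch of just the size computations, with hypothetical inputs (blockSize = 64 and a 44100-sample IR); it is not the actual JUCE code:

```cpp
#include <cstddef>

// Reproduces only the size arithmetic from the constructor quoted above.
struct EngineSizes
{
    std::size_t fftSize, numSegments, numInputSegments;
};

static EngineSizes computeSizes (std::size_t numSamples, std::size_t blockSize)
{
    // Small blocks get the 4x FFT size, large blocks the 2x one.
    const std::size_t fftSize = blockSize > 128 ? 2 * blockSize : 4 * blockSize;

    // Each segment covers (fftSize - blockSize) IR samples.
    const std::size_t numSegments = numSamples / (fftSize - blockSize) + 1u;

    // The small-block path keeps three times as many input segments.
    const std::size_t numInputSegments = blockSize > 128 ? numSegments
                                                         : 3 * numSegments;
    return { fftSize, numSegments, numInputSegments };
}
```

With blockSize = 64 and numSamples = 44100 this gives fftSize = 256, numSegments = 230 and numInputSegments = 690; the tripled input-segment count is the extra overlap data that gets preserved for later operations.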

Is it planned to transform the whole engine to use templates to support double-precision in the near future?

I wasn’t involved with the construction of the engine - I just rewrote some of the setup code in an effort to make it a bit more thread-safe.

@IvanC might be able to provide more insight.



If I remember correctly, I came up with this condition because the FFT computation is not the slowest operation in this context; the complex multiplications are. Using a bigger FFT size (fftSize = blockSize * 4) when blockSize is low and the IR size is high enough significantly reduces the number of convolutions you have to do per block.

However, I think even better conditions could be found with some benchmarking. I think one of my own engines uses something like:
fftSize (blockSize <= 128 && numSamples >= 8192 ? 4 * blockSize : 2 * blockSize)

instead of:
fftSize (blockSize > 128 ? 2 * blockSize : 4 * blockSize)

(and same condition for numInputSegments)
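For comparison, here is a quick sketch of both selection rules side by side. These are just the two conditions quoted in this thread, with the 8192-sample threshold taken from the engine mentioned above:

```cpp
#include <cstddef>

// The rule currently used by the engine in this thread.
static std::size_t fftSizeCurrent (std::size_t blockSize)
{
    return blockSize > 128 ? 2 * blockSize : 4 * blockSize;
}

// The benchmarked variant quoted above: only take the 4x path when the
// block is small AND the IR is long enough to amortise the bigger FFT.
static std::size_t fftSizeBenchmarked (std::size_t blockSize,
                                       std::size_t numSamples)
{
    return (blockSize <= 128 && numSamples >= 8192) ? 4 * blockSize
                                                    : 2 * blockSize;
}
```

The two rules only differ for small blocks with short IRs: e.g. blockSize = 64 with a 1024-sample IR gives an FFT size of 256 under the current rule but 128 under the benchmarked one, while a 16384-sample IR gives 256 under both.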

For double-precision support, I guess it might not be too complicated to do, but the JUCE wrapper would additionally have to check whether the wrapped FFT libraries properly support double precision.


Thanks for the insight! :slight_smile:
