ConvolutionEngine implementation question/improvement?

I’m currently studying the JUCE Convolution implementation, for now only the internal ConvolutionEngine struct.

The algorithm has a special adaptation to use bigger chunks of impulse data (relative to the input data) when used with smaller buffer sizes (blockSize <= 128). It then uses FFT blocks which are 4x bigger than the block size instead of 2x bigger.

In this case more overlap data is created, which is preserved for later operations.

I can see a theoretical benefit from it, but only if the impulse response is at least four times bigger than the block size. (In my simplified model I assume that the FFT computation time is proportional to the FFT size, and that the engine is fed with optimally aligned input blocks.)

fftSize = blockSize * 2
processingTime = roundUp (irSize / blockSize) * fftSize

vs. the adapted version:

fftSize = blockSize * 4
processingTime = roundUp (irSize / (fftSize - blockSize)) * fftSize
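
To make the comparison concrete, here’s a quick sketch of this simplified model (my own helper names, not JUCE code) that tabulates both variants for a given block size:

#include <cstddef>
#include <cstdio>

// Simplified model: FFT time is proportional to fftSize, each segment
// covers (fftSize - blockSize) samples of the IR, and the cost per input
// block is numSegments * fftSize.
static size_t estimatedCost (size_t irSize, size_t blockSize, size_t fftSize)
{
    const size_t segmentLen  = fftSize - blockSize;
    const size_t numSegments = (irSize + segmentLen - 1) / segmentLen; // round up
    return numSegments * fftSize;
}

int main()
{
    const size_t blockSize = 128;

    for (size_t irSize : { 256, 512, 1024, 4096, 16384 })
        std::printf ("irSize %6zu   2x: %7zu   4x: %7zu\n", irSize,
                     estimatedCost (irSize, blockSize, 2 * blockSize),
                     estimatedCost (irSize, blockSize, 4 * blockSize));
}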

Imho the condition for this adaptation should not depend on the block size ("blockSize > 128"); instead it should be "irSamples > (blockSize * 4)".
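
In the constructor’s initializer list, that would look something like this (untested sketch, keeping the existing member names):

fftSize (numSamples > blockSize * 4 ? 4 * blockSize : 2 * blockSize),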

These are theoretical thoughts; probably only @reuk and @fr810 can say something about it, since they wrote the engine :smiley:

struct ConvolutionEngine
{
    ConvolutionEngine (const float* samples,
                       size_t numSamples,
                       size_t maxBlockSize)
        : blockSize ((size_t) nextPowerOfTwo ((int) maxBlockSize)),
          fftSize (blockSize > 128 ? 2 * blockSize : 4 * blockSize),
          fftObject (std::make_unique<FFT> (roundToInt (std::log2 (fftSize)))),
          numSegments (numSamples / (fftSize - blockSize) + 1u),
          numInputSegments ((blockSize > 128 ? numSegments : 3 * numSegments)),

PS:
Is it planned to transform the whole engine to use templates to support double-precision in the near future?

I wasn’t involved with the construction of the engine - I just rewrote some of the setup code in an effort to make it a bit more thread-safe.

@IvanC might be able to provide more insight.

Hello!

If I remember correctly, I came up with this condition because the FFT computation is not the slowest operation in this context; the complex multiplications are. Using a higher FFT size (fftSize = blockSize * 4) when blockSize is low and the IR size is high enough significantly reduces the number of convolutions you have to do per block.
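
As a back-of-the-envelope illustration (my own estimate, not the actual JUCE code): each input block needs one pointwise complex multiply of (fftSize / 2 + 1) spectral bins per segment, so the per-block multiplication work looks roughly like this:

#include <cstddef>

// Rough multiplication cost per block: one complex multiply per spectral
// bin (a real FFT of size fftSize has fftSize / 2 + 1 bins) per segment.
static size_t multiplyCost (size_t irSize, size_t blockSize, size_t fftSize)
{
    const size_t segmentLen  = fftSize - blockSize;                    // IR samples per segment
    const size_t numSegments = (irSize + segmentLen - 1) / segmentLen; // round up
    return numSegments * (fftSize / 2 + 1);                            // bins multiplied per block
}

For a large IR this comes out to roughly irSize bin-multiplies per block with fftSize = 2 * blockSize, versus roughly (2 / 3) * irSize with fftSize = 4 * blockSize, i.e. about a third less work.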

However, even better conditions for this could probably be found with some benchmarking. I think one of my own engines uses something like:
fftSize (blockSize <= 128 && numSamples >= 8192 ? 4 * blockSize : 2 * blockSize)

instead of:
fftSize (blockSize > 128 ? 2 * blockSize : 4 * blockSize)

(and same condition for numInputSegments)

For double-precision support, I guess it might not be that complicated to do, but the JUCE wrapper would additionally have to check whether the wrapped FFT libraries properly support double precision.
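
Something like the following shape, perhaps (hypothetical sketch only, not a planned API):

// Hypothetical sketch only: juce::dsp::FFT currently works on floats, so a
// double-precision engine would also need an FFT backend supporting double.
template <typename SampleType>
struct ConvolutionEngine
{
    ConvolutionEngine (const SampleType* samples,
                       size_t numSamples,
                       size_t maxBlockSize);

    // ... rest of the engine, with all buffers using SampleType
};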

Thanks for the insight! :slight_smile:
