JUCE's FFT wrappers are slow and bug-prone

It’s great to have nice wrappers for Intel’s/Apple’s/FFTW’s FFTs.
But sadly they are all slower than the underlying libraries by non-negligible factors.
This is because every wrapper does the following in its performRealOnlyForwardTransform method:

    for (; i < size; ++i)
        out[i] = std::conj (out[size - i]);
    for (auto i = size >> 1; i < size; ++i)
        out[i] = std::conj (out[size - i]);
    for (auto i = size >> 1; i < size; ++i)
        out[i] = std::conj (out[size - i]);

This is done to provide the odd interface that JUCE chose for the real transform: it also outputs the negative frequencies, which are redundant and should be ignored.
It’s no coincidence that all other FFT libraries chose a different API which doesn’t provide redundant results, as it performs better and is less bug-prone for users. Currently, users of performRealOnlyInverseTransform could get confused because either, with the wrappers, the second half of the input array is ignored, or, with JUCE’s fallback FFT, they get weird results which are not the inverse FFT of the data provided.
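
To make the redundancy concrete, here is a minimal sketch (not JUCE code). It uses a naive O(N^2) DFT rather than a real FFT, and the helper names `realDftHalf` and `mirrorToFull` are made up for illustration. For a real input of even length N, only the first N/2 + 1 bins carry information; the rest are just complex conjugates of earlier bins, which is what the extra loop in the wrappers fills in.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a JUCE function): naive O(N^2) DFT of a real
// signal that computes only the N/2 + 1 non-redundant bins.
std::vector<std::complex<double>> realDftHalf (const std::vector<double>& x)
{
    const std::size_t n = x.size();
    assert (n > 0 && n % 2 == 0); // sketch assumes an even length

    const double twoPi = 6.283185307179586;
    std::vector<std::complex<double>> out (n / 2 + 1);

    for (std::size_t k = 0; k < out.size(); ++k)
    {
        std::complex<double> sum (0.0, 0.0);

        for (std::size_t i = 0; i < n; ++i)
            sum += x[i] * std::polar (1.0, -twoPi * double (k * i) / double (n));

        out[k] = sum;
    }

    return out;
}

// Hypothetical helper: rebuilds all N bins from the half spectrum by
// mirroring, i.e. the same kind of extra pass as the loops quoted above.
std::vector<std::complex<double>> mirrorToFull (const std::vector<std::complex<double>>& half)
{
    assert (half.size() >= 2);
    const std::size_t n = (half.size() - 1) * 2;

    std::vector<std::complex<double>> full (half.begin(), half.end());
    full.resize (n);

    for (std::size_t i = n / 2 + 1; i < n; ++i)
        full[i] = std::conj (full[n - i]); // negative frequency = conj of positive one

    return full;
}
```

A caller that only needs the spectrum of a real signal can stop after `realDftHalf`; `mirrorToFull` adds no information, only work and memory.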

(Note that real FFTs are the most frequently used transforms, for example for implementing convolutions.)


OK, there is a fix for this on develop with commit e61292f.

If anybody can submit a patch showing how to modify the fallback engine to only write (real -> complex) or read (complex -> real) half of the coefficients, that would be superb. Then I could also drop the double-size memory requirement.
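
For the complex -> real direction, the idea can be sketched as below. This is not a patch for the fallback engine and not an FFT: it is a naive O(N^2) inverse DFT that only shows that reading N/2 + 1 coefficients is enough. The name `inverseRealDftFromHalf` is hypothetical, and the sketch assumes an even length and 1/N scaling on the inverse.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of JUCE): naive O(N^2) inverse DFT that
// reads only the N/2 + 1 non-redundant bins and returns N real samples.
// Assumes N is even and that the inverse carries the 1/N scaling.
std::vector<double> inverseRealDftFromHalf (const std::vector<std::complex<double>>& half)
{
    assert (half.size() >= 2);
    const std::size_t n = (half.size() - 1) * 2;
    const double twoPi = 6.283185307179586;

    std::vector<double> x (n);

    for (std::size_t i = 0; i < n; ++i)
    {
        // DC and Nyquist bins occur once; only their real parts contribute.
        double sum = half[0].real()
                   + ((i % 2 == 0) ? 1.0 : -1.0) * half[n / 2].real();

        // Every other bin stands in for itself and its conjugate mirror,
        // and z + conj (z) == 2 * Re (z).
        for (std::size_t k = 1; k < n / 2; ++k)
            sum += 2.0 * (half[k] * std::polar (1.0, twoPi * double (k * i) / double (n))).real();

        x[i] = sum / double (n);
    }

    return x;
}
```

Since the second half of the spectrum is never touched, the input buffer only needs N/2 + 1 complex values instead of N.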