FFTS -- Fastest FFT implementation, and Free/BSD License

Hi Martin,

This is explained in the pffft.h header: if you are using it for convolution, the ordering is not important, but if you care about the ordering, there is the pffft_transform_ordered function for that.

Oh I see, I thought it was in split format by default and the pffft_transform_ordered() function was just to interleave it, rather than it being in some "arbitrary" order. I need it in split/packed format (same as vDSP/FFTReal). I guess I could unpick pffft_zreorder() and figure out how to order it myself.  
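If the comment in pffft.h is read correctly, the ordered output of a real forward transform is interleaved complex pairs, with the purely real DC and Nyquist terms packed into the first pair as F(0) + i*F(N/2). That is the same packing vDSP and FFTReal use, so getting split format from the ordered output should just be a de-interleave. A minimal sketch under that assumption (the helper names are hypothetical, and any difference in scaling between libraries is not handled):

```c
#include <stddef.h>

/* Assumed input layout (as described in pffft.h for an ordered real FFT of
   n samples, n even):
     z[0]           = Re F(0)     DC term, purely real
     z[1]           = Re F(n/2)   Nyquist term, purely real, in the Im slot
     z[2k], z[2k+1] = Re F(k), Im F(k)   for k = 1 .. n/2 - 1
   Output is split "packed" format: realp[0] = DC, imagp[0] = Nyquist,
   realp[k] / imagp[k] = bin k. Each output array holds n/2 floats. */
void interleaved_to_split(const float *z, float *realp, float *imagp, size_t n)
{
    for (size_t k = 0; k < n / 2; ++k) {
        realp[k] = z[2 * k];      /* real part (DC when k == 0) */
        imagp[k] = z[2 * k + 1];  /* imaginary part (Nyquist when k == 0) */
    }
}

/* Reverse operation: rebuild the interleaved layout, e.g. before handing
   the spectrum back to an ordered inverse transform. */
void split_to_interleaved(const float *realp, const float *imagp, float *z, size_t n)
{
    for (size_t k = 0; k < n / 2; ++k) {
        z[2 * k] = realp[k];
        z[2 * k + 1] = imagp[k];
    }
}
```

Since the DC/Nyquist packing lines up, bin 0 needs no special case. If you start from the unordered pffft_transform output instead, pffft_zreorder is meant to produce this ordering first.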

Intel (and $$$) looks like the way to go on Windows, and vDSP on the Mac.

Thanks

IPP has been freely available for a few months now (it's called Community Licensing).

Great - I'm seeing IPP with the exact same performance as vDSP.

Howdy, would you prefer the FFT of Intel IPP or the MKL library?

According to this document, IPP is the way to go for audio applications, but MKL seems to be more widely used.

https://software.intel.com/en-us/articles/mkl-ipp-choosing-an-fft

We had great difficulty with FFTS bugs on Android and eventually abandoned it in favour of KissFFT. We lost several developer weeks to this, so be warned: the source code is impenetrable assembly and the original creator has quit.

KissFFT appears to be a very sensible choice, epitomising the JUCE philosophy (KISS) as I see it.

How’s the speed? I was always under the impression KissFFT was the slowest of all the major implementations.

We use KissFFT & IPP.
IPP is providing a HUGE performance boost.

If your targets are desktop/laptop (x86) devices, IPP is the way to go.

A little OT: has anyone tried it out on AMD machines?
They should all be compatible with the same SIMD/ISA, but I still don’t have an AMD machine to test with. It would be interesting.
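Rather than guessing by vendor, a program can ask the CPU at runtime which SIMD extensions it reports, which makes an Intel vs AMD comparison easy to log. A minimal sketch, assuming GCC or Clang on x86 (it relies on the compiler builtins __builtin_cpu_init and __builtin_cpu_supports; MSVC would need __cpuid instead). The struct and function names are hypothetical:

```c
#include <stdio.h>

/* Snapshot of a few x86 SIMD features that FFT libraries commonly use. */
typedef struct {
    int sse2, sse41, avx, avx2;
} simd_features;

/* Query the running CPU. GCC/Clang on x86 only; elsewhere every flag is 0. */
simd_features detect_simd(void)
{
    simd_features f = {0, 0, 0, 0};
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* harmless if the runtime already initialised it */
    f.sse2  = __builtin_cpu_supports("sse2") != 0;
    f.sse41 = __builtin_cpu_supports("sse4.1") != 0;
    f.avx   = __builtin_cpu_supports("avx") != 0;
    f.avx2  = __builtin_cpu_supports("avx2") != 0;
#endif
    return f;
}

/* Print the flags, e.g. to compare an Intel machine with an AMD one. */
void print_simd(const simd_features *f)
{
    printf("sse2=%d sse4.1=%d avx=%d avx2=%d\n", f->sse2, f->sse41, f->avx, f->avx2);
}
```

__builtin_cpu_supports only accepts a string literal, hence one call per feature rather than a lookup by name.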

Has anyone got pffft to work?

I’m now trying to use pffft_transform_ordered as a replacement for another implementation, and while the forward transform gives the values I expect, the inverse doesn’t…

In contrast, when using Intel MKL (with DFTI_SINGLE, DFTI_COMPLEX), I’m getting the expected results both forward and back, and the same with our own unoptimised FFT implementation. So it’s not an issue with my data. Presumably it needs to be arranged or scaled differently, but I have no idea how…

Also, the data is already correctly 16-byte aligned.
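Per its header, pffft's SIMD code expects its float buffers to be 16-byte aligned, and pffft.h provides pffft_aligned_malloc / pffft_aligned_free for that. Since every buffer matters (including the work buffer and the spectrum passed to the inverse), a small check like the hypothetical helper below can be asserted on each pointer. The allocator shown is an alternative using C11 aligned_alloc, which MSVC does not provide (_aligned_malloc is the Windows counterpart):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Allocate n floats on a 16-byte boundary using C11 aligned_alloc.
   C11 requires the byte size to be a multiple of the alignment, so round up.
   Release with free(). Not available on MSVC (use _aligned_malloc there). */
float *alloc_floats16(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + 15u) & ~(size_t)15u;
    if (bytes == 0)
        bytes = 16;
    return (float *)aligned_alloc(16, bytes);
}
```

An offset into an aligned block, such as buf + 1, is no longer 16-byte aligned, which is an easy way to end up with misaligned data even when the allocation itself is correct.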