JUCE FFT vs FFTW benchmarking?


#1

Just curious if anyone has experience with both and a rough idea of how much faster FFTW is than the JUCE FFT? (I’m assuming it is faster, since the JUCE implementation mentions it’s not optimized for speed.)

Thanks!


#2

FFTW is way faster, but as it’s under the GPL you can’t ship it in a closed-source commercial product unless you buy the non-free license, which is… too expensive. You might wanna look into the Intel IPP (https://software.intel.com/en-us/intel-ipp). They have a free Community License.


#3

I have high hopes for the upcoming JUCE DSP module having a truly fast FFT so we don’t have to pay for FFTW or chase other solutions… in the meantime we have to use other things. Like @Jan_Schwers said, IPP is a good choice if you’re targeting only x86, but IMO even the community license is still too restrictive, because every dev has to have an account, the library can’t be put in a shared repo, etc.

I’ve been on a similar hunt; my favorite so far is FFTS, which has a permissive license and SSE/AVX/NEON acceleration.


#4

What about KissFFT? It’s under the BSD licence.


#5

You might have noticed that Intel IPP, which has support for a pretty fast FFT and complex vector math, is now free (as in beer) if you don’t require their support. So I’d hope that sooner or later the JUCE FFT stuff will be a wrapper around IPP on Windows and Linux and around vDSP on Mac and iOS. Then the only problem that remains is ARM-based Android and embedded Linux. Not sure what one would use there.

Best,
Ben


#6

Yeah I’m currently using kiss inside a real-time audio classification module I’m putting together.

It would be great to remove the need for that explicit dependency once the JUCE FFT is given some magic, whenever that DSP module appears.


#7

Just for the sake of removing the dependency, or because you found out that KissFFT is slower than Intel’s FFT?


#8

I was thinking more that the audio classification module/lib would be friendlier to other JUCE users (particularly newbies) if it used JUCE’s own FFT implementation.

I’m using kiss at the moment purely due to the BSD license, decent enough speed and my own familiarity with it.

I haven’t taken the time to do any benchmarking between the various packages out there.


#9

Our implementation is basically the same algorithm as kissFFT. I’d be very surprised if kissFFT was any quicker, so seems kind of pointless to go to the trouble of adding a 3rd party library unless you’ve actually benchmarked and found that there’s a good reason to do so.

And certainly our experience was that modern vectorising compiler optimisations get close to making a pure C++ algorithm as good as an assembly-language one. The intel FFT is probably a bit better because I’m sure they’ll use some sneaky CPU-specific tricks, but in many real-world cases none of this matters, since the bottleneck will be memory/cache access rather than pure CPU number-crunching. TL;DR: Don’t waste your time prematurely optimising unless you can measure a problem in your FFT and then measure an improvement by swapping the library!
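For anyone who wants to follow the "measure first" advice, here is a minimal timing sketch using only the standard library. `meanMicrosPerCall` and `naiveDft` are names made up for this example; `naiveDft` is a deliberately slow O(n²) stand-in, not JUCE's or KissFFT's code, and is only there so the harness has something to time. Swap in the real library call you want to compare.

```cpp
#include <chrono>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls fn() `iterations` times (after one warm-up call)
// and returns the mean wall-clock time per call in microseconds.
template <typename Fn>
double meanMicrosPerCall (Fn&& fn, int iterations)
{
    using Clock = std::chrono::steady_clock;

    fn(); // warm-up, so the first-call cache misses are not measured

    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        fn();
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;

    return elapsed.count() / iterations;
}

// Stand-in workload: a textbook O(n^2) DFT. Replace with the FFT under test.
std::vector<std::complex<float>> naiveDft (const std::vector<std::complex<float>>& in)
{
    const std::size_t n = in.size();
    const double pi = std::acos (-1.0);
    std::vector<std::complex<float>> out (n);

    for (std::size_t k = 0; k < n; ++k)
    {
        std::complex<double> sum (0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t)
        {
            const double angle = -2.0 * pi * double (k * t) / double (n);
            sum += std::complex<double> (in[t]) * std::polar (1.0, angle);
        }
        out[k] = std::complex<float> (sum);
    }
    return out;
}
```

Typical use would be something like `meanMicrosPerCall ([&] { out = naiveDft (in); }, 1000)`, run once per library and per transform size you actually care about. Writing the result back to a variable you read afterwards helps stop the optimiser from removing the call.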


#10

Hi Jules,

Fair enough. Thanks for the info. To be honest I’ve got it in there because at the moment one of the libraries I’m using uses KissFFT internally so I kept it in there as I got comfortable using it. Looking like I’m going to be replacing the library with my own routines anyways so I’ll go with the JUCE FFT from there on!


#11

When looking at the implementation of performRealOnlyForwardTransform, it looks like it just prepares a buffer for an equivalent perform call.

However, in kiss_fft’s case the real-only-transform is twice as fast, according to its “TIPS” file:

Also, kiss’s fftr returns half the spectrum (the second half is usually not needed and can be trivially derived from the first half) so it uses less memory and is probably more cache-efficient…
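To illustrate why half the spectrum is enough for real input: the DFT of a real signal is conjugate-symmetric, so X[n - k] = conj(X[k]). A small sketch, assuming an even transform size n and a half spectrum of n/2 + 1 bins (the layout I understand kiss_fftr to produce). `expandHalfSpectrum` is a made-up name for this example, not part of kiss_fft or JUCE.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Rebuilds all n bins of a real signal's DFT from bins 0..n/2,
// using the conjugate symmetry X[n - k] = conj(X[k]).
std::vector<std::complex<float>> expandHalfSpectrum (const std::vector<std::complex<float>>& half,
                                                     std::size_t n)
{
    assert (n >= 2 && n % 2 == 0);
    assert (half.size() == n / 2 + 1);

    std::vector<std::complex<float>> full (n);

    // Bins 0..n/2 are copied as they are.
    for (std::size_t k = 0; k <= n / 2; ++k)
        full[k] = half[k];

    // Bins n/2+1..n-1 are mirrored, conjugated copies of bins n/2-1..1.
    for (std::size_t k = n / 2 + 1; k < n; ++k)
        full[k] = std::conj (half[n - k]);

    return full;
}
```

For the real input {1, 2, 3, 4} the first three bins are 10, -2+2i and -2, and the function fills in the fourth as -2-2i. In most audio code you never need to do this expansion at all, which is where the memory saving comes from.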


#12

Has anyone done any benchmarking to compare, e.g. JUCE FFT / KissFFT / FFTW / FFTS?


#13

Yes! In short, fastest first: Intel FFT > vDSP / Accelerate > FFTS > FFTW > PFFFT > FFTReal > Ooura FFT >>> KissFFT + JUCE FFT


#14

Did you check the FFT of Intel IPP or Intel MKL?


#15

In some tests we did, Ooura FFT was faster than FFTReal. In our tests this also depended on Windows/OS X and whether using floats or doubles.


#16

It’s also important which sizes have been checked. I think the differences are not too big with small FFTs, because of the CPU memory cache.


#17

We tested various sizes (32 - 1048576) as 32- and 64-bit applications on various machines. Generally, the difference increased with the size. I’d say for most audio applications the size will be between 256 and 32768.

The tests also included a convolution (full processing: Y1=FFT(X1), Y2=FFT(X2), Y1=Y1*Y2, X1=IFFT(Y1), X1=rescale(X1)) which was identical for FFTReal and Ooura. The data was initialized randomly before the benchmark began.
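For reference, the convolution sequence above can be sketched roughly like this. It is not the benchmark code from those tests: it uses a plain textbook radix-2 FFT in doubles, and `fftInPlace` and `circularConvolve` are names invented for this example. Sizes are assumed to be powers of two.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// In-place iterative radix-2 FFT. The inverse is left unscaled on purpose,
// so the caller has to do the rescale step explicitly.
void fftInPlace (std::vector<Complex>& a, bool inverse)
{
    const std::size_t n = a.size();
    assert (n > 0 && (n & (n - 1)) == 0); // power of two only

    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i)
    {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j)
            std::swap (a[i], a[j]);
    }

    // Butterfly passes over blocks of length 2, 4, ..., n.
    const double pi = std::acos (-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1)
    {
        const Complex wLen = std::polar (1.0, (inverse ? 2.0 : -2.0) * pi / double (len));
        for (std::size_t i = 0; i < n; i += len)
        {
            Complex w (1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k)
            {
                const Complex u = a[i + k];
                const Complex v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wLen;
            }
        }
    }
}

std::vector<double> circularConvolve (const std::vector<double>& x1, const std::vector<double>& x2)
{
    assert (x1.size() == x2.size());
    const std::size_t n = x1.size();

    std::vector<Complex> y1 (x1.begin(), x1.end());
    std::vector<Complex> y2 (x2.begin(), x2.end());

    fftInPlace (y1, false);                   // Y1 = FFT(X1)
    fftInPlace (y2, false);                   // Y2 = FFT(X2)
    for (std::size_t i = 0; i < n; ++i)
        y1[i] *= y2[i];                       // Y1 = Y1 * Y2
    fftInPlace (y1, true);                    // X1 = IFFT(Y1), unscaled

    std::vector<double> result (n);
    for (std::size_t i = 0; i < n; ++i)
        result[i] = y1[i].real() / double (n); // X1 = rescale(X1)
    return result;
}
```

Note that this gives a circular convolution. For a linear convolution both inputs would need zero-padding to at least the sum of their lengths minus one before the transforms.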


#18

Did you benchmark real-transforms? It would be very surprising if JUCE’s FFT matched KissFFT there.


#19

I should probably redo my benchmarking tests at some point and post the results, since I don’t remember some of the details :wink:


#20

So, from how things look atm, on iOS/macOS it’s best to use vDSP, on Intel platforms IPP, and on other platforms, e.g. Android ARM, either FFTS or PFFFT (avoiding the bloat and license issues of FFTW).

In reality, it might not make enough of a difference to warrant #ifdefs for each platform, and things should be easier with one library that supports all major platforms. PFFFT looks good but [does not seem to compile for Android](https://bitbucket.org/jpommier/pffft/pull-requests/1/introduce-set_ps1-macro/diff) (without modification), FFTS looks tricky to include and hasn’t recently been maintained by its original author… KissFFT seems the simplest to use and has decent enough performance.
It may be that for my needs, the JUCE implementation will be good enough :slight_smile:
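
For what it’s worth, the per-platform #ifdef route could look roughly like the sketch below. It only picks a backend tag at compile time from the usual predefined compiler macros; the `FftBackend` enum and both function names are hypothetical, and the actual wrappers around vDSP, IPP or a portable library are not shown.

```cpp
#include <string>

// Hypothetical backend tags for this sketch only.
enum class FftBackend { vDSP, IntelIPP, Portable };

// Chooses a backend from predefined compiler macros.
// Note: x86 Android builds would also land in the IPP branch here;
// check __ANDROID__ first if that is not what you want.
constexpr FftBackend preferredFftBackend() noexcept
{
#if defined (__APPLE__)
    return FftBackend::vDSP;        // macOS and iOS: Accelerate / vDSP
#elif defined (__i386__) || defined (__x86_64__) || defined (_M_IX86) || defined (_M_X64)
    return FftBackend::IntelIPP;    // other x86 targets
#else
    return FftBackend::Portable;    // e.g. ARM Android: FFTS, PFFFT, KissFFT or the JUCE FFT
#endif
}

inline std::string backendName (FftBackend backend)
{
    switch (backend)
    {
        case FftBackend::vDSP:     return "vDSP";
        case FftBackend::IntelIPP: return "Intel IPP";
        case FftBackend::Portable: return "portable";
    }
    return "unknown";
}
```

Keeping the choice behind one function means the rest of the code never sees the #ifdefs, so falling back to a single portable library later is a one-line change.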