New JUCE DSP FFT - Limitations to fftw functionality

PluginPenguin · July 27, 2017, 3:29pm

So, I started to roll out my own fftw wrapper module this morning, and then I got the message that the new DSP module is available. I had a quick look and saw that the new FFT class can act as an fftw wrapper, among some other implementations.

While this is a VERY nice thing, I’m missing some options to really use the horsepower of fftw.

Normally, an fftw transformation is initiated with some call like

plan = fftwf_plan_dft_r2c_1d (n, inArray, outArray, FFTW_ESTIMATE);

where there is the option to pass various other flags like FFTW_MEASURE, FFTW_PATIENT & FFTW_EXHAUSTIVE. While FFTW_ESTIMATE just estimates which arithmetic tricks it should use to perform the fft, the other flags instruct fftw to try out some different options on how to perform the fft and measure which one is the fastest on the given hardware - which is one of the main features that makes fftw that fast.
Furthermore, fftw offers some functionality to perform perfectly aligned memory allocation, so that a maximum number of SIMD instructions could be used, another feature that makes fftw fast.
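
To make the estimate-versus-measure distinction concrete, here is a toy sketch of the idea in plain C++. It is not FFTW's planner and uses no FFTW calls; `Kernel`, `PlanMode` and `choosePlan` are hypothetical names invented for illustration. In "estimate" mode it picks a candidate without running anything, in "measure" mode it times each candidate once on a scratch buffer and keeps the fastest.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical names throughout; this only illustrates the planning idea.
using Kernel = std::function<void (std::vector<float>&)>;

enum class PlanMode { estimate, measure };

// Returns the index of the candidate to use for buffers of length n.
// Precondition: candidates must contain at least one kernel.
std::size_t choosePlan (const std::vector<Kernel>& candidates, std::size_t n, PlanMode mode)
{
    // Estimate: cheap guess, take the first candidate without running anything.
    if (mode == PlanMode::estimate || candidates.size() < 2)
        return 0;

    // Measure: run every candidate once and keep the fastest.
    // The scratch buffer is overwritten while measuring.
    std::vector<float> scratch (n, 1.0f);
    std::size_t best = 0;
    auto bestTime = std::chrono::steady_clock::duration::max();

    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        candidates[i] (scratch);
        const auto elapsed = std::chrono::steady_clock::now() - start;

        if (elapsed < bestTime)
        {
            bestTime = elapsed;
            best = i;
        }
    }

    return best;
}
```

A real planner would repeat each measurement and search a much larger space of strategies; the point here is only that measuring costs time up front and needs a buffer it is allowed to overwrite.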

However in the constructor of the JUCE DSP module’s fftw wrapper functions I find

c2cForward = fftw.plan_dft_fftw (n, in.getData(), out.getData(), -1, unaligned | estimate);
c2cInverse = fftw.plan_dft_fftw (n, in.getData(), out.getData(), +1, unaligned | estimate);

r2c = fftw.plan_r2c_fftw (n, (float*) in.getData(), in.getData(), unaligned | estimate);
c2r = fftw.plan_c2r_fftw (n, in.getData(), (float*) in.getData(), unaligned | estimate);

which basically creates four fftw plans for the “worst” options that could be chosen. So under these circumstances, I don’t think fftw offers any great benefits (correct me if I’m wrong).

So while the DSP module just came out, are there any plans of adding the possibility to maybe simply pass an externally generated fftw plan to the constructor, that makes use of the extended fftw features? I assume this shouldn’t be that tricky?

And please don’t get me wrong, I really like the new module and can’t wait to get my hands on it

matthieu-brucher · July 27, 2017, 3:33pm

The question I’m asking is why would you go for FFTW when IPP/MKL is free and doesn’t have a license issue?

PluginPenguin · July 27, 2017, 3:44pm

First of all, I never used the Intel libraries, so I’m not really informed about their licenses - are they free, and may they be distributed with an open source software project?
The second and probably more important point: FFTW is completely platform independent, so it will be nice for the use case of building my own JUCE-based audio algorithm codebase that might be ported to an embedded Linux device based on an ARM processor.

matthieu-brucher · July 27, 2017, 3:53pm

They can be distributed in an open source project, but not in a GPL one, obviously!

fr810 · July 27, 2017, 3:54pm

From my benchmarks on a few devices FFTW_MEASURE vs. FFTW_PATIENT/FFTW_EXHAUSTIVE really doesn’t make a huge difference. Maybe just a few percent. But we can add it if you feel strongly about it.

Additionally, the intel mkl library is quite a lot faster than fftw (even when fftw was using FFTW_PATIENT). I admit, I benchmarked this on a single intel machine so who knows how the mkl library will run on amd chips - for example.

IvanC · July 27, 2017, 3:55pm

On Linux, I think FFTW is already there for most of the distributions. So for open source projects it might be a good replacement of JUCE FFT without doing anything but checking an option in the Projucer.

matthieu-brucher · July 27, 2017, 3:57pm

Indeed, that’s almost the only good reason to use it. I use it for ATK CI just because it’s impossible to install Intel packages on top of everything.

PluginPenguin · July 27, 2017, 4:32pm

This might be true, but as I noticed from the JUCE code, the JUCE fftw wrapper uses estimate and unaligned memory, which should make a difference compared to measure and aligned memory - shouldn’t it?

So if you could implement a feature to pass an externally generated plan, it would be nice from my point of view.

But I’ll also take a look at the Intel libraries for all x86-only projects!

matthieu-brucher · July 27, 2017, 4:37pm

The problem with aligned memory will probably be seen there as well

fr810 · July 27, 2017, 4:46pm

The unaligned flag is a bit confusing in FFTW and does not mean that the array will be unaligned. Quoting from the fftw docs:

...the unaligned is normally not necessary, the planner automatically detects misaligned arrays. The only use for this flag is if you want to use the new-array execute interface to execute a given plan on a different array that may not be aligned like the original...

We need to use this flag as we do not know the alignment yet when we need to create the plans.

I’m pretty confident that the Intel libraries will have the same runtime detection.

PluginPenguin · July 27, 2017, 5:09pm

But to quote the full text describing the unaligned flag:

FFTW_UNALIGNED specifies that the algorithm may not impose any unusual alignment
requirements on the input/output arrays (i.e. no SIMD may be used). This flag is normally not
necessary, since the planner automatically detects misaligned arrays. The only use for this flag
is if you want to use the new-array execute interface to execute a given plan on a different array
that may not be aligned like the original. (Using fftw_malloc makes this flag unnecessary even
then. You can also use fftw_alignment_of to detect whether two arrays are equivalently
aligned.)

This makes me think that the unaligned flag really prevents fftw from using SIMD instructions, just to be sure that everything works reliably under every possible condition; I don’t think it checks alignment every time the plan is executed. So there might be a lot of unused resources when using this flag. Your reasons for choosing this flag as the default are obvious. But if I allocate an aligned array and can make sure to only pass pointers to aligned arrays to the FFT, the flag would disable SIMD for no reason in this use case. And with the option to use even AVX-512 instructions in the latest fftw release, this could make a really huge difference.

But - this is just my interpretation of this flag - in the end I don’t really know what’s going on inside fftw…
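
For anyone who wants to check the alignment of their own buffers before handing them to an FFT, here is a minimal sketch in standard C++17, with no FFTW calls. `isAligned`, `AlignedDeleter`, `AlignedBuffer` and `makeAlignedBuffer` are hypothetical helpers, and the 32-byte default (the width of one AVX register) is an assumption, not a statement about what fftw_malloc or fftw_alignment_of do internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical helper: true if p sits on an `alignment`-byte boundary.
inline bool isAligned (const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t> (p) % alignment == 0;
}

// Releases memory obtained from the aligned operator new below.
struct AlignedDeleter
{
    std::size_t alignment;

    void operator() (float* p) const
    {
        ::operator delete (p, std::align_val_t (alignment));
    }
};

using AlignedBuffer = std::unique_ptr<float, AlignedDeleter>;

// Allocates `count` zeroed floats on an `alignment`-byte boundary.
// `alignment` must be a power of two; 32 is an assumed default (one AVX register).
inline AlignedBuffer makeAlignedBuffer (std::size_t count, std::size_t alignment = 32)
{
    void* raw = ::operator new (count * sizeof (float), std::align_val_t (alignment));
    auto* data = static_cast<float*> (raw);
    std::uninitialized_fill_n (data, count, 0.0f);
    return AlignedBuffer (data, AlignedDeleter { alignment });
}
```

Note that a pointer into the middle of an aligned buffer is not necessarily aligned itself: with 4-byte floats and 32-byte alignment, only every eighth element starts on a boundary, which is exactly the situation a wrapper cannot rule out when it does not control the caller's arrays.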

fr810 · July 27, 2017, 5:28pm

Sorry, I didn’t want to be misleading. I was also confused by the description of the flag, so I stepped into the source code of fftw, and it definitely uses AVX2 on the input on my machine - so it must somehow check for alignment. But I should probably double-check with the fftw devs…

IvanC · July 27, 2017, 5:55pm

I’m not sure anybody could say the contrary

PluginPenguin · July 28, 2017, 8:03am

I didn’t expect that. Instead, I really have to thank you for your quick replies.
I’ll also try to find out which instructions fftw uses at runtime under various conditions when I find some time for that in the next weeks…
