Recently I got interested in FFT again.
There are some results on the forum from about 5 years ago, so I think it is time to redo some of them. I only benchmark the complex-to-complex, power-of-two, forward, single-precision FFT, but it should be reasonably representative. The compiler is AppleClang. I might have made some mistakes, especially regarding FFTW on the M chip (I have also tried the brew install, with similar results) and pffft. The M chip uses NEON and the Intel chip uses AVX2. That partly explains why the peak FLOPS are similar: AVX2 registers are 256 bits wide (8 floats) versus 128 bits (4 floats) for NEON, while the M4 Pro runs at almost twice the clock speed.
In case someone also wants to try it: zsliu98/zlfft (FFT implementation and analysis) on GitHub.
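For anyone comparing their own numbers against FLOPS figures like the ones above, here is a minimal sketch of how such a figure is commonly derived from a timing. It assumes the nominal count of 5 N log2(N) operations per complex transform, which is the convention used by FFTW's benchFFT; I am not claiming the repository uses exactly this. The names `nominal_fft_flops`, `mflops`, `fft_radix2` and `time_transform` are made up for illustration, and the pure-Python radix-2 FFT is only a double-precision stand-in for the real single-precision libraries.

```python
import cmath
import math
import time


def nominal_fft_flops(n: int) -> float:
    """Nominal operation count 5 * n * log2(n) for a complex FFT of size n.

    Assumption: this is the benchFFT-style convention, not an exact count
    of the instructions any particular library executes.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return 5.0 * n * math.log2(n)


def mflops(n: int, seconds_per_transform: float) -> float:
    """Convert the time of one size-n transform into nominal MFLOPS."""
    return nominal_fft_flops(n) / (seconds_per_transform * 1e6)


def fft_radix2(x):
    """Recursive radix-2 forward FFT (hypothetical stand-in, not tuned)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        # Forward transform: twiddle factor exp(-2*pi*i*k/n)
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def time_transform(fft, n, repeats=20):
    """Return the best wall-clock time (seconds) of `fft` on a size-n input."""
    data = [complex(i % 7, -(i % 3)) for i in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fft(data)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for n in (64, 256, 1024):
        t = time_transform(fft_radix2, n)
        print(f"n={n:5d}  {t * 1e6:9.1f} us  {mflops(n, t):8.2f} MFLOPS")
```

Taking the best of several repeats rather than the mean is one common choice to reduce noise from the scheduler; swapping `fft_radix2` for a call into a real library would give numbers comparable across implementations, as long as the same operation count is used for all of them.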



