Recently I decided to switch my FFT/SIMD library to Google Highway in order to obtain the dark power of raw/semi-raw SIMD intrinsics. However, there is no FFT library built with Google Highway. Therefore, I decided to write one myself, mainly following ideas from OTFFT.
If you want to use it, see ZL-Audio/zldsp_fft on GitHub (an FFT library based on Google Highway). It is under the Apache-2.0 license. For now, it supports double/float C2C/R2C power-of-two FFTs, targeting SSE2/SSE4/AVX2/NEON. AVX512/SVE are not supported, as I do not have machines to develop/test on.
If you want to benchmark or contribute (especially regarding performance on x86-64), see ZL-Audio/zldsp_fft_develop on GitHub (FFT implementation and analysis). It is also under the Apache-2.0 license (EXCEPT FOR the other FFT libraries in this repo!).
___
Now, the question comes down to performance. Here is the throughput of the real-to-complex transform (higher is better).
Disclaimer: I have tried my best to make the benchmark fair. However, there might be mistakes regarding the set-up of other FFT libraries. There might also be fluctuations in hardware, though they should get covered by the benchmark framework.
Feel free to suggest any ideas.
Perhaps it is the best I can get with Stockham?
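For anyone not familiar with the Stockham autosort scheme mentioned above, here is a minimal scalar radix-2 sketch in C++. It is only an illustration of the general technique (ping-ponging between two buffers so that no bit-reversal pass is needed), not the code used in zldsp_fft; the function name `stockham_fft` and the `cplx` alias are hypothetical, and there is no SIMD, no precomputed twiddle table, and no higher-radix butterfly here.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical helper: unscaled forward DFT via radix-2 Stockham autosort.
// Assumes x.size() is a power of two (sizes 0 and 1 are returned unchanged).
std::vector<cplx> stockham_fft(std::vector<cplx> x) {
    const std::size_t size = x.size();
    std::vector<cplx> y(size);          // second buffer for ping-pong passes
    const double pi = std::acos(-1.0);

    std::size_t n = size;               // length of each sub-transform
    std::size_t s = 1;                  // stride (number of interleaved sub-transforms)
    while (n > 1) {
        const std::size_t m = n / 2;
        const double theta0 = 2.0 * pi / static_cast<double>(n);
        for (std::size_t p = 0; p < m; ++p) {
            // Twiddle factor exp(-i * 2*pi*p / n), computed on the fly for clarity.
            const cplx wp = std::polar(1.0, -theta0 * static_cast<double>(p));
            for (std::size_t q = 0; q < s; ++q) {
                const cplx a = x[q + s * p];
                const cplx b = x[q + s * (p + m)];
                y[q + s * (2 * p)]     = a + b;
                y[q + s * (2 * p + 1)] = (a - b) * wp;
            }
        }
        std::swap(x, y);                // output of this pass is input of the next
        n = m;
        s *= 2;
    }
    return x;                           // results are already in natural order
}
```

The inner loop over `q` touches contiguous memory once `s` is at least the vector width, which is the part that maps naturally onto SIMD lanes; the first few passes (small `s`) are the ones that usually need special handling.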




