Downsampling, buffering and parallel processing

Hi guys,

I am currently working on a plugin that has a built-in guitar tuner. I’ve already tested my pitch extraction algorithm in Python, and now I’m ready to implement it inside the plugin.

This is what I need to do:

  • Downsample the signal to 8 kHz (I don’t need the full band to tune a guitar, so I can lower the CPU load by working at 8 kHz instead of 44.1/48/96 kHz etc.)
  • Fill a buffer until I have 1024 samples
  • Take the FFT of the buffer, do stuff, take the IFFT, do other stuff and estimate the pitch

Since this is not related to the main signal processing, I would like to do it in a separate thread.

So, my question is: is there something in JUCE that has a similar implementation? If not, are there any best practices to follow for this kind of application? I simply don’t want to reinvent the wheel and I would like to rely on robust and already tested code. For example, for the FFT/IFFT part I know there is a dedicated class in JUCE, and I’ve already used it in the past.

The two main problems now are the downsampling procedure (maybe it is possible to use only the processSamplesDown part of the oversampling class?) and the creation/management of the buffer and the separate thread.

Thank you!

The processing that you describe seems relatively lightweight compared to what other plugins do on the audio thread, so I would try doing it all on the audio thread first. I would only start thinking about offloading the processing after profiling has clearly shown that doing it on the audio thread is a real performance bottleneck. Multithreaded rendering comes with a lot of challenges.
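Should profiling later show that offloading really is needed, the usual building block for handing samples from the audio thread to an analysis thread is a lock-free single-producer/single-consumer FIFO. The class below is a minimal sketch in plain C++; the name SpscFifo and its interface are made up for this example, and JUCE ships its own juce::AbstractFifo for this kind of index bookkeeping.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical single-producer/single-consumer FIFO (illustration only).
// Exactly one thread may call push() and exactly one other thread may call pop().
class SpscFifo
{
public:
    // One slot is kept free to distinguish "full" from "empty".
    explicit SpscFifo (std::size_t capacity) : data (capacity + 1) {}

    // Producer side (e.g. the audio thread): never blocks, returns false if full.
    bool push (float value)
    {
        const auto w    = writePos.load (std::memory_order_relaxed);
        const auto next = (w + 1) % data.size();

        if (next == readPos.load (std::memory_order_acquire))
            return false; // full, the caller decides whether to drop the sample

        data[w] = value;
        writePos.store (next, std::memory_order_release);
        return true;
    }

    // Consumer side (e.g. the analysis thread): returns false if empty.
    bool pop (float& out)
    {
        const auto r = readPos.load (std::memory_order_relaxed);

        if (r == writePos.load (std::memory_order_acquire))
            return false; // empty

        out = data[r];
        readPos.store ((r + 1) % data.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<float> data;
    std::atomic<std::size_t> writePos { 0 };
    std::atomic<std::size_t> readPos { 0 };
};
```

The storage is allocated once in the constructor, so push() does no allocation or locking, which is what makes it safe to call from the audio callback.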

There is no downsampling implementation in the JUCE codebase that I am aware of that suits your use case. But combined with the fixed power-of-two block size that you need, it shouldn’t be too hard to implement a quick solution yourself.

  • Add a suitably steep IIR or FIR low-pass filter that continuously filters the input signal, with a cutoff somewhere below 4 kHz.
  • The filtered output is written to a preallocated buffer of size downsamplingFactor * targetFFTSize.
  • Once the buffer is filled, perform a strided copy from it to the FFT input buffer, dropping samples according to your resampling factor.
  • Do whatever processing you want to do in the 8kHz domain

Of course this could be optimised further, but I’d propose not to start optimising before you experience actual performance issues :wink:

Ok, super clear, thank you!

You do know that the FFT block size defines the lowest frequency handled by the algorithm?
Hence, for best performance, choose the block size so that the lowest considered pitch can be safely detected, but not bigger. (Usually the FFT blocks are not related to the sample blocks and you need to repack the samples, which is no additional effort when you do resampling.)
BTW there is an algorithm called “Real FFT” that is as fast as a normal FFT with half the block size (i.e. nearly a 3x speed-up).
-Michael
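To put numbers on the block-size point, here is a small sketch using the figures from the original post (8 kHz sample rate, 1024-point FFT). The helper names are made up for this example, and 82.41 Hz is the low E string of a guitar in standard tuning.

```cpp
// Hypothetical helpers, named for this example only.

// Spacing between adjacent FFT bins in Hz. This is also the lowest frequency
// that fits exactly one full period into the block.
constexpr double binSpacingHz (double sampleRateHz, int fftSize)
{
    return sampleRateHz / static_cast<double> (fftSize);
}

// Number of periods of a tone at `frequencyHz` contained in one FFT block.
constexpr double periodsPerBlock (double frequencyHz, double sampleRateHz, int fftSize)
{
    return frequencyHz * static_cast<double> (fftSize) / sampleRateHz;
}
```

With these numbers, binSpacingHz (8000.0, 1024) is 7.8125 Hz and the block holds about 10.5 periods of the low E string. The bin spacing is coarser than the roughly 4.9 Hz between E2 and F2, so a tuner presumably cannot rely on picking the strongest bin alone and needs some further refinement step.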

Yeah, of course I know. As I said, the algorithm has already been tested with positive results in Python; now I’m only translating it into C++. If I remember right, the FFT module in JUCE can do the “Real FFT”.

juce::dsp::FFT has real-to-complex forward and complex-to-real inverse transform modes. They are named performRealOnlyForwardTransform and performRealOnlyInverseTransform and work in place on an input/output buffer that needs to be reinterpreted as std::complex in the frequency domain. So from this point of view it has a “RealFFT” mode. However, there are also multiple C, C++ and Rust libraries out there that are actually named “RealFFT” or “FFT-Real”, so I’m not sure if @mschnell refers to one of these specific implementations.
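As background on why a real-only transform can be cheaper: for real input, the upper half of the spectrum is the complex conjugate of the lower half, so only bins 0 to N/2 carry information. The sketch below is a deliberately naive O(N²) DFT in plain C++ that returns just those bins. It is an illustration only, not how juce::dsp::FFT works internally, and the function name is made up for this example.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive DFT of a real signal (illustration only, far slower than a real FFT).
// Returns bins 0..N/2; the remaining bins would be complex conjugates of these.
std::vector<std::complex<float>> naiveRealDft (const std::vector<float>& x)
{
    const double pi = 3.14159265358979323846;
    const std::size_t n = x.size();
    std::vector<std::complex<float>> bins (n / 2 + 1);

    for (std::size_t k = 0; k < bins.size(); ++k)
    {
        std::complex<double> acc (0.0, 0.0);

        for (std::size_t i = 0; i < n; ++i)
        {
            const double phase = -2.0 * pi * static_cast<double> (k * i) / static_cast<double> (n);
            acc += static_cast<double> (x[i]) * std::complex<double> (std::cos (phase), std::sin (phase));
        }

        bins[k] = std::complex<float> (static_cast<float> (acc.real()),
                                       static_cast<float> (acc.imag()));
    }

    return bins;
}
```

For a 1024-sample block this gives 513 complex bins, which is the part of the spectrum a pitch estimator would normally look at.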

In any case, if you care about performance, you might want to care about the actual FFT implementation used behind the scenes. juce::dsp::FFT automatically picks a suitable FFT implementation at runtime, depending on what libraries are available on your system. If you don’t supply anything special, this will be the Apple Accelerate framework FFT on macOS and iOS and JUCE’s own fallback implementation on all other platforms. That leads to good performance on Intel Macs, mediocre performance on M1 Macs and iOS devices and rather bad performance on all other systems. You can also instruct it to use the Intel IPP FFT if you link to it, which will give you really good performance on all Intel CPUs on Windows, Linux and macOS, but not on modern ARM-equipped M1 Macs and mobile devices. Or you can use FFTW, which will likely give you good performance on all systems.

Besides FFTW, which is expensive or requires you to go GPLv3, there is no high-performance one-size-fits-all FFT library. We went with writing our own FFT library wrapper, similar to the JUCE one, that wraps IPP for Intel CPUs and pffft for ARM.

If I were you, I’d probably start with the JUCE FFT and optionally tweak performance by swapping out the FFT implementation once you’ve got the rest up and running.

Apple Accelerate leads to a mediocre performance on iOS devices? I was under the impression that it was the best option. What better alternatives are there?

I was referring to the benchmark results for ARM FFTs posted here: “Comparing FFT engines - #8 by yairadix”

My statement was written under the assumption that Apple will probably use similar implementations for both iOS devices with ARM CPUs and Macs with ARM CPUs. I might be wrong here though. Did anyone compare the Accelerate performance between different ARM-equipped Apple devices? Also note that “mediocre” might be a bit harsh: it’s still orders of magnitude faster than the JUCE fallback implementation :wink:

Ah, got it! Very interesting.

By the way, so the original poster doesn’t worry too much: depending on what one is doing, the fallback engine might be more than sufficient. For example, I have no problems displaying a spectrum analysis at 60fps using a quite large FFT order on a Samsung Galaxy Alpha from 2014. So a tuner is really nothing to worry about!

Ok, clear! I know that there are faster implementations, like FFTW, but at the moment I would like to avoid using external libraries and having to deal with their licenses. A tuner is a very simple project and I’m confident that the FFT in JUCE will be enough. Thank you!