Hey all - here is a useful bit of code for anyone wanting to get a Frequency Spectrum (Fast Fourier Transform) into their software.

This plugin listens for incoming audio, performs the FFT, finds the peak and tells you what note it hears (see the screenshot, where it displays a note name in the upper left). In VST mode it also outputs a MIDI stream indicating this note (AU plugins don't do MIDI, recall). In the example below you can see the FFT plugin's output running into a piano VST … the idea being that you whistle a note, the FFT plugin determines what note it is, and the piano plays that note along with the whistling.

Eventually I’ll add FFTW support for Windows - but not today.

This code is a good example of creating a plugin with JUCE (it was adapted from the juceDemoPlugin example project), and also a good example of using the Accelerate framework.

Also - Vinn - if you’re out there (when are you not …), I have a plugin I’ll drop here soon that creates a versatile little filter (using your code, naturally) and displays the brickwall chart with an overlay of the FFT after the filter has been applied.

BTW if you use FFTReal (http://ldesoras.free.fr/prod.html) the bin format is similar to Apple’s vDSP (FFTW interleaves real/imag as I remember, and doesn’t make the assumption that the DC and Nyquist imaginary components are zero). It’s not as fast as FFTW, but I’m guessing that’s due to non-vectorised code, as it’s almost exactly 4x slower than vDSP on a Mac and FFTW on Windows.

Well generally speaking, that is because whistling is a fairly sharp, monophonic signal.

Other signals are more complicated (more resonance or overtones) and so they don’t have a single spike the way whistling does. Instead they have a pattern of overtones in the freq spectrum.

You could do some pattern analysis on the FFT though and get closer to identifying pitch for polyphonic signals.

Maybe get really good at it and give melodyne some competition.

[quote=“aaronleese”]Well generally speaking, that is because whistling is a fairly sharp, monophonic signal.

Other signals are more complicated (more resonance or overtones) and so they don’t have a single spike the way whistling does. Instead they have a pattern of overtones in the freq spectrum.
You could do some pattern analysis on the FFT though and get closer to identifying pitch for polyphonic signals.
[/quote]

OK yeah I spent a few minutes playing with a microphone and a frequency spectrum and I totally see what you mean !!
However, when you sing/whistle/play a note, there’s a relationship between its harmonic frequencies: f, 2f, 3f and so on. Every harmonic shows up as a “little peak” in the frequency domain, so you should have a way to use this info? Or maybe not? It’s been at least 7 years since I’ve done this kind of maths.

Well, Celemony is German. Normally, when the Americans, English and French unite against the Germans, the Germans lose. As a Frenchman, I think I’ll start by surrendering

I have no idea how Celemony does it, but that’s a possibility. I’ve also heard that wavelets can be used for pitch recognition, and that they’re faster and more accurate than FFT, but I’ll have to dig into my own rabbit hole there because I haven’t investigated that yet.

[quote]OK yeah I spent a few minutes playing with a microphone and a frequency spectrum and I totally see what you mean !!
However, when you sing/whistle/play a note, there’s a relationship between its harmonic frequencies: f, 2f, 3f and so on. Every harmonic shows up as a “little peak”, so in the freq domain you should have a way to use this info?[/quote]

You can always play with autocorrelation (the Fourier transform of the power spectrum). The harmonic peaks are essentially periodic in the frequency domain, so taking the FFT again summarises this information: you end up with data representing correlation as a function of time interval. The pitch can then be easily extracted by looking at the (non-zero) time interval at which the first main peak occurs. Multiple pitch extraction is possible by looking at peaks at time intervals that are non-integer multiples of one another.

The autocorrelation function can also be normalised by the correlation at zero time interval (the “dc” bin of the FFT) to remove amplitude information. Your autocorrelation function will then be bounded by values <= 1. This is useful as it gives you a value for the reliability of your pitch estimate; the normalised peak height of the autocorrelation function is a good correlate for the perceptual saliency of the pitch for certain types of stimuli.

Mmmh, doesn’t that need at least one period of 1/f0 to work? In that case it wouldn’t really work in real time, given that it would add quite a big latency for low notes (100 Hz -> 10 ms extra latency?)

What I had in mind was something like the “Component Frequency Ratios” in MarC’s document, because by using upper harmonics, you don’t need a whole period.

[quote]For each pair of these partials, the algorithm finds the “smallest harmonic numbers” that would correspond to a harmonic series with these two partials in it. As an example, if the two partials occurred at 435 Hz and 488 Hz, the smallest harmonic numbers (within a certain threshold) would be 6 and 7, respectively. Each of these harmonic number pairs are then used as a hypothesis for the fundamental frequency of the signal. In the previous example, the pair of partials would correspond to a hypothesis that the fundamental frequency of the signal is about 70 Hz[/quote]