Lowest-latency real-time pitch detection

My user sings into the mic (an iPhone/MacBook mic, or a headset mic, I suppose). I need to extract the f0 of the note they are singing.

I need to keep it ultra-low latency.

What options are available?

I will attempt to maintain a summary by editing this initial post upon replies.

So far I’m aware of:

Related:

I think it depends on how low the lowest frequency you want to detect is, and how much noise you are prepared to tolerate in the pitch data…?

Not sure what the state of the art is though!

I think E2 is the lowest I’d expect anyone to sing.
Users will be singing into their mobile phones or a laptop mic (at worst).
I’m tempted to run with creating a Karplus-Strong ring resonator for each note, as I’ve implemented it before. But if I can find something that works cleanly out of the box, I’d much rather go with that.

The important thing is that when the user sings, they will get audio feedback. And if the latency is too high, it’s going to make the brain glitch. So I want to optimize on buffer size; I think that’s the bottleneck.

I think the theoretical lowest required time to detect pitch is two periods, so roughly 20 ms for 100 Hz and 2 ms for 1000 Hz. A low-latency pitch detection that comes close to this is part of my autotune package; unfortunately, it is not free software.

If you link to your product, this will improve this thread as a resource.

Google the Goertzel algorithm. It is also used for resonance detection in, for instance, car components. It has nice real-time properties, hence its use in embedded systems.
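A single Goertzel bin is only a few lines. Here’s a rough sketch (purely illustrative names, not from any particular library); you’d run one instance per frequency of interest:

    #include <cmath>
    #include <vector>

    // Measures the power at one target frequency over a block of samples.
    double goertzelPower (const std::vector<float>& block, double targetHz, double sampleRate)
    {
        const double pi    = 3.141592653589793;
        const double w     = 2.0 * pi * targetHz / sampleRate;
        const double coeff = 2.0 * std::cos (w);

        double s1 = 0.0, s2 = 0.0;                  // the two previous filter outputs
        for (float x : block)
        {
            const double s0 = x + coeff * s1 - s2;  // second-order resonator update
            s2 = s1;
            s1 = s0;
        }
        // Squared magnitude at the target frequency, without forming the complex output.
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }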

I’ve edited the OP to make my use-case more clear.

I’ve looked into various approaches:

  • FFT (it’s possible to get exact frequencies by finding locally peaking bins and using their rate of rotation – e.g. if your bin is @100Hz and you sing @101Hz, the bin makes one revolution every second; see the sketch after this list).
  • Goertzel filters (not sure how useful these are for harmonic tones)
  • Karplus-Strong ring resonator for each note.
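For the first point, here’s a rough sketch of the bin-rotation refinement (a naive single-bin DFT for clarity; in practice you’d reuse the FFT frames you already have, and all the names are illustrative):

    #include <cmath>
    #include <complex>

    // DFT of bin k over one frame of length N.
    std::complex<double> dftBin (const float* frame, int N, int k)
    {
        const double pi = 3.141592653589793;
        std::complex<double> sum = 0.0;
        for (int n = 0; n < N; ++n)
            sum += (double) frame[n] * std::polar (1.0, -2.0 * pi * k * n / N);
        return sum;
    }

    // Refine the frequency of bin k from its phase advance between two frames a
    // hop apart (x must hold at least N + hop samples).
    double refinedBinFrequency (const float* x, int N, int hop, int k, double sampleRate)
    {
        const double pi     = 3.141592653589793;
        const double phase1 = std::arg (dftBin (x,       N, k));
        const double phase2 = std::arg (dftBin (x + hop, N, k));

        // Phase advance a tone exactly at the bin centre would have produced.
        const double expected = 2.0 * pi * k * hop / N;

        // Wrap the deviation into [-pi, pi).
        double dev = (phase2 - phase1) - expected;
        dev -= 2.0 * pi * std::floor ((dev + pi) / (2.0 * pi));

        // Deviation in radians per hop -> frequency offset in Hz.
        return k * sampleRate / N + dev * sampleRate / (2.0 * pi * hop);
    }

With the example above: a tone 1 Hz above the bin centre accumulates one extra revolution (2π) per second, which is exactly what the deviation term picks up. The catch is that the hop has to be short enough that the deviation stays within ±π, otherwise it aliases.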

My favourite is the third. However, it’s tricky to implement exactly, as the number of samples in the ring would almost always be non-integer. It’s fudgable.
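For the record, here’s one literal reading of that idea as a sketch: a single resonator tuned to one note, with the fractional ring length “fudged” via linear interpolation. It’s closer to a tuned comb filter than textbook Karplus-Strong, and all the names are made up:

    #include <cmath>
    #include <vector>

    // One ring resonator tuned to a single note. Feed it the mic signal; a ring
    // whose length matches the sung period builds up energy, the others don't.
    struct RingResonator
    {
        std::vector<float> ring;
        double frac = 0.0;        // fractional part of the period, in samples
        int    writePos = 0;
        float  feedback = 0.98f;  // < 1 keeps the loop stable

        RingResonator (double noteHz, double sampleRate)
        {
            const double period = sampleRate / noteHz;   // non-integer in general
            const int    whole  = (int) std::floor (period);
            frac = period - whole;
            ring.assign (whole + 1, 0.0f);               // one extra slot for interpolation
        }

        // Push one input sample, return the resonator output.
        float process (float input)
        {
            const int size = (int) ring.size();
            const int oldA = writePos;                   // delay = whole + 1 samples
            const int oldB = (writePos + 1) % size;      // delay = whole samples

            // Linear interpolation between the two delays -> effective delay = whole + frac.
            const float delayed = (float) frac * ring[oldA] + (1.0f - (float) frac) * ring[oldB];

            const float out = input + feedback * delayed;
            ring[writePos] = out;
            writePos = (writePos + 1) % size;
            return out;
        }
    };

The detector would then be a bank of these (one per semitone from E2 upwards), all fed the same input, comparing short-term output energy and picking the loudest. I have no idea how robust that is against harmonics also exciting the resonators an octave or a twelfth below; that’s probably where the tuning work goes.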

1 Like

Hi,

My two cents.

  • FFT is generally a bad candidate for getting a precise f0, exactly because of what you describe: you’d need quite a lot of extra processing to refine the picked peak.
  • As far as I understand, Goertzel filters are more appropriate for detecting the presence, and measuring the amplitude, of specific sines you already suspect are there.
  • You have several “autocorrelation-related” methods like YIN, as implemented in the GitHub repo you mention in your OP. A couple of years ago, for pitch detection in a commercial plugin, I implemented the method described in this thesis (see chapter 4), which proposes some improvements over plain autocorrelation methods. I did not properly benchmark it, but I was quite satisfied overall with the quality of the detection on voice, and it’s relatively straightforward to understand and implement, which is a plus. There’s also a good state-of-the-art survey of pitch detection methods in the thesis (though it was published in 2008, so there might be newer fancy stuff). A bare-bones sketch of the plain YIN core follows below, for reference.
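This is only the textbook YIN idea (difference function, cumulative mean normalisation, fixed threshold), not the thesis variant, and it skips the parabolic-interpolation refinement; names are illustrative:

    #include <cmath>
    #include <vector>

    // Bare-bones YIN-style f0 estimate. Returns 0 if nothing periodic is found.
    double yinEstimateF0 (const std::vector<float>& x, double sampleRate,
                          double threshold = 0.15)
    {
        const int n      = (int) x.size();
        const int maxLag = n / 2;
        std::vector<double> d (maxLag, 0.0), dNorm (maxLag, 1.0);

        // Difference function d(tau) = sum_j (x[j] - x[j + tau])^2
        for (int tau = 1; tau < maxLag; ++tau)
            for (int j = 0; j < maxLag; ++j)
            {
                const double diff = (double) x[j] - x[j + tau];
                d[tau] += diff * diff;
            }

        // Cumulative mean normalised difference d'(tau)
        double runningSum = 0.0;
        for (int tau = 1; tau < maxLag; ++tau)
        {
            runningSum += d[tau];
            dNorm[tau] = runningSum > 0.0 ? d[tau] * tau / runningSum : 1.0;
        }

        // First dip below the threshold wins; walk down to its local minimum.
        for (int tau = 2; tau < maxLag; ++tau)
            if (dNorm[tau] < threshold)
            {
                while (tau + 1 < maxLag && dNorm[tau + 1] < dNorm[tau])
                    ++tau;
                return sampleRate / tau;
            }

        return 0.0;
    }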

Hope this helps.

The Pure Data object [helmholtz~] implements that last method, if I understand correctly. It’s indeed very good and Pure Data makes it very easy to test it out. The source code is available here: https://www.katjaas.nl/helmholtz/helmholtz.html

In case anyone is still looking, it is hard to beat the Kalman filter:

Thank you Dr Smith! I tried the AU project, but the f0 estimation looks strange to me. Playing an E2 guitar string should give me an f0 around 82 Hz, but here I get around 500 Hz. Is there something I’m missing that needs to be adjusted?

Thanks,
Luca

The code has bugs:

  1. The nUpdate variable defaults to 4. This influences the sample rate used for the pitch detection, but that is not factored in! So you need to add this compensation:

    void EKFPitch::prepare (float fs, int bs, int nUpdate)
    {
        sampleRate = fs / (float) nUpdate;
        // ... rest of the original prepare() body ...
    }

  2. The transient/silence detector is just too simple. It is not the fancy one described in the paper, just a simple power measurement over the last buffer.

  3. The initial pitch estimate is crude and also not as fancy as described in the paper.

I left out the initial pitch detector and fixed the nUpdate bug, and then it works correctly.
ekf.resetCovarianceMatrix() needs to be called every time the results go out of whack. You can use the silence detector for that, but maybe just triggering on bad f0 results is more efficient.

EDIT:
Although it looked promising at first, the whole thing is way too unstable and finicky.
The cool stuff from the paper is not implemented, and the tracking is very unreliable.

McLeod / MPM outperforms YIN a bit, especially if you need low latency and can’t do much post-processing. The paper is called “A smarter way to find pitch”, IIRC. I’ve had acceptable results incorporating it in a semi-zero-latency WSOLA pitch shifter.

Also, I haven’t heard of the Kalman approach, thanks for the tip!

McLeod isn’t incredibly different from YIN; they both use autocorrelation and mainly differ in their post-processing (“peak-picking”) techniques. Both of them can be accelerated by calculating the autocorrelation using an FFT.
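In case it’s useful to anyone, this is roughly what that FFT shortcut looks like (Wiener-Khinchin: the ACF is the inverse FFT of the power spectrum). The toy recursive FFT is only there so the sketch is self-contained; in a real project you’d use whatever FFT routine you already have:

    #include <cmath>
    #include <complex>
    #include <vector>

    using cplx = std::complex<double>;

    // Toy radix-2 FFT (size must be a power of two; inverse is not normalised here).
    void fft (std::vector<cplx>& a, bool inverse)
    {
        const std::size_t n = a.size();
        if (n <= 1) return;

        std::vector<cplx> even (n / 2), odd (n / 2);
        for (std::size_t i = 0; i < n / 2; ++i) { even[i] = a[2 * i]; odd[i] = a[2 * i + 1]; }
        fft (even, inverse);
        fft (odd, inverse);

        const double pi   = 3.141592653589793;
        const double sign = inverse ? 1.0 : -1.0;
        for (std::size_t k = 0; k < n / 2; ++k)
        {
            const cplx t = std::polar (1.0, sign * 2.0 * pi * (double) k / (double) n) * odd[k];
            a[k]         = even[k] + t;
            a[k + n / 2] = even[k] - t;
        }
    }

    // ACF(lag) = IFFT(|FFT(x)|^2); zero-padding to >= 2N keeps the result linear
    // (non-circular) for all lags up to N - 1.
    std::vector<double> autocorrelate (const std::vector<float>& x)
    {
        std::size_t n = 1;
        while (n < 2 * x.size()) n <<= 1;

        std::vector<cplx> buf (n, 0.0);
        for (std::size_t i = 0; i < x.size(); ++i) buf[i] = x[i];

        fft (buf, false);
        for (auto& c : buf) c *= std::conj (c);          // power spectrum
        fft (buf, true);

        std::vector<double> acf (x.size());
        for (std::size_t lag = 0; lag < x.size(); ++lag)
            acf[lag] = buf[lag].real() / (double) n;     // normalise the inverse here
        return acf;
    }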

Wouldn’t moving the autocorrelation to an FFT increase the overall latency?

The minimum achievable latency depends on the lowest frequency you want to be able to detect. The buffer you’re analysing needs to contain at least one full period of that frequency. For example, 441 samples for 100Hz at a sampling rate of 44100Hz.
Because of that, it doesn’t really matter how you calculate the ACF. FFT is the efficient way to do it, and simple parabola fitting will provide decent resolution for the peaks in the ACF (which are the candidates for picking the “right” fundamental).
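The parabola fit itself is tiny; something like this (a sketch, assuming the peak lag is not the first or last element of the ACF):

    // Refine an integer ACF peak to a sub-sample lag by fitting a parabola
    // through the peak and its two neighbours. Then f0 = sampleRate / refinedLag.
    double refinePeakLag (const double* acf, int peakLag)
    {
        const double yl = acf[peakLag - 1];
        const double y0 = acf[peakLag];
        const double yr = acf[peakLag + 1];

        const double denom = yl - 2.0 * y0 + yr;
        if (denom == 0.0)
            return (double) peakLag;                     // flat top, nothing to refine

        const double offset = 0.5 * (yl - yr) / denom;   // vertex offset in [-0.5, 0.5]
        return peakLag + offset;
    }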

I haven’t tried any approaches that are not based around autocorrelation, but from my understanding it’s impossible to reduce latency below the wavelength of the lowest fundamental because the information is just not there. Listen to a 1024 sample snippet of audio and try to figure out the pitch by ear - given how tricky that is, autocorrelation seems a pretty decent method to build upon.

This is too pessimistic; the minimum latency for pitch detection is about twice the true period length of the note actually being sung, not the period of the lowest detectable frequency.

Isn’t figuring out the true period length the challenge here? You can make assumptions about how long that might be, in the sense that a vocal probably won’t go below 100Hz, and determine your analysis frame size that way.

I don’t see how “one period length” is pessimistic though. You correctly mentioned that in practice, you’ll typically need twice the true period length, which is worse.
So, as a theoretical lower bound for latency, “one period length” holds – wouldn’t you agree?

A single period is not very periodic; I am unable to detect periodicity from a single period.

Ha, yes, that’s correct and it obviously makes sense - two periods are required. Sorry for the confusion. Turns out my post was too optimistic then :wink: