Two FIFOs to reduce latency?

I’m trying to reduce the latency of my pitch estimator app, which uses an FFT. My idea is to use two FIFOs, with the second one starting to fill when the first is half full, so I can estimate pitch twice as often. I haven’t been able to find info on if/when this is a good idea. I’ve also read about circular buffers, but I’m not clear on when they add benefit. Below is some rough code of my idea. Am I on the right track, or is this a bad idea? I’m fairly new to audio programming and C++.

void pushNextSampleIntoFifo (float sample) noexcept
{
    // Fill first fifo; when full, hand its contents to the FFT
    if (fifo1Index == fftSize)
    {
        zeromem (fft1Data, sizeof (fft1Data));
        memcpy (fft1Data, fifo1, sizeof (fifo1));
        fifo1Index = 0;
    }
    fifo1[fifo1Index++] = sample;

    // Fill second fifo, starting once the first is half full
    if (fifo2Index == fftSize)
    {
        zeromem (fft2Data, sizeof (fft2Data));
        memcpy (fft2Data, fifo2, sizeof (fifo2));
        fifo2Index = 0;
    }
    if (fifo1Index > fftSize / 2 || fifo2Index > 0)
        fifo2[fifo2Index++] = sample;
}

I tried this too, and it works: you get a pitch estimate twice as often. Unfortunately you still need a high FFT order to get it precise enough, so it can get rather clunky to process.


You don’t need two FIFOs. I implemented overlapping buffers (which you should do anyway because of windowing) by:

  • taking a buffer once it is available
  • marking only half the buffer size as consumed
  • that way the read pointer advances only half a block, and you are triggered again after the next half block.
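The steps above can be sketched in plain C++ (not JUCE’s AbstractFifo — names and structure are mine, just to show the idea of consuming half a block per read):

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of a FIFO whose reader consumes only half a block,
// so successive reads overlap by 50%.
struct OverlapFifo
{
    std::vector<float> data;
    std::size_t writePos = 0, readPos = 0, available = 0;

    explicit OverlapFifo (std::size_t capacity) : data (capacity) {}

    void push (float sample)
    {
        data[writePos] = sample;
        writePos = (writePos + 1) % data.size();
        ++available;
    }

    // Copies blockSize samples into out, but advances the read
    // pointer by only blockSize / 2 -- the "hop size".
    bool readOverlapping (float* out, std::size_t blockSize)
    {
        if (available < blockSize)
            return false;

        for (std::size_t i = 0; i < blockSize; ++i)
            out[i] = data[(readPos + i) % data.size()];

        readPos = (readPos + blockSize / 2) % data.size();
        available -= blockSize / 2;   // only half is marked as consumed
        return true;
    }
};
```

After pushing 8 samples you can read a block of 4 three times — each read starts 2 samples after the previous one.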

I’m having a little trouble wrapping my brain around this. Conceptually, does this accomplish the same thing as my idea, but more efficiently?

Yes, because both FIFOs would store the same signal, wouldn’t they?


Here’s an example:

int start1, size1, start2, size2;

mFIFOManager.prepareToRead (mFFTBufferSize, start1, size1, start2, size2);
AudioBuffer<float> buffer (1, mFFTBufferSize);

// Copy both (possibly wrapped) regions out of the ring buffer.
// mRingBufferData stands in for whatever float array backs the FIFO.
if (size1 > 0)
    buffer.copyFrom (0, 0, mRingBufferData + start1, size1);
if (size2 > 0)
    buffer.copyFrom (0, size1, mRingBufferData + start2, size2);

// Consume only half a block, so the next read overlaps this one.
mFIFOManager.finishedRead (mFFTBufferSize / 2);

return buffer;

You’re just telling the FIFO: “hey, give me x samples, but track as if I’ve taken y samples.” You’ll get twice as many callbacks, and they’ll overlap. There are a lot of benefits to this method, even beyond the fact that you get more callbacks.

You should actually check out @daniel’s EQ – he does this in there, and it’s where I learned about the JUCE FIFOs – nice one @Daniel :wink:

(try setting your hop size to half the size of your FFT block size to start)

The FIFO will track as if you read half as much data as you actually did, and so will have an FFT buffer’s worth of data ready twice as often.

In general though – it’s good to note – your conception of how to do this is correct regardless of how you accomplish it. The “concept” of what you’re doing is called a hop size, and this is the more typical way to implement it.
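To put some numbers on the latency win (assuming the order-11 FFT from this thread and a 44.1 kHz sample rate, which the thread doesn’t actually state):

```cpp
// Numbers from this thread (order-11 FFT) at an assumed 44.1 kHz rate.
const int    fftOrder   = 11;
const int    fftSize    = 1 << fftOrder;   // 2048 samples per window
const int    hopSize    = fftSize / 2;     // 1024-sample hop
const double sampleRate = 44100.0;

// One full window is ~46 ms of audio; with a half-window hop you get
// a fresh pitch estimate every ~23 ms instead.
const double windowMs = 1000.0 * fftSize / sampleRate;
const double hopMs    = 1000.0 * hopSize  / sampleRate;
```

Note the hop changes how often you get an estimate, not the ~46 ms of signal each estimate is computed over.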


I did this using a circular (ring) buffer. The audio thread pushes samples to the buffer, and a background thread does the calculations on a snapshot of the buffer when requested (triggered by a timer on the message thread).

I’m still learning about this stuff, but what factors would make you choose a ring buffer over the overlapping buffers? They seem very similar to me so far.

The fact that the amount of incoming samples is not necessarily exactly one buffer. With a circular buffer you can always simply append the data; the code for double buffering gets a bit more complicated with arbitrary block sizes.
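A sketch of what “simply appending data” looks like with a circular buffer (plain C++, names mine): an incoming block of any size is split into at most two contiguous copies around the wrap point, assuming the block is no larger than the buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Circular buffer that accepts arbitrary-length blocks by splitting
// the copy into at most two memcpys around the wrap point.
struct RingBuffer
{
    std::vector<float> data;
    std::size_t writePos = 0;

    explicit RingBuffer (std::size_t capacity) : data (capacity) {}

    void append (const float* block, std::size_t numSamples)
    {
        // First part: from writePos up to the end of the buffer.
        const std::size_t firstPart = std::min (numSamples, data.size() - writePos);
        std::memcpy (data.data() + writePos, block, firstPart * sizeof (float));
        // Second part: whatever wraps around to the start (may be empty).
        std::memcpy (data.data(), block + firstPart,
                     (numSamples - firstPart) * sizeof (float));
        writePos = (writePos + numSamples) % data.size();
    }
};
```

The reader can then take a snapshot of the most recent N samples whenever it likes, independent of the block sizes the audio callback delivered.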


What @daniel said.

With a circular buffer, you simply add new data when it comes in – it’s a rolling loop – and you can then sample the contents of the buffer whenever you need to. That can be faster than samples arrive or much slower, depending on your needs; the two sides are independent. Overlapping buffers seem to add more complication without any more benefit, unless I’m missing something in your use case?


Well now I’m confused as to the semantics of what we’re talking about lol.

We’re all using ring buffers for everything – the FIFO manager in JUCE isn’t any sort of container, it just helps manage a FIFO ring buffer. It just seems like you’re not actually looping around your buffer @shawnm and are relying on it being a size which is somehow usefully related to your FFT – which we’re saying is not the way to do it.

Then we have this concept of overlapping buffers – I’m guessing @shawnm that you mean this to be 2 separate ring buffers, which are a half window size out of phase with each other (your hop size), which you’re using to get double the FFT callback information without waiting for a full FFT window.

Consider this pseudocode:

void pushNextSampleIntoFifo (float sample) noexcept
{
    fifo1[fifo1Index++] = sample;
    ++numDataReady;

    if (fifo1Index >= fifo1Size)
        fifo1Index = 0;   // wrap around: this is the "ring buffer" concept

    if (numDataReady >= fftSize)
    {
        // The read point starts at fifo1Index - fftSize (wrapping around
        // the start/end of the buffer as needed) -- run the FFT from there.

        // Now we say half the data is still unread, so we'll be ready
        // again after the next fftSize / 2 samples -- the "hop size".
        numDataReady -= fftSize / 2;
    }
}

I would say there really isn’t much wrong with what you’re doing, so if this is a PITA then ignore it and move on. But if it’s confusing, I recommend you write a delay plugin or something of that sort so you can understand FIFOs better – they’re pretty much universally used in audio and you’ll need them all over the place.


My apologies if I’ve added to the confusion. I’m a novice C++ and JUCE developer, and I think I’ve used the terms circular (ring) buffer and FIFO incorrectly.

Here is my attempt to clear the confusion:

  • I am creating an app that estimates a person’s singing pitch in real time using a pitch estimation algorithm that uses an FFT.

  • The order of the FFT has to be pretty high (11) to accurately estimate the human voice’s pitch. Consequently, there is significant latency as I wait for enough data for the FFT.

  • The pitch estimation is satisfactory, but the latency is not. So I came up with the idea in the OP to decrease the latency.

  • As I understand it, daniel said that my idea will work, but he suggested a more efficient way of doing it.

  • adamski suggested a third way, which is different from daniel’s.

  • As far as I can tell, the main reason daniel and adamski do this is to use windowing, with the goal of getting more accurate data. hdlAudio seems to be the only one doing it for the same latency-related reason as me.

For now I will stick with my own solution, since it is working. Hopefully I will be able to revisit this and use a better solution once I have time to figure out what you all are talking about :slight_smile:

Thank you all for your time and patience!