Handling Large Audio Blocks Efficiently at Small Buffer Sizes Without Glitches

Hi,

I’m struggling with real-time processing in my audio plugin when the DAW buffer size is small. My plugin uses Elastique for time-stretching, and while everything works fine at 256 samples or larger, buffer sizes below that cause audio glitches and high CPU usage. I suspect the real-time thread is getting overloaded, but I haven’t found a reliable fix.

To reduce processing overhead, I process larger chunks (1024 samples at a time) and store the output, feeding it to the DAW buffer as needed. I’ve tried pre-filling a buffer, using a circular buffer, implementing juce::AbstractFifo, and running a separate thread, but none have eliminated the glitches.

I’m looking for advice on the best way to handle large-block processing while ensuring smooth playback at small buffer sizes. Should I rethink my buffering strategy, or is a separate thread actually the right approach? Any insights or best practices for efficient real-time processing in JUCE would be greatly appreciated!

Thanks in advance!

If you haven’t already, try a profiler that supports a timeline view; it would help identify how long individual calls to your processBlock are actually taking compared to the passage of real time. It could reveal, for example, some sort of resource contention that is making things dramatically worse.

Also, do you report the actual latency of your plug-in correctly? If not, the DAW might expect your processBlock calls to complete faster, to match its own lower latency assumptions.

The problem with large block processing is that you’ll have lots of processBlock() calls where almost nothing happens (only some copying), then once in a while one where a lot happens. You gather the input block, process it, and then output the first bit of the output block.

When you offload to another thread, you still have to process the data before you can provide an output block. So in addition to the latency your block processing already needs, you need some extra latency so you still have output data to provide while the worker thread is processing the current block. In short: the algorithmic latency determined by the large block size, plus a processing latency that is roughly the worst-case time it takes to process that block.
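In JUCE terms, the report might end up looking something like this (a minimal sketch; both numbers are illustrative, and the headroom is something you’d have to measure for your own plug-in):

    // In prepareToPlay(), for example:
    const int largeBlockSize     = 1024; // algorithmic latency of the big processing block
    const int processingHeadroom = 512;  // assumed worst-case processing time, expressed in samples

    setLatencySamples(largeBlockSize + processingHeadroom);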

To add to the previous comments: depending on your algorithm, there may be a way to split the work over multiple smaller blocks rather than trying to do it all at once. That avoids the sudden peaks in processing.

Thanks for the feedback, everyone!

I’ve profiled processing times and didn’t see any major spikes in processBlock(), but since I’m offloading work to a background thread now, I wonder if some thread contention is causing delays in the data handoff.

Currently, I’m not explicitly reporting plugin latency to the DAW, but given my background processing approach, it seems like I should be.

I’ve implemented a FIFO-based solution to offload processing to a background thread, which has greatly improved playback stability and reduced CPU usage. However, I’m now facing an issue that I suspect is a race condition.

Here’s the setup:

  • The audio thread writes input data into a lock-free FIFO and reads the processed output.
  • A background thread pulls from the FIFO, applies time-stretching with Elastique, and pushes the processed data back.
  • I process 1024-sample blocks, while the DAW buffer size is 32 samples (or user-defined).

The FIFO approach has resolved previous glitches, but now I’m seeing assertion failures in Elastique. Since Elastique doesn’t include debugging symbols, I can’t debug it directly. The strange thing is that when I comment out the fifo.read() call in getNextAudioBlock(), the assertion failure stops, which makes me think that there’s a race condition between the audio and background threads.

I’ve posted my relevant code below. If anyone has insights or has worked with similar issues, I’d really appreciate the advice. Thanks again!

    void run()
    {
        while (!threadShouldExit())
        {
            if (fifo.getFreeSpace() >= 1024)
            {
                AudioSampleBuffer inputBuffer;
                const int numOutputSamplesThisBlock = 1024;
                const int numInputSamplesThisBlock = elastique->GetFramesNeeded(numOutputSamplesThisBlock);

                if (numInputSamplesThisBlock != -1)
                {
                    fillInputBuffer(inputBuffer);

                    auto write = fifo.write(numOutputSamplesThisBlock);

                    const float* inputs[2] = { inputBuffer.getReadPointer(0, 0),
                                               inputBuffer.getReadPointer(1, 0) };
                    float* outputs[2] = { fifoBuffer.getWritePointer(0, write.startIndex1),
                                          fifoBuffer.getWritePointer(1, write.startIndex1) };

                    elastique->ProcessData((float**)inputs, numInputSamplesThisBlock, (float**)outputs);
                }
            }

            wait(1);
        }
    }

    void getNextAudioBlock(const AudioSourceChannelInfo& bufferToFill)
    {
        if (fifo.getNumReady() < bufferToFill.buffer->getNumSamples())
        {
            return;
        }

        int numSamplesToRead = jmin(bufferToFill.buffer->getNumSamples(), fifo.getNumReady());
        auto read = fifo.read(numSamplesToRead);

        for (int ch = 0; ch < bufferToFill.buffer->getNumChannels(); ++ch)
        {
            bufferToFill.buffer->copyFrom(ch, 0, fifoBuffer, ch, read.startIndex1, read.blockSize1);

            // the second region is non-empty only when the read wraps around the FIFO
            if (read.blockSize2 > 0)
                bufferToFill.buffer->copyFrom(ch, read.blockSize1, fifoBuffer, ch, read.startIndex2, read.blockSize2);
        }
    }

Note that from the FIFO, you always get two indexes and block sizes, which together give you the number of samples requested. This is to handle the buffer boundaries. Most of the time, the second block size will be 0 as everything fits in one go. But when you get to the end of the buffer, you’ll have to process both.

In your code, you just give Elastique the first start index without checking the block sizes. So when you’re towards the end of the buffer, Elastique will write beyond the bounds of your FIFO and then anything can happen (undefined behavior).
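One way to fix it, sketched with the names from your code plus an assumed pre-allocated stereo scratchBuffer member: let Elastique render into contiguous scratch memory first, then copy both regions into the FIFO’s backing buffer.

    // Sketch, not a drop-in fix. "scratchBuffer" is an assumed pre-allocated
    // AudioSampleBuffer with at least numOutputSamplesThisBlock samples per channel.
    auto write = fifo.write(numOutputSamplesThisBlock);

    float* outputs[2] = { scratchBuffer.getWritePointer(0),
                          scratchBuffer.getWritePointer(1) };

    elastique->ProcessData((float**)inputs, numInputSamplesThisBlock, outputs);

    for (int ch = 0; ch < 2; ++ch)
    {
        // first region
        fifoBuffer.copyFrom(ch, write.startIndex1, scratchBuffer, ch, 0, write.blockSize1);

        // second region is non-empty only when the write wraps around the buffer end
        if (write.blockSize2 > 0)
            fifoBuffer.copyFrom(ch, write.startIndex2, scratchBuffer, ch, write.blockSize1, write.blockSize2);
    }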

Also I’m not sure how you’re handling the input buffer. What you really need is two FIFOs, one for the input, one for the output. Then you should also have separate process buffers that you copy the input samples into, and the output samples out of, so they’re always contiguous for Elastique to work on.
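Roughly this kind of member layout, just to illustrate (all names and sizes here are made up):

    // Illustrative only; pick capacities that comfortably exceed your block sizes.
    AbstractFifo inputFifo { 8192 }, outputFifo { 8192 };
    AudioSampleBuffer inputFifoBuffer { 2, 8192 }, outputFifoBuffer { 2, 8192 };
    AudioSampleBuffer inputScratch { 2, 4096 }, outputScratch { 2, 1024 }; // contiguous for Elastique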

One more thing: in getNextAudioBlock() you check whether enough samples are ready and just return if not. Have you checked how often that happens? Because that’s what I mean by process latency: the thread has to have enough time to process the data and get the output ready, otherwise the output FIFO runs dry. And even when it does run dry, you shouldn’t just return, because the function is expected to fill the buffer with something. Make sure it contains at least zeros.
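For example, at the top of getNextAudioBlock() (a sketch; note that AudioSourceChannelInfo::numSamples is the number of samples actually requested, which isn’t necessarily the full buffer size):

    // Sketch: output silence instead of stale data when the FIFO runs dry.
    if (fifo.getNumReady() < bufferToFill.numSamples)
    {
        bufferToFill.clearActiveBufferRegion(); // zeroes exactly the requested region
        return;
    }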

If your DAW’s buffer is 32 samples at 44100 Hz and it isn’t aware of your 1024-sample latency, it is generally going to expect your worst case to be about 32/44100 ≈ 0.73 ms. Generally speaking, you can’t rely on a non-realtime operating system to coordinate between threads at sub-millisecond timing. If your worker thread is in a wait state, just the signalling to wake the thread and then signal completion can easily take that long or longer.

You were spot on about the FIFO handling—I wasn’t properly dealing with the wraparound case, which could’ve caused out-of-bounds writes. I’ve now fixed that by correctly handling both block sizes from fifo.write(), so Elastique always writes within the buffer limits.

I also discovered that there was a mismatch between the expected and actual output samples from Elastique—always off by one for some reason. I ended up adjusting my intermediate buffer size to numOutputSamplesThisBlock + 1, and that completely solved the issue.

On process latency—good catch on getNextAudioBlock(). I checked, and it rarely runs dry, but I’ve now made sure it outputs silence if no data is available, just in case.

Now I just need to figure out how to calculate the processing latency so I can report it correctly to the DAW.

Really appreciate the feedback!

What you report to the DAW only matters for latency correction; it has no effect on your real latency.

The thing is this: on each process call you write to the input FIFO and read from the output FIFO, and every now and then (when you’ve got enough in your input FIFO) you trigger your Elastique thread. If you run this without any extra latency, the output FIFO goes empty at the very moment the input is full and your thread starts processing. With the very next process call, you will need to supply that output data. So you basically have the same short time to compute the large block as you would if you didn’t have the extra thread (give or take a bit).

If you make sure there’s always a bit more left in the output FIFO, you can give the thread more time to process. To do that, make the output FIFO larger and stuff it with some extra zeros at initialization.
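For example, something like this at initialization (a sketch; primingSamples is an assumed tuning value, and it has to be counted in the latency you report):

    // Sketch: prime the output FIFO with silence to give the worker thread headroom.
    const int primingSamples = 2048;   // assumed tuning value, part of your reported latency

    outputFifoBuffer.clear();          // the backing buffer now holds silence
    outputFifo.write(primingSamples);  // the ScopedWrite's destructor marks the samples as ready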

Thanks for the explanation! That makes a lot of sense.

I’ve got a thread pool running for each track, and Elastique processing is working smoothly with good playback quality. But now I’m running into an issue where REAPER seems to need a much higher reported latency than other DAWs for everything to stay in sync. In every other DAW, reporting 4192 + bufferSize works perfectly, but in REAPER, the audio is way off unless I bump up the latency even more.

I’m guessing this might have something to do with how different DAWs handle latency compensation or buffer processing, but I’m not totally sure. Has anyone else noticed something similar, or have any ideas why REAPER might need a higher reported latency?

The reported latency needs to be whatever your real latency is. Is 4192 + bufferSize the real latency of your algorithm?

It could be that for some reason your scheduling is off in REAPER and you have a higher latency than expected. You can test that by reporting zero latency, running some audio through it, and checking the delay of the result.

Thanks hugoderwolf!

In every DAW I’ve tested except for REAPER, the latency I report is handled correctly. However, in REAPER, the same reported value doesn’t work as expected. When I try reporting 0 latency in REAPER, the timing sounds correct—matching what I hear in other DAWs—but using the actual latency value that works elsewhere results in incorrect timing. It seems as though REAPER just doesn’t handle the latency the same way other DAWs do.