Multiple DelayLine objects cause crackling distortion when delay times get adjusted

I will try to make this description as short as possible…
Currently, I am working on a project in the field of dynamic binaural rendering, where a mono input is processed by multiple (12 in total) processors, each consisting mainly of a DelayLine and a Convolution object. Further (HRTF-related) processing is done by code from an open source project upon which my code builds.

To state the problem first: adjusting the delay times of multiple DelayLines causes a crackling distortion in the output, even if they only get updated once (which results in one “crack”). When the delay times get updated continuously, the distortion becomes heavier.

In order to adjust the respective delay times of each processor, depending on a “listener position” retrieved from an XY-pad, I check for an updated position within a timer callback defined in the PluginProcessor. The timer runs with an interval of 80ms. If there is an updated position (the PluginEditor sets a flag), I spawn a thread that executes a dedicated function for the adjustments:

// check for updated Listener Position and update delay processors and dfp
if (updateDelayProcessorParams) {
    // Lambda function to use in thread call
    auto extrapolateFunction = [this]() {
        extrapolate();
    };

    /* update delays (TOAs), amplitudes*/
    try {
        std::thread extrapolThread(extrapolateFunction);
        extrapolThread.detach();
    }
    catch (const std::exception& exception) {
        std::cout << "Could not create thread: " << exception.what() << std::endl;
    }
    /* Reset the flag */
    updateDelayProcessorParams = false;
}

The code for the extrapolate function is this:

void PluginProcessor::extrapolate()
{   
    Vector3D<float> translation(*listenerPositionCurrent - *listenerPositionInit);

    for (auto it = delayProcessors.begin(); it != delayProcessors.end(); ++it)
    {
        auto& dp = *it;
        float x = dp.initCoords.x;
        float y = dp.initCoords.y;
        float z = dp.initCoords.z;
        float aziExtrap, eleExtrap, disExtrap;

        // get index of current DelayProcessor instance
        int dpIndex = std::distance(delayProcessors.begin(), it);
        
        // initial cartesian coordinates already calculated in initDelayProcessors()
        // translate x, y & z by the translation vector above
        x -= translation.x;
        y -= translation.y;
        z -= translation.z;

        // convert shifted Cartesian coordinates back to spherical coordinates
        CoordHelpers::cartesian2sphericalBnf(x, y, z, aziExtrap, eleExtrap, disExtrap);

        // caused output to stutter (probably blocked the audio processing?) 
        // when moving the positioner thumb too much
        //std::lock_guard<std::mutex> lock(extrapolMutex);

        // Update DelayProcessors with new spherical values
        // includes adaption of delay time (toa) and amplitude in setDistance()
        dp.setAzimuth(aziExtrap);
        dp.setElevation(eleExtrap);
        dp.setDistance(disExtrap);

        // Update Diffuse Field Processor delay to match with onset of direct sound
        if (dpIndex == 0)
            dfp.setDelayByDistance(disExtrap);

        // Update BNF spherical coordinates
        binauraliser_setSourceAzi_deg(hBin, dpIndex, aziExtrap);
        binauraliser_setSourceElev_deg(hBin, dpIndex, eleExtrap);
        binauraliserNF_setSourceDist_m(hBin, dpIndex, disExtrap);

        // fetch DelayProcessor's gainFactor in PluginProcessor 
        // and pass on to binauraliser_setSourceGain()
        binauraliser_setSourceGain(hBin, dpIndex, dp.getGainFactor());
    }
    // tell Editor to call pannerView::refreshPanView()
    setRefreshWindow(true);

    return;
}

What I have tried already is running the update on a separate thread, as you have seen in the code above. I also tried introducing a mutex in the setter function for the delay of every processor. Another approach, also visible in the code above but commented out, was introducing a mutex in the extrapolate() function itself. This caused odd behavior of its own: the audio sounded “chunked”, probably because the mutex locked so consistently that the audio thread could not access the updated processors (?). I also tried increasing the interval of the timer mentioned in the beginning, but without success.

When I first created a small, basic “proof-of-concept” version with just one DelayLine it worked without the distortions. I could move a slider around and everything sounded fine.

I have read everything related to this topic on this forum that I could find, but most of the threads were 3-4 years old and dealt mainly with interpolation types. I assume the dsp::DelayLineInterpolationTypes were not present at that time. For my project I am using the Lagrange3rd interpolation type on every delay line I have set up. Other people suggested using SmoothedValue types for the delay times, but those to whom this was recommended said it had not helped.

Has anyone dealt with this in the meantime and has a recommendation on how to resolve this kind of distortion? Is there any efficiency improvement to be made here, or in general when dealing with DelayLine objects? If more code or general info needs to be provided, I am happy to do so.

I’m not going to comment on the whole implementation you have here, but if I understand correctly you have a DelayLine and you update the delay time without any smoothing or blending, so you’ll probably hear a click (or “crack”) as you’re effectively jumping to a different part of the DelayLine’s audio buffer - like editing together audio without any crossfades. Multiple delay lines all being updated continuously like this would lead to some glitchy outputs.
Using a smoothed value will help, but you might get pitch-shifting style artifacts (which are sometimes cool, but not always wanted).
You could try crossfading between a set of DelayLines with the old times, and a set with the updated times. This might add a bit of overhead, and cause possible phasing artifacts, but it’ll remove any clicks.
Your other option would be to build your own modified DelayLine where updates to the delay time are blended internally in some manner… there’s a few ways to go about this, but it depends which trade-off you’re after.
I hope this is some help.

if (updateDelayProcessorParams) {
    // do something here (spawn thread etc.)
    /* Reset the flag */
    updateDelayProcessorParams = false;
}

This part is incorrect. It may miss a parameter update. Use updateDelayProcessorParams.exchange(false) as the if condition instead. I would also recommend using ParameterAttachment instead of a timer.

void PluginProcessor::extrapolate()

Does this function address the real-time safety issue? For example, dp.setAzimuth(aziExtrap) should be thread-safe & should not block the audio thread.

Hi! Some comments about the approach, ignoring potential concurrency/thread synchronisation issues.

Pop artifacts most likely result from discontinuities in the waveform that are introduced by switching the delay times. If the goal were a simple delay that can be modulated, I’d recommend simply interpolating the delay times per sample, but with what you’re doing it might be more involved.

If I understood this correctly, you’re doing spatialisation and binauralisation. If changes in distance are allowed to “pitch” the sound (Doppler effect), then modulating the delays is okay, and interpolating the delay times per sample would be okay. Doing it correctly in the case of continuous distance changes, and smoothing the resulting time changes appropriately, is the challenge then; simple exponential smoothing probably doesn’t cut it.

Secondly, you’re also setting new IRs, based on the angle of incidence. This could introduce discontinuities (clicks and pops) as well. For these updates, blending from old to new might be a good approach. Sadly that means calculating everything twice, seems a little expensive in terms of cpu load…

I now use .exchange(), thanks. What is peculiar about that, is that I can no longer reach a breakpoint within that if-statement… the next reachable breakpoint is within extrapolate().

Also, why would you recommend ParameterAttachment over the timer method? Do you generally think that’s better?

The way I understood it (which is a VERY basic level of understanding), creating a thread around the call to the extrapolate() function made everything in it “not on the audio thread”. While reading this back to myself, I can totally sense how naive this thinking is.
The .setAzimuth() function is just a normal setter function with no other tricks to it. My original way of not-blocking-the-audio-thread was using a mutex in extrapolate before the dp.set… block.

//std::lock_guard<std::mutex> lock(extrapolMutex);
dp.setAzimuth(aziExtrap);
dp.setElevation(eleExtrap);
dp.setDistance(disExtrap);

I also tried using a mutex in the actual delay-setting functions in the DelayProcessor instances, where the DelayLine.setDelay() function gets called. But I get the feeling that I have some big misconceptions about not blocking the audio thread.

This actually sounds very interesting, while obviously not trivial. The current CPU load in Reaper is shown as 25% on my device, and the additional overhead might still be okay. Honestly though, this will be one of the later attempts, after smoothing has been tested. As you have all pointed out (thank you, btw!), it is the lack of any smoothing of the delay time parameter that is causing this unwanted noise. Playing with the timer interval just makes the clicks more “granular”.

Things like the Doppler effect are very welcome here; the whole binaural processing is supposed to be “very plausible” in the end. When I think about what you say about “interpolating the delay times per sample”, I realize I had not given that much thought before, because I automatically assumed that setting a DelayLineInterpolationType takes care of that as well — another big misconception, because it does not handle changes to the delay time itself. I have only seen one other post about per-sample processing of delay time modulation (Delay Line artifacts), but I do not completely comprehend it. The way it should happen “sample-wise” just confuses me.

This, luckily, is covered by the code upon which I am building this whole thing, rotation matrices and HRTF interpolation included.

Hi mstr,
You can imagine the delay buffer like an analog tape, where changing the delay time is equivalent to adjusting the distance between the record and play heads. In the digital case, you need to reconstruct the analog waveform from the discrete signal (look up the “sampling theorem”). As soon as you have any fractional delay time, you need to know what the waveform looks like between the available samples, hence the need for interpolation. For many practical purposes, efficient approximations like Catmull-Rom are good enough. This is what’s configured with DelayLineInterpolationType.

Interpolating the delay time is not that hard. It simply means that instead of jumping immediately to the new delay time, you set the desired new time as a target value, and each sample that’s being processed, the current delay time value is moved towards that target.

Mind to share which open source library you’re using for the spatial stuff?

Hey there,

You are right about that, and I am currently implementing an approach with a SmoothedValue for the target delay time to see if that works.

My current access to the delay time only allows me to call smoothedValue.getNextValue() once per processBlock() call, because that’s where I call delayLine.process(), and I have no step in between where I could smoothly increment the delay time. (Calling it only in extrapolate() after parameter changes would not be enough, of course.) Given my current buffer size for testing (1024), this might take a considerable while. Decreasing the buffer size does not seem possible on my machine right now.

The more thorough solution seems to be creating a derived implementation of the delayLine.process() method, which is what other users in this forum have talked about as well.

Of course! I am building my project on one of the SPARTA suite plugins (Binauraliser NF)

Also, why would you recommend ParameterAttachment over the timer method? Do you generally think that’s better?

Running a timer may keep the message thread busy. As you only need to update the parameters when there is a move on the pad, why not use a listener callback instead? Besides that, you may want to move the dragger when there is active automation on the parameters.

The .setAzimuth() function is just a normal setter function with no other tricks to it. My original way of not-blocking-the-audio-thread was using a mutex in extrapolate before the dp.set… block.

If it only sets some std::atomic<float> or std::atomic<bool>, that would be fine and you won’t need any locks. Besides that, if you change anything that is used by the audio thread, be careful.

To be honest, since the whole position is split up into three adjustable parameters (x, y, z) and there will always be a change to multiple parameters when the listener moves, I thought THAT might crowd the message thread (because there would always be these three overlapping listener callbacks). Right now, when catching the changes in the PluginEditor’s own GUI timer callback, I do this:

if (fabsf(lisPosXslider->getValue() - hVst->listenerPositionCurrent->x) > 1e-2f) {
    hVst->listenerPositionCurrent->x = lisPosXslider->getValue();
    hVst->updateDelayProcessorParams = true;
}
if (fabsf(lisPosYslider->getValue() - hVst->listenerPositionCurrent->y) > 1e-2f) {
    hVst->listenerPositionCurrent->y = lisPosYslider->getValue();
    hVst->updateDelayProcessorParams = true;
}
if (fabsf(lisPosZslider->getValue() - hVst->listenerPositionCurrent->z) > 1e-2f) {
    hVst->listenerPositionCurrent->z = lisPosZslider->getValue();
    hVst->updateDelayProcessorParams = true;
}

So as soon as one parameter changes (the position of the XY-pad dragger is caught by the Editor and written to a respective “lisPos…” slider), updateDelayProcessorParams is set. That flag is caught in the PluginProcessor and goes on to trigger the extrapolate function, which recalculates the delays.
This choice was obviously made by gut feeling. Does this create overhead that could be reduced by what you suggested?

You are right, this will also have to be considered!

Yes, but no. These functions set ordinary floats, not atomic values. Does that make any difference here?
As I have experienced, the only thing that really causes trouble is the call to delayLine.setDelay(x). So would it make sense to say there is a race-condition problem, because delayLine.setDelay(x) may be called while delayLine.process() is possibly being called in parallel by the audio thread?

That sounds too complicated (and it seems that you are mixing the frontend with the backend, which might be a bad idea). In general, a plugin looks like this (unless you have some heavy objects to share between the editor and the processor):

  • editor ↔ ParameterAttachment ↔ APVTS
  • processor ↔ APVTS::Listener parameterChanged ↔ APVTS

I have used ParameterAttachment to control Freq & gain of filters by a 2D dragger. And it works well. More details at ZLEqualizer/source/gui/dragger2d/dragger_parameter_attach.cpp at main · ZL-Audio/ZLEqualizer · GitHub

Yes. Most juce::dsp functions are not thread-safe (delayLine is one of them). If you set floats on one thread and read them on another thread (e.g. audio thread), there is a data race. The following video might be helpful:


I’ll definitely look into that, thanks! To be honest, I just implemented every kind of button, slider, etc. the same way I found it in the plugin code I was building upon, so I did not think it through all the way.

Okay wow, I was very much on the wrong track here, good to know! And thanks also for the video. But how can one tell which parts of JUCE are thread-safe and which are not? What is the way to find out?

If anyone has read this far in the thread… The suggestions made by the commenters here led me to try smoothing the delay time for the delay lines per sample using a SmoothedValue. I created a class derived from DelayLine, which does everything the same except for the process method, where I inserted this one line:

for (size_t i = 0; i < numSamples; ++i)
{
    // adjust time here
    setDelay(smoothedDelayInSamples.getNextValue());
    pushSample((int)channel, inputSamples[i]);
    outputSamples[i] = popSample((int)channel);
}

smoothedDelayInSamples is the SmoothedValue, whose target value is set whenever I calculate a new delay value from an updated listener distance (see the extrapolate() function above). The SmoothedValue was initialized by calling reset() with a ramp length of 0.01 seconds. I chose that value because I saw someone else do it that way; I am unsure what to pick here… Does anyone know a good way to find the best ramp length?

The resulting audio output got better, but still sounds bad. Fast modulations of the delay time can still cause crackling (no surprise). I have two different cases of weird-sounding output though. Every delay line in my project is preceded by a convolver, but:

  • Most of the delay lines work with one channel only and have a short IR (~100 samples).
    Modulating the delay time here gives a “wobbly” kind of sound, a bit like a cassette deck with a worn-out tape.

  • One DelayProcessor, though, works in stereo and has a bigger stereo IR (~50000 samples).
    Modulating the delay time here leads to crackling more easily (probably because it is just more computationally expensive).

Right now I am unsure whether my implementation with the introduced SmoothedValue is faulty/suboptimal, or whether the way SmoothedValue smooths the values is just not good enough for this use case.