How Do I Properly Scale A Delay Line To 3000 Taps?

#include "DelayLine.h"

DelayLine::DelayLine(int M, float g)
    : M(static_cast<size_t>(M)), g(g), writeIndex(0)
{
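    // Find the next power of two >= M + 1 so indices can wrap with a bit mask.
    // this->M is the size_t member; the int parameter M shadows it here.
    size_t bufferSize = 1;
    while (bufferSize < this->M + 1)
        bufferSize <<= 1;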

    mask = bufferSize - 1;

    buffer.resize(bufferSize, 0.0f);
}

float DelayLine::read() const
{
    size_t readIndex = (writeIndex - M) & mask;
    return buffer[readIndex];
}

float DelayLine::read(int tau) const
{
    size_t readIndex = (writeIndex - tau) & mask;
    return buffer[readIndex];
}

void DelayLine::write(float input)
{
    size_t readIndex = (writeIndex - M) & mask;
    float delayedSample = buffer[readIndex];

    buffer[writeIndex] = input + g * delayedSample;
    writeIndex = (writeIndex + 1) & mask;
}

void DelayLine::process(float* block, int blockSize)
{
    for (int i = 0; i < blockSize; ++i)
    {
        float x = block[i];
        float y = read();
        write(x);
        block[i] = y;
    }
}

    for (int ch = 0; ch < buffer.getNumChannels(); ++ch) {
        float* channelData = buffer.getWritePointer(ch);
        for (int sample = 0; sample < buffer.getNumSamples(); ++sample) {
            float x = channelData[sample];
            float y = 0;
            for (int m = 0; m < k.size(); ++m) {
                y += z[ch]->read(k[m]);
            }
            z[ch]->write(x);
            channelData[sample] = y * (1.0f / 600);
        }
    }

This code implements a multi-tap delay effect applied to a multi-channel audio buffer. For each channel, it reads delayed samples from the delay line at multiple offsets specified in the vector k before writing the current input sample. The problem is that this code does not scale efficiently. For example, when the number of delay taps (k) grows very large (e.g., around 3000), significant audio glitches and performance issues occur. How can I fix this?

If you are having dropouts (pops and gaps without audio), your CPU may not be able to keep up with the required number of calculations, which is currently

numChannels * numSamplesPerBuffer * k.size()

so in a stereo setup with a buffer size of just 1024 and 3000 taps you're doing

2 * 1024 * 3000 = 6,144,000 calls to the read() method

per block, in a time-sensitive processBlock method.
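
To put the same count in per-second terms, here is a small sketch of the arithmetic. The function names are made up for illustration, and the 48 kHz sample rate used below is an assumption, since the question does not state one.

```cpp
#include <cstdint>

// Number of read() calls needed to fill one audio block.
constexpr std::int64_t readsPerBlock(std::int64_t numChannels,
                                     std::int64_t samplesPerBlock,
                                     std::int64_t numTaps)
{
    return numChannels * samplesPerBlock * numTaps;
}

// Number of read() calls per second of audio. The block size drops out:
// a smaller block just means more blocks per second.
constexpr std::int64_t readsPerSecond(std::int64_t numChannels,
                                      std::int64_t sampleRate,
                                      std::int64_t numTaps)
{
    return numChannels * sampleRate * numTaps;
}

// Stereo, 1024-sample block, 3000 taps: the figure quoted above.
static_assert(readsPerBlock(2, 1024, 3000) == 6144000, "reads per block");
```

At an assumed 48 kHz, readsPerSecond(2, 48000, 3000) comes to 288,000,000 reads per second, so changing the buffer size alone will not reduce the load.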

You need to re-evaluate how you do this. Right now I would honestly love to see the read() implementation to better understand what's happening.

Edit: I forgot to scroll up; the read() implementation is already in the question.

Here's a very rough implementation idea: basically, do things in groups. That's what I think might work, no promises.

SIMD could also be worth looking into. As far as I know it's good for stuff like this, but I barely understand how to use it.

for (int ch = 0; ch < buffer.getNumChannels(); ++ch) {
    float* channelData = buffer.getWritePointer(ch);
    
    // Don't use DelayLine pointers; store the delay lines directly
    // (this assumes z now holds DelayLine objects). Avoids a lot of dereferencing.
    DelayLine& dl = z[ch];

    for (int sample = 0; sample < buffer.getNumSamples(); ++sample) {
        float x = channelData[sample];
        // Read all taps in one call instead of one read() call per tap.
        // readTaps() is a proposed method; it does not exist in the class above.
        float y = dl.readTaps(k);
        dl.write(x);
        channelData[sample] = y * (1.0f / 600); // normalization, presumably
    }
}
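
For what it's worth, readTaps() could look something like the sketch below. TapDelayLine is a hypothetical, cut-down stand-in for the DelayLine in the question: it leaves out the feedback gain g, and it takes the taps as a std::vector<std::size_t>, which is an assumption about the type of k. Note that this does not reduce the number of additions; it only removes the per-tap call overhead and gives the compiler one tight loop to optimize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's DelayLine (no feedback).
class TapDelayLine
{
public:
    explicit TapDelayLine(std::size_t maxDelay)
    {
        // Power-of-two size so indices wrap with a bit mask, as in the question.
        std::size_t size = 1;
        while (size < maxDelay + 1)
            size <<= 1;
        mask = size - 1;
        buffer.assign(size, 0.0f);
    }

    // Sum all taps in one call. Each tap is a delay in samples, 1 <= tap <= mask;
    // tap 1 is the most recently written sample.
    float readTaps(const std::vector<std::size_t>& taps) const
    {
        const float* data = buffer.data(); // hoisted out of the loop
        const std::size_t w = writeIndex;
        const std::size_t m = mask;
        float sum = 0.0f;
        for (std::size_t tap : taps)
        {
            assert(tap >= 1 && tap <= m);
            sum += data[(w - tap) & m];    // unsigned wrap-around is well defined
        }
        return sum;
    }

    void write(float input)
    {
        buffer[writeIndex] = input;
        writeIndex = (writeIndex + 1) & mask;
    }

private:
    std::vector<float> buffer;
    std::size_t mask = 0;
    std::size_t writeIndex = 0;
};
```

The reads are still scattered across the buffer, so memory access stays the likely bottleneck at 3000 taps.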

Since you are dealing with integer delays, each tap's output over a whole block is just a contiguous run of the delay buffer. So instead of reading sample by sample, you can copy (or SIMD-add) a relatively long consecutive run per tap.
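
A minimal sketch of that idea, under some assumptions: there is no feedback (g = 0), the buffer is sized for the longest tap plus one full block, and BlockTapDelay, process() and addRun() are made-up names rather than anything from the question or from JUCE. The whole input block is written first, then each tap adds one contiguous run (split in two where it wraps) into the output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block-based multi-tap delay, no feedback.
class BlockTapDelay
{
public:
    BlockTapDelay(std::size_t maxDelay, std::size_t maxBlockSize)
    {
        // Room for the longest tap plus one whole block, rounded up to a power of two.
        std::size_t size = 1;
        while (size < maxDelay + maxBlockSize)
            size <<= 1;
        mask = size - 1;
        buffer.assign(size, 0.0f);
    }

    // In place: block[i] becomes gain * (sum over taps of x[i - tap]).
    void process(float* block, std::size_t n,
                 const std::vector<std::size_t>& taps, float gain)
    {
        const std::size_t size = buffer.size();

        // 1) Store the whole input block, then clear it to use as the accumulator.
        for (std::size_t i = 0; i < n; ++i)
            buffer[(writeIndex + i) & mask] = block[i];
        std::fill(block, block + n, 0.0f);

        // 2) Per tap, add one contiguous run of n samples, split at the wrap point.
        for (std::size_t tap : taps)
        {
            assert(tap >= 1 && tap + n <= size);
            const std::size_t start = (writeIndex - tap) & mask;
            const std::size_t first = std::min(n, size - start);
            addRun(block, buffer.data() + start, first);
            addRun(block + first, buffer.data(), n - first);
        }

        for (std::size_t i = 0; i < n; ++i)
            block[i] *= gain;
        writeIndex = (writeIndex + n) & mask;
    }

private:
    static void addRun(float* dst, const float* src, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            dst[i] += src[i]; // contiguous on both sides, so it can be vectorized
    }

    std::vector<float> buffer;
    std::size_t mask = 0;
    std::size_t writeIndex = 0;
};
```

The number of additions is the same as before, but every inner loop now walks memory linearly, which is the access pattern that auto-vectorization or a hand-written SIMD add would need. Supporting the feedback path from the original write() would take extra care and is not handled here.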
