Class structure for Audio Plugins -- parameter updating & audio processing for minimal cost

Hi Folks,

Fair warning: this post is quite long. The questions I posed at the end might make sense to some people without having to read all my implementation details, so you can try skipping to those if you’d like. I apologize in advance for wasting anyone’s time.

I have a few general questions to discuss about best practices for structuring my audio plugins.

I’ve completed a few projects in JUCE, but none that are optimized to a level that is release-ready IMO.

Two questions have plagued me throughout these designs, and I have not been able to find a good answer to either:

  1. What is the best way to pass down parameter update calls from the DAW?
  2. What is the most efficient way to process audio by different modules within a class?

My gut feeling says they are related, and I think my inability to find an optimal solution stems from my limited knowledge of audio processing threads.

I’ll give a simple example, and tell you how I currently handle the implementation.

I’m making a basic Chorus effect. Right now it’s stereo only, and it uses two delay lines per channel, so 4 delay lines in total. Each delay line has an LFO that modulates its delay time, and since the L & R channels require a phase offset, there are 4 LFO modules, one for each delay line. Eventually there will be a filter, but we can ignore that for now.

So my current implementation looks something like this:

class DelayLine {
public:
    float process(float inputSample, float delayLengthInSamples);
    float feedback;

private:
    float* buffer;
    float writeIndex;
};

class Lfo {
public:
    float getNextSample();
    void setFrequency(float f);

private:
    const float* currentWaveTable;
    float readIndex;
    float increment;
};

class Chorus {
public:
    void process(AudioBuffer<float>& blockToProcess);
    void setDelayLength(int delayIndex, float delayLength);
    void setLfoFrequency(int lfoIndex, float frequency);
    void setLfoAmount(int lfoIndex, float lfoAmount);
    void setFeedback(float feedback);

    float dryWetMix;

private:
    float sampleRate;
    Array<float> currentDelayLengths;
    Array<float> lfoAmounts;
    OwnedArray<Lfo> lfoArray;
    OwnedArray<DelayLine> delayArray;
};

Then, in my class that inherits from AudioProcessor, I have a Chorus object and an AudioProcessorValueTreeState for all the parameters that the DAW can interact with.

In the ‘processBlock’ function of the AudioProcessor class, I just call the ‘process’ function of the Chorus class and pass it the AudioBuffer from processBlock. Then, in the Chorus ‘process’ function, a loop goes through each sample, gets the LFO sample, and gives the input sample to the DelayLine to get the delayed output.
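Roughly, that loop looks like this (a simplified sketch; the real code has bounds checks, feedback handling, etc.):

void Chorus::process(AudioBuffer<float>& blockToProcess)
{
    for (int ch = 0; ch < blockToProcess.getNumChannels(); ++ch)
    {
        float* data = blockToProcess.getWritePointer(ch);

        for (int i = 0; i < blockToProcess.getNumSamples(); ++i)
        {
            float dry = data[i];
            float wet = 0.0f;

            // two delay lines per channel, each modulated by its own LFO
            for (int d = 0; d < 2; ++d)
            {
                int idx = ch * 2 + d;
                float modulatedLength = currentDelayLengths[idx]
                                      + lfoAmounts[idx] * lfoArray[idx]->getNextSample();
                wet += delayArray[idx]->process(dry, modulatedLength);
            }

            data[i] = dry * (1.0f - dryWetMix) + 0.5f * wet * dryWetMix;
        }
    }
}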

Finally, I handle parameter updates by having my AudioProcessor subclass also inherit from AudioProcessorValueTreeState::Listener and implementing the parameterChanged() function with a switch statement that calls the appropriate set function in my Chorus class, which in turn calls the appropriate set functions in the Lfo and DelayLine classes, like this:

void ChorusAudioProcessor::parameterChanged(const String& parameterID, float newValue)
{
    // tree is a reference to the AudioProcessorValueTreeState
    int parameterIdx = tree.getParameter(parameterID)->getParameterIndex();
    
    switch (parameterIdx) {
        // LFO #1 frequency
        case 0:
            chorus->setLfoFrequency(0, newValue);
            break;

        // LFO #2 frequency
        case 1:
            chorus->setLfoFrequency(1, newValue);
            break;

        // DelayLine #1 length
        case 2:
            chorus->setDelayLength(0, newValue);
            break;

    ... etc ...
    }
}

I’m interested in a few specific situations relating to this implementation, and to my two more broad questions I posed at the start:

  1. If the DAW wants to update a parameter (I’ll use feedback as an example) while the Chorus object is in the middle of its sample-by-sample processing loop, does this present any issues? In my novice programming brain, the functionality would look like: Processing audio in loop -> setFeedback called -> Pause audio processing loop -> update feedback -> resume audio processing

  2. If I continue to construct new components from these subcomponents, it would seem that I would start to get long chains of set functions. Will this cause issues with performance, and if so, how might I change my program structure to reduce these chains of function calls?

  3. Excessive function calling also seems to occur in my Lfo and DelayLine classes, due to them processing single samples. Would a better method be to have the Lfo class fill a buffer with output samples, then pass this buffer along with the input audio to the DelayLine class, which fills the input buffer with delayed samples? (See the sketch after this list.)

  4. Related to (3), if processing blocks is indeed more efficient, how does this interact with signals from the DAW to change parameters in the middle of processing? Using the feedback example above and my current understanding of this implementation, if the DAW wants to change the feedback in the middle of processing a block, the sample-by-sample loop gets paused, the feedback is updated, and then processing resumes. If I switch to block processing, it seems the DelayLine would rarely update its feedback in sync with the signals from the DAW; instead, the feedback would more likely be updated between blocks. For some parameters this is probably fine, but for others, such as the frequency of a filter, it seems that a block size of 512 or more could produce audible artifacts, since the frequency could not be adjusted during the processing of a particular block.
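To make (3) a bit more concrete, the block-based version I have in mind would look roughly like this (hypothetical processBlock methods, just to illustrate the idea):

// Hypothetical block-based variants of the per-sample methods above
void Lfo::processBlock(float* lfoOut, int numSamples)
{
    for (int i = 0; i < numSamples; ++i)
        lfoOut[i] = getNextSample();                      // fill a whole buffer of LFO values
}

void DelayLine::processBlock(float* audio, const float* delayLengths, int numSamples)
{
    for (int i = 0; i < numSamples; ++i)
        audio[i] = process(audio[i], delayLengths[i]);    // in place, per-sample delay length
}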

I don’t necessarily need answers to all of these questions. Rather, I just put them there as an outline of my thoughts and the logical issues that seem to stem from them. If any of you have any insight regarding any of these topics, whether it answers my questions directly or not, that would be very much appreciated. Also, if anyone knows resources that can help me maximize the efficiency of my plugins, specifically regarding parameter updating and block processing, then I would love to take a look at those.

As I said before, I really have not been able to find much on these topics, which is why I decided to outline my current understanding, so that a kind soul can tell me where it is wrong and where I might go to improve it.

Generally yes. Stuff gets to stay in the registers, L1 cache and instruction cache if you work in small loops over a block.

This isn’t how it happens. I think you can assume that you’ll only get set-parameter calls before your block starts. It’s probably possible that some host might try to send you one mid-block… but not with the expectation that you update your DSP in the middle of processing!
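One common pattern, instead of reacting in parameterChanged(), is simply to read the parameter’s atomic raw value at the top of processBlock. A rough sketch using the member names from your post (“feedback” is just an example parameter ID):

void ChorusAudioProcessor::processBlock(AudioBuffer<float>& buffer, MidiBuffer&)
{
    // getRawParameterValue() returns a std::atomic<float>*, so this read is safe
    // even if the host changes the parameter from another thread.
    chorus->setFeedback(tree.getRawParameterValue("feedback")->load());

    chorus->process(buffer);
}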

Thank you, that actually does a lot to clear up my confusion.

So, thinking about the filter example I gave in (4), if you wanted your filter frequency to update every 128 samples but the block size varies from 128 to 2048, would a possible solution be:

  1. Filter class has a pointer to raw frequency parameter value from AudioProcessorValueTreeState
  2. Split the input buffer into 128 sample blocks in Filter process function
  3. Process each of these blocks, updating the filter coefficients after each one using the current value of the frequency pointer

Almost.

You (probably) won’t get a new frequency value from the host for each sub-block.

But you may need to smooth from the previous value to the current one to avoid zipper noise.

Pseudo code:

for (int i = 0; i < numSamples; ++i)
{
    // every 128 samples, take the next smoothed value and update the coefficients
    if ((sampleCounter++ % 128) == 0)
        filter.setCoeffs(filterSmoothed.getNextValue());

    audio[i] = filter.process(audio[i]);
}
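Here filterSmoothed could be a juce::SmoothedValue, set up roughly like this (the ramp length and the frequency variable names are placeholders):

// Member somewhere in your processor or filter wrapper:
juce::SmoothedValue<float> filterSmoothed;

// In prepareToPlay():
filterSmoothed.reset(sampleRate, 0.05);                    // ~50 ms linear ramp
filterSmoothed.setCurrentAndTargetValue(initialFrequency); // placeholder

// Whenever the host gives you a new frequency (before the block starts):
filterSmoothed.setTargetValue(newFrequency);               // placeholder

Note that getNextValue() advances the smoother by one step per call, so with one call per 128 samples you may also want filterSmoothed.skip(127), or just choose the ramp length with that in mind.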

You have to be prepared to handle any block size up to the maximum block size you received in the prepareToPlay() function. You could even get 0 samples in a process call, for example when the host is merely updating parameters.

I think this is pretty mean-spirited of the host.


@HowardAntares Right. I used a simplified example to explain my logic, but that is an important clarification.

@jimc It’s all coming together in my head now. Thanks for being patient with me.

Does anyone have any ideas about question (2) from my original post:

If I continue to construct new components from these subcomponents, it would seem that I would start to get long chains of set functions. Will this cause issues with performance, and if so, how might I change my program structure to reduce these chains of function calls?

The function call overhead is pretty low. Even virtual function calls are pretty fast. If you are just getting started with all this, structure it however you think is easiest to follow and worry about this later.

You can learn to read the assembly; you’ll find that the compiler (in a release build) optimises some simple calls away to nothing.


Stupidly simple example: the compiled func2 doesn’t call func at all.
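Something along these lines (reconstructing the idea, not the original snippet):

static float func (float x)  { return x * 0.5f + 1.0f; }

float func2 (float x)
{
    // In a release build the compiler inlines func(), so the generated code
    // for func2 is just a few arithmetic instructions - no calls at all.
    return func (x) + func (x);
}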

I can tell you my limited experience. I process sample-by-sample, but with manual vectorization, and I don’t use the dsp classes. (Actually I have the whole “model” isolated from Juce, and it’s still smaller than the UI code.) In my case the routing is a bit complicated and it’s not obvious where to pass whole buffers; when I tried it, the memory overhead bogged everything down. So I went back to sample-by-sample, and I try to keep minimal state: anything that changes from sample to sample and isn’t used later should not be stored, just passed along from arguments to locals to returns, so that most of it stays in registers and the smaller state is more likely to be kept in cache.

I have many derived parameters, and I compute them only when they change, so the actual dsp only uses the most-derived parameters, to minimize realtime computation. I’m not sure about automation changing parameters during processBlock, but there’s the UI too, and I need derived parameters to be computed at the start of each 4-sample round, because they’re correlated. So I keep an MPSC message queue for parameter changes from the APVTS::Listener to the model, which is checked in processBlock every 4 samples. I also have an SPSC queue for a few things that go the other way around, and a big buffer for time-domain visualization that stores the minimum needed to have a sample for each pixel at the maximum zoom.

The overhead of calls is tricky. Sometimes inlining makes things better, sometimes worse. I don’t think you should base your design around that. Write the functions that make sense; you can profile later and see what needs inlining. I have the whole set of simd wrappers force-inlined; apart from that, only once have I had a function need it. I also use vectorcall (on Windows) for moving the simd wrappers around.

I have a lot of log/exp in the dsp which, apart from being a hog by themselves, are not vectorizable in their std implementations. So I use my own approximations, to around 16 bits of precision. That may sound bad, but it has far less impact than the refinements I can add with the freed overhead.

I smooth some parameters (basically all gains, and just linearly) that I consider essential. Others are somewhat smoothed by the processing itself, and the rest should be automated with care, so to speak.

I recently discovered that amalgamating all my code and compiling with Clang gives quite a boost over MSVC, amalgamated or not. Low CPU usage was a design target, and I’m quite happy with it. Still, I’ve only done one large-ish project, so things may vary. There are always tradeoffs; you can’t do everything for nothing. I’d say the biggest “optimizations” came from rethinking the algorithms themselves, minimizing redundancies, and discerning the essential from the negligible.
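The shape of that parameter queue, very much simplified: a fixed-size ring of {index, value} messages. The sketch below uses juce::AbstractFifo and assumes a single producer (unlike the MPSC queue described above); push(), drainPending() and applyToModel() are placeholder names:

struct ParamChange { int index; float value; };

juce::AbstractFifo fifo { 256 };
std::array<ParamChange, 256> slots;

// Producer side (called from the APVTS listener):
void push (ParamChange msg)
{
    int start1, size1, start2, size2;
    fifo.prepareToWrite (1, start1, size1, start2, size2);
    if (size1 > 0)
        slots[(size_t) start1] = msg;
    fifo.finishedWrite (size1);
}

// Consumer side (checked on the audio thread between small sample rounds):
void drainPending()
{
    int start1, size1, start2, size2;
    fifo.prepareToRead (fifo.getNumReady(), start1, size1, start2, size2);

    for (int i = 0; i < size1; ++i)  applyToModel (slots[(size_t) (start1 + i)]);
    for (int i = 0; i < size2; ++i)  applyToModel (slots[(size_t) (start2 + i)]);

    fifo.finishedRead (size1 + size2);
}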
