Speed up code for upsampled processing in Plugin

Hey there,

I’m trying to implement a plugin using the rt-wdf library by Maximilian Rest to create an analog-sounding distortion (and because I’m a little intrigued by the method). Processing samples with the high-level functions provided by the library is pretty straightforward and works in real time for the example Fender tonestack circuit on my machine, but, as one would expect, it introduces aliasing. I’m therefore using the Oversampling class from the juce::dsp module to oversample the audio buffer before processing the samples with the wave digital filter (an approach also previously described and implemented in C++/JUCE by @maxprod).

Unfortunately, with the oversampling included, the plugin seems to lose its real-time capability. Even with 2x or 4x oversampling, the plugin causes the audio playback in Reaper to stutter, stopping every half second or so to catch up, and the CPU load skyrockets to about 75%.

Here is the code from the prepareToPlay and processBlock methods, OVERSAMPLING_FACTOR being a macro defined in the header file and oversampler defined as:

juce::dsp::Oversampling<float>* oversampler = new juce::dsp::Oversampling<float> (getTotalNumInputChannels(), OVERSAMPLING_FACTOR, juce::dsp::Oversampling<float>::FilterType::filterHalfBandFIREquiripple, true, false);

also in the header file.

void RtwdfPluginAudioProcessor::prepareToPlay (double sampleRate, int samplesPerBlock)
{
    juce::dsp::ProcessSpec spec;
    spec.sampleRate = sampleRate;
    spec.maximumBlockSize = samplesPerBlock;
    spec.numChannels = getTotalNumInputChannels();
    oversampler->reset ();
    oversampler->initProcessing(spec.maximumBlockSize);


    thisWdfTree = new wdfTonestackTree();
    thisWdfTree->initTree();
    thisWdfTree->setSamplerate(OVERSAMPLING_FACTOR * this->getSampleRate());
    thisWdfTree->adaptTree();
}
void RtwdfPluginAudioProcessor::processBlock (juce::AudioBuffer<float>& buffer, juce::MidiBuffer& midiMessages)
{
    juce::ScopedNoDenormals noDenormals;
    auto totalNumInputChannels  = getTotalNumInputChannels();
    auto totalNumOutputChannels = getTotalNumOutputChannels();

    float bass = apvts.getParameter("BASS")->getValue();
    float mid = apvts.getParameter("MID")->getValue();
    float treble = apvts.getParameter("TREBLE")->getValue();

    thisWdfTree->setParam(0, bass);
    thisWdfTree->setParam(1, mid);
    thisWdfTree->setParam(2, treble);

    for (auto i = totalNumInputChannels; i < totalNumOutputChannels; ++i)
        buffer.clear (i, 0, buffer.getNumSamples());

    auto audioBlock = juce::dsp::AudioBlock<float>(buffer);
    auto context = juce::dsp::ProcessContextReplacing<float>(audioBlock);
    //oversampling:
    auto oversamplingAudioBlock = oversampler->processSamplesUp(context.getInputBlock());

    for (int channel = 0; channel < totalNumInputChannels; ++channel)
    {
        auto* channelPtr = oversamplingAudioBlock.getChannelPointer(channel);
        
        for (int sample = 0; sample < oversamplingAudioBlock.getNumSamples(); sample++)
        {
            thisWdfTree->setInputValue(*(channelPtr+sample)); //access AudioBlock-data via pointer 
            thisWdfTree->cycleWave();
            *(channelPtr+sample) = thisWdfTree->getOutputValue(); //same access as two lines above but in reverse
        }
    }
    oversampler->processSamplesDown(context.getOutputBlock());
}

I’m not sure what I could do to speed up the whole thing, or whether there is something wrong with the processing itself that causes the audio to stutter. Maybe the switch between AudioBlock-based and single-sample processing is also introducing errors? I’m not yet ready to accept that the WDF method itself is too computationally expensive, since it has been implemented before and was said to be real-time capable.

If anyone could help me by suggesting ways to speed this up, or by pointing out any mistakes in the code in this regard, I would appreciate it.

Thanks in advance, people!

Looking at your code, I see that you are not using smart pointers – which you should do for safety reasons – and that you are fetching the parameters via their string identifiers from the apvts in every callback instead of storing the pointers to the underlying atomics once. The first one isn’t a performance issue, and the second one is unrelated to your oversampling, so take them as side notes :wink:
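Something like this would get rid of the per-block string lookups (untested sketch, the member names are made up, the parameter IDs are taken from your snippet):

std::atomic<float>* bassParam   = nullptr;   // hypothetical members in the processor header
std::atomic<float>* midParam    = nullptr;
std::atomic<float>* trebleParam = nullptr;

// in the constructor, after the apvts has been created: one string lookup, ever
bassParam   = apvts.getRawParameterValue ("BASS");
midParam    = apvts.getRawParameterValue ("MID");
trebleParam = apvts.getRawParameterValue ("TREBLE");

// in processBlock: lock-free atomic reads, no string lookups
thisWdfTree->setParam (0, bassParam->load());
thisWdfTree->setParam (1, midParam->load());
thisWdfTree->setParam (2, trebleParam->load());

One caveat: getRawParameterValue() gives you the parameter’s actual (de-normalised) value, while getValue() returns the 0-to-1 normalised one, so you may have to adjust what you pass to setParam accordingly.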

Regarding your main problem: are you sure that you are testing a release build? Never judge performance by debug builds. And if you still see this huge performance impact with a release build, the obvious next step is to profile your code. That will reveal where most of the time is spent and what to optimise first. I don’t know the library you are using, so I can’t say what performance you could usually expect from it, but maybe profiling will reveal hotspots that you can tweak in the library code yourself :slight_smile:

@PluginPenguin thanks! Profiling sounds like a good idea. Do you have any recommendations on what software to use for JUCE plugins? I’ve never done any profiling before, so I’m pretty clueless at this point.

Also thanks for the side notes, it’s always good to hear some best practices etc.!

I’m on Linux (Arch/Manjaro), by the way.

Good advice from PP there, but you’ll also want to avoid doing memory allocations in processBlock and look for opportunities to vectorise your code (though it looks like the WDF tree implementation only allows per-sample processing, unfortunately).

If you haven’t seen it before, then have a good read of this: Ross Bencina » Real-time audio programming 101: time waits for nothing
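To make both points concrete (the allocations and PP’s smart-pointer note), one option is to own the oversampler and the tree as std::unique_ptr members and create everything up front, so the audio thread never allocates. Untested sketch, reusing the names from your snippet:

// in the header, instead of the raw pointer initialised with new:
std::unique_ptr<juce::dsp::Oversampling<float>> oversampler;
std::unique_ptr<wdfTonestackTree> thisWdfTree;

void RtwdfPluginAudioProcessor::prepareToPlay (double sampleRate, int samplesPerBlock)
{
    // all allocations happen here, before processBlock starts being called
    oversampler = std::make_unique<juce::dsp::Oversampling<float>> (
        getTotalNumInputChannels(), OVERSAMPLING_FACTOR,
        juce::dsp::Oversampling<float>::FilterType::filterHalfBandFIREquiripple, true, false);
    oversampler->initProcessing ((size_t) samplesPerBlock);
    oversampler->reset();

    thisWdfTree = std::make_unique<wdfTonestackTree>();
    thisWdfTree->initTree();
    thisWdfTree->setSamplerate (OVERSAMPLING_FACTOR * sampleRate);
    thisWdfTree->adaptTree();
}

The body of processBlock can stay as it is, since both objects are still accessed through the arrow operator.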

It’s been a while since I profiled on Linux. I used Intel VTune back then, but I’m not sure whether it works on Arch – I successfully used it on Ubuntu or CentOS, I don’t remember exactly. And it’s x86_64 only, of course, but I guess you are not working on an ARM machine?

Yes, it’s an ARM machine … also VTune is available from the Arch user repository, so I’ll try to use it.

Did they add ARM support to VTune in the meantime?

That’s my understanding too, otherwise I would be happy to use the AudioBlock as intended and get around the sample-by-sample stuff.
Thanks for the further reading! Will look into this.

Should have proof-read that reply :wink: it ISN’T an ARM machine, it’s x86_64. Sorry for the confusion!

Ah I see, thanks for clearing that up :wink:

Hello @copypastecat!

I read about the WDF library by M. Rest and it looks really interesting. However, if I read correctly, it makes use of Armadillo, which is a very useful library for vector and matrix operations in C++ but is expensive in terms of performance (probably @PluginPenguin was also asking about the performance of the external libraries). It is very useful for theoretical studies of your algorithms before the real-time implementation. IIRC, the Eigen library is similar but less expensive than Armadillo.

Nevertheless, I suggest you go into the WDF implementation and try to implement it directly in C++, without the external libraries. IMO that’s not a problem, in particular if you’re not working with multi-port scattering junctions.
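Just to give an idea of the scale of a hand-rolled version: a small series/one-port tree only needs a handful of scalar operations per sample. Here is a rough, untested sketch of an RC lowpass as a WDF (ideal voltage source at the root, series adaptor below it, trapezoidal/bilinear capacitor model). The class and variable names are my own and have nothing to do with the rt-wdf API:

#include <cmath>

class WdfRcLowpass
{
public:
    void prepare (double sampleRate, double resistance, double capacitance)
    {
        Rr  = resistance;
        Rc  = 1.0 / (2.0 * sampleRate * capacitance); // capacitor port resistance (trapezoidal rule)
        Rup = Rr + Rc;                                // adapted port resistance of the series adaptor
        capState = 0.0;                               // capacitor memory: previous incident wave
    }

    double processSample (double vin)
    {
        // 1) waves travelling up the tree (leaves -> root)
        const double bC  = capState;      // capacitor reflects last sample's incident wave
        const double bR  = 0.0;           // matched resistor reflects nothing
        const double bUp = -(bC + bR);    // series adaptor, adapted port towards the root

        // 2) root: ideal voltage source, b = 2*Vs - a
        const double aDown = 2.0 * vin - bUp;

        // 3) waves travelling back down through the series adaptor
        const double waveSum = bC + bR + aDown;
        const double toCap   = bC - (Rc / Rup) * waveSum; // wave sent into the capacitor
        capState = toCap;                                 // becomes next sample's reflection

        // 4) output: capacitor voltage = (incident + reflected) / 2,
        //    sign flipped to undo the series-junction port orientation (port voltages sum to zero)
        return -0.5 * (toCap + bC);
    }

private:
    double Rr = 1.0, Rc = 1.0, Rup = 2.0;
    double capState = 0.0;
};

Multi-port scattering junctions and nonlinear roots are where the matrix operations come in; adaptors and one-port elements like these are just a few multiplies and adds per sample.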

Could I also ask where the nonlinear processing takes place? IIANM, the Fender tonestack circuit is a purely linear filtering process.

Hi @karota,

you’re right, the tonestack is linear, so the WDF implementation should also be linear (I don’t see any reason why the WDFs shouldn’t preserve linearity, although I’m not exactly an expert…). The distortion I’m hearing may therefore already be a performance issue, even without the oversampling. But since the goal is to use the method for distortion with tube circuits, I included the oversampling stage right from the start without thinking about that, and just jumped to conclusions when I heard the distortion (with no oversampling) in Reaper.

I will try to drop the use of Armadillo, which at least for the tonestack shouldn’t be a problem, and see if it helps. Thanks!


Hey @copypastecat!

Yeah, it’s nice to hear from someone who studies WDFs.
Before going towards multi-port nonlinearities (such as tubes, BJTs…), I suggest you get a clear understanding of DFLs, adaptation, and series, parallel and multi-port junctions. These concepts are well explained in the literature (papers by Werner, D’Angelo and Bernardini, for instance).
After that, try to implement systems with one-port nonlinear elements first, such as diodes, since in that case you may not need iterative solvers and you can also apply an anti-aliasing technique to reduce the computational cost (papers by Parker and Albertini explain the Antiderivative Anti-Aliasing mechanism; some implementations can be found here: GitHub - jatinchowdhury18/ADAA: Experiments with Antiderivative Antialiasing).
At that point I think you can deal with systems containing multiple and/or multi-port nonlinear elements, which in most cases come with a high computational cost.
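To make the ADAA idea concrete: for a memoryless nonlinearity f, the first-order method replaces f(x[n]) by (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1]), where F is an antiderivative of f, falling back to f at the midpoint when the two samples are nearly equal. A minimal, untested sketch for f = tanh (my own naming, not tied to any particular library):

#include <cmath>

// First-order antiderivative antialiasing for y = tanh(x),
// using the antiderivative F(x) = log(cosh(x)).
class TanhADAA1
{
public:
    float processSample (float x)
    {
        const float diff = x - xPrev;
        float y;

        if (std::abs (diff) < 1.0e-5f)
            y = std::tanh (0.5f * (x + xPrev));               // ill-conditioned division: use the midpoint
        else
            y = (antideriv (x) - antideriv (xPrev)) / diff;   // (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1])

        xPrev = x;
        return y;
    }

private:
    static float antideriv (float x) { return std::log (std::cosh (x)); }

    float xPrev = 0.0f;
};

(For very large inputs you would want a numerically safer antiderivative, since cosh overflows quickly in single precision, but for audio-range signals this is fine.)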

Hope all is clear!


Alright, quick update if anyone is interested or reads this sometime later with a similar problem:

  1. @PluginPenguin was right: because Reaper was scanning an old folder, I was indeed testing a debug build :grimacing:. Switching to the release build solved the problem with CPU load and oversampling; the plugin now runs smoothly, even at 16x oversampling. Out of interest, I also changed the library’s LAG engine to Eigen/Dense (which, as @karota already pointed out, is supposed to be a lot faster, see also this blogpost). I listed the resulting CPU loads in Reaper for both Armadillo and Eigen below. Since they barely differ between the two libraries and seem manageable in general, I assume the linear algebra operations aren’t a huge factor in the overall performance.

  2. HOWEVER: the strange distortion is still there. It seems to be especially strong at very low frequencies, and it is a lot heavier on the right channel than on the left. Looking at the plugin’s output for a sine sweep confirms that. I also took the time to quickly measure the plugin’s frequency response (using Reaper’s white-noise generator and Octave’s FFT/PSD capabilities) and compare it to the frequency response LTspice produces for the real circuit. The plugin seems to have a pretty strong resonance at its maximum around 50 Hz. Some plots can be found below (sorry for the poor axis scaling/labelling, but you get the gist…).

If anyone has any ideas on what might cause that low-end distortion, please let me know. The difference between the right and left channel leads me to believe that it might be leftover chunks of data somewhere in the memory the WDF library uses to compute the output that distort the signal. I’ll look into that and maybe post here again if I find out what causes the distortion or manage to fix it.

Performance of the plugin with the two LAG libraries:

Oversampling factor | CPU load (Armadillo) | CPU load (Eigen/Dense)
1x                  | 1.3%                 | 1.35%
2x                  | 2.75%                | 2.5%
4x                  | 5.2%                 | 5.1%
8x                  | 10%                  | 9.5%
16x                 | 11.8%                | 11.5%

The WDF implementation’s output for a pure sine sweep (top: left channel, bottom: right channel):

The WDF implementation’s frequency response (all dials at 100%):

The physical circuit’s frequency response (all dials at 100%):

tonestack-ltspice-fr-cens.pdf (75.3 KB)

Hi @copypastecat,

first of all, you need to allocate memory for each filter’s state separately for each processing channel! As far as the WDF tree is concerned, I suppose you need to allocate one WDF tree per channel!

Read here: #1 most common programming mistake that we see on the forum
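Based on the snippet you posted earlier, that could look roughly like this (sketch only; and remember that setParam then has to be called on every tree):

// header: one adapted tree per channel instead of a single shared one
std::vector<std::unique_ptr<wdfTonestackTree>> wdfTrees;

// prepareToPlay: build the trees (oversampler setup as before)
wdfTrees.clear();
for (int ch = 0; ch < getTotalNumInputChannels(); ++ch)
{
    auto tree = std::make_unique<wdfTonestackTree>();
    tree->initTree();
    tree->setSamplerate (OVERSAMPLING_FACTOR * sampleRate);
    tree->adaptTree();
    wdfTrees.push_back (std::move (tree));
}

// processBlock, inside the channel loop:
auto& tree = *wdfTrees[(size_t) channel];
for (size_t sample = 0; sample < oversamplingAudioBlock.getNumSamples(); ++sample)
{
    tree.setInputValue (channelPtr[sample]);
    tree.cycleWave();
    channelPtr[sample] = tree.getOutputValue();
}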


Yep, you’re right. Using a separate WDF tree per channel solved the problem. No more distortion of any kind. Thanks!
