How to Optimize the Performance of a Synth and Sampler - Threading, SIMD, Bad Coding?

Hey, thanks in advance! I have been learning JUCE and C++ for a couple of months now, and as a learning project I am coding a synth/sampler plugin. It works quite well and I am happy with the sound and functionality. My problem is that it uses 35% of my CPU (on an M2 Mac) while playing 6 notes.

The base is a sampler with a comb filter on every voice, to make notes out of ambience samples. To support this there are two wavetable oscillators, then filters, a waveshaper and a convolution re-amping section. But the sampler alone has a CPU usage of 14% (in my DAW), and the wavetable alone around 20%. Changing the Hermite interpolation to linear interpolation helped, but it's still a lot.

Now I thought maybe having its own thread for every voice could help, but I am unsure if that is a recommended thing to do. Also, I didn't quite get the SIMD tutorial, probably because I am not that used to C++ yet, so I don't know whether that's useful in my case and how to properly implement it. If you have an idea what I could try (or shouldn't try), or know good resources to learn about all of that, I would be really thankful.

I hope the category is right, and that posting a lot of code is okay; if that's not wanted or recommended, please tell me, I don't want to flood the forum with garbage ^^

Again, thanks a lot in advance :)

Here is the code of my SynthVoice:

if (adsr.isActive()) {
    for (int i = startSample; i < numSamples + startSample; i++) {
        float nextAdsrSample = adsr.getNextSample();
        monoBuffer.setSample(0, i, wavetable.renderNextSample(sliderPos) * nextAdsrSample);
    }

    // to only hand over a full buffer
    if (startSample > 0 && numSamples == outputBuffer.getNumSamples() - startSample) {
        outputBuffer.addFrom(0, 0, monoBuffer, 0, 0, outputBuffer.getNumSamples());
        outputBuffer.addFrom(1, 0, monoBuffer, 0, 0, outputBuffer.getNumSamples());
        monoBuffer.clear();
    }
    // normal case:
    else if (numSamples == samplesInBuffer) {
        outputBuffer.addFrom(0, startSample, monoBuffer, 0, startSample, numSamples);
        outputBuffer.addFrom(1, startSample, monoBuffer, 0, startSample, numSamples);
        monoBuffer.clear();
    }
}
else {
    monoBuffer.clear();
    clearCurrentNote();
}

Here is the code of my wavetable oscillator's render function (removing the changeWaveform() call unfortunately didn't help):

int integerPhase = int(phase);
int integerPhaseA = integerPhase - 1;
int integerPhaseB = integerPhase;
int integerPhaseC = integerPhaseB + 1;
int integerPhaseD = integerPhaseC + 1;

if (integerPhaseD >= wtSize) {
    integerPhaseD -= wtSize;
    if (integerPhaseC >= wtSize) {
        integerPhaseC -= wtSize;
    }
}

float wavePos = changeWaveformPosition(sliderPos);
float fraction = phase - float(integerPhase);
phase = fmod(phase + increment, wtSize);

float outputSampleTableA = linearInterpolation(outputWavetableA[integerPhaseB], outputWavetableA[integerPhaseC], fraction);
float outputSampleTableB = linearInterpolation(outputWavetableB[integerPhaseB], outputWavetableB[integerPhaseC], fraction);
// float outputSampleTableA = interpolation(outputWavetableA[integerPhaseA], outputWavetableA[integerPhaseB], outputWavetableA[integerPhaseC], outputWavetableA[integerPhaseD], fraction);
// float outputSampleTableB = interpolation(outputWavetableB[integerPhaseA], outputWavetableB[integerPhaseB], outputWavetableB[integerPhaseC], outputWavetableB[integerPhaseD], fraction);

return 0.2 * (outputSampleTableA * (1.0f - wavePos) + outputSampleTableB * wavePos);

And here is the code of my SamplerVoice:

if (isVoiceActive()) {
    safeBuffer.setSize(outputBuffer.getNumChannels(), outputBuffer.getNumSamples());
    juce::AudioBuffer<double> buffer;
    buffer.makeCopyOf(outputBuffer);
    buffer.clear();
    if (isPartBufffer) { // if this is not false, safeBuffer is filled with garbage values
        safeBuffer.clear();
    }

    juce::SynthesiserVoice::renderNextBlock(buffer, startSample, numSamples);

    if (startSample == 0 && numSamples < buffer.getNumSamples()) {
        isPartBufffer = false;
        safeBuffer.copyFrom(0, 0, buffer, 0, 0, numSamples);
        safeBuffer.copyFrom(1, 0, buffer, 1, 0, numSamples);
    }
    else if (startSample > 0 && numSamples < buffer.getNumSamples() - startSample) {
        safeBuffer.copyFrom(0, 0, buffer, 0, 0, numSamples);
        safeBuffer.copyFrom(1, 0, buffer, 1, 0, numSamples);
    }
    else if (startSample > 0 && numSamples == buffer.getNumSamples() - startSample) {
        safeBuffer.copyFrom(0, startSample, buffer, 0, startSample, numSamples);
        safeBuffer.copyFrom(1, startSample, buffer, 1, startSample, numSamples);

        if (Comb.getCombIntensity() > 0.0f) {
            if (!onlyOdd) {
                Comb.allHarmonics(&safeBuffer, juce::SamplerVoice::getCurrentlyPlayingNote());
            }
            if (onlyOdd) {
                Comb.oddHarmonics(&safeBuffer, juce::SamplerVoice::getCurrentlyPlayingNote());
            }
        }

        adsr.applyEnvelopeToBuffer(safeBuffer, 0, safeBuffer.getNumSamples());

        outputBuffer.addFrom(0, 0, safeBuffer, 0, 0, outputBuffer.getNumSamples());
        outputBuffer.addFrom(1, 0, safeBuffer, 1, 0, outputBuffer.getNumSamples());
        safeBuffer.clear();
        isPartBufffer = true;
    }
    else {
        if (Comb.getCombIntensity() > 0.0f) {
            if (!onlyOdd) {
                Comb.allHarmonics(&buffer, juce::SamplerVoice::getCurrentlyPlayingNote());
            }
            if (onlyOdd) {
                Comb.oddHarmonics(&buffer, juce::SamplerVoice::getCurrentlyPlayingNote());
            }
        }

        adsr.applyEnvelopeToBuffer(buffer, 0, buffer.getNumSamples());

        if (!adsr.isActive()) {
            clearCurrentNote();
            Comb.reset();
        }

        outputBuffer.addFrom(0, 0, buffer, 0, 0, outputBuffer.getNumSamples());
        outputBuffer.addFrom(1, 0, buffer, 1, 0, outputBuffer.getNumSamples());
    }
}

And finally my comb filter:

frequency = juce::MidiMessage::getMidiNoteInHertz(midiNoteNumber);
sampleDelay = hostSampleRate / frequency;

for (int sample = 0; sample < buffer->getNumSamples(); sample++) {
    if (midiNoteNumber < 0) {
        break;
    }
    double xL = buffer->getSample(0, sample);
    double xR = buffer->getSample(1, sample);

    double yL = delayLineL.read(sampleDelay);
    double yR = delayLineR.read(sampleDelay);

    buffer->setSample(0, sample, xL + combIntensity * yL);
    buffer->setSample(1, sample, xR + combIntensity * yR);

    delayLineL.write(xL + combIntensity * yL);
    delayLineR.write(xR + combIntensity * yR);
}

Usually this will make things worse, because all the voices' threads would have to be synchronized in order to collect their output, meaning that the audio thread may end up waiting on 6+ lower-priority threads during every single processBlock.


You are allocating and deallocating heap memory in that sampler voice code, which shouldn't be done on the audio thread. setSize can possibly be used real-time safely with the additional parameters set correctly, but makeCopyOf has to heap-allocate to be able to make the copy of the buffer. The deallocation happens when the local "buffer" goes out of scope at the end of the function. These may or may not be the primary cause of your CPU usage problems; it's difficult to say just by looking at the rest of the code.

Good to know, thank you:)

Oh OK, yes, that could be the cause. I also use this in my processBlock(). Thanks a lot! I will try to replace that tomorrow. Is operator=() an audio-thread-safe alternative? Otherwise I'll just use a for loop and setSample().

The operator= basically does the same thing as makeCopyOf, so no, that's not real-time safe either. If you need some kinds of additional work buffers, those should be declared as member variables and allocated to a sufficient size outside the audio thread / processBlock code, typically in prepareToPlay.
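
As a sketch of that pattern (the struct and all names here are illustrative, nothing from the code above): allocate once off the audio thread, then reuse the buffer with setSize's avoidReallocating flag so no heap activity happens during rendering:

// A minimal sketch; 'VoiceScratch' and its members are illustrative names.
struct VoiceScratch
{
    // Call this from prepareToPlay (i.e. off the audio thread):
    // the only place where an allocation happens.
    void prepare (int numChannels, int maxBlockSize)
    {
        work.setSize (numChannels, maxBlockSize);
        work.clear();
    }

    // Call this from the audio thread. With avoidReallocating == true,
    // setSize keeps the existing allocation if it is already big enough.
    juce::AudioBuffer<float>& copyOf (const juce::AudioBuffer<float>& source)
    {
        work.setSize (source.getNumChannels(), source.getNumSamples(),
                      false, false, true);
        for (int ch = 0; ch < source.getNumChannels(); ++ch)
            work.copyFrom (ch, 0, source, ch, 0, source.getNumSamples());
        return work; // no makeCopyOf, no operator=, no heap traffic
    }

    juce::AudioBuffer<float> work;
};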

Very good to know, again, thanks a lot

Probably worth expanding on this one a little in case anyone is interested.

Using multiple threads and something like a semaphore-based job dispatcher to process the voices can indeed give you a decent win IF you have total control over the machine, for example in a Linux embedded environment where only the synth is running and you've fine-tuned the OS to your requirements.

For a general DAW plugin it comes with a big YMMV and no guarantees. There are a number of commercial synth plugins that let you toggle between multithreaded and single-threaded modes.
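
For anyone curious, here is a minimal sketch of the semaphore-dispatcher idea (C++20, illustrative names, no relation to any code above; a real implementation would also handle thread priorities and voice partitioning). Note that the blocking wait in renderAllVoices() is exactly the hazard the earlier reply warns about:

// A minimal sketch of a semaphore-based voice dispatcher (C++20).
#include <atomic>
#include <semaphore>
#include <thread>
#include <vector>

struct VoicePool
{
    static constexpr int numWorkers = 4;

    VoicePool()
    {
        for (int i = 0; i < numWorkers; ++i)
            workers.emplace_back ([this, i] { workerLoop (i); });
    }

    ~VoicePool()
    {
        running = false;
        startSignal.release (numWorkers);   // wake everyone so they can exit
        for (auto& t : workers)
            t.join();
    }

    // Called from the audio thread: wake all workers, then block until the
    // last one finishes. This blocking wait is the risky part in a DAW.
    void renderAllVoices()
    {
        pending = numWorkers;
        startSignal.release (numWorkers);
        doneSignal.acquire();
    }

private:
    void workerLoop (int workerIndex)
    {
        for (;;)
        {
            startSignal.acquire();
            if (! running)
                return;

            renderVoicesForWorker (workerIndex);   // hypothetical per-worker share

            if (pending.fetch_sub (1) == 1)        // last worker done?
                doneSignal.release();              // release the audio thread
        }
    }

    void renderVoicesForWorker (int) { /* render this worker's voices here */ }

    std::counting_semaphore<> startSignal { 0 };
    std::binary_semaphore doneSignal { 0 };
    std::atomic<int> pending { 0 };
    std::atomic<bool> running { true };
    std::vector<std::thread> workers;
};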


I would like to contribute to this discussion that it's often a good optimisation not to use code structures that go from startSample to endSample or the like, because some kinds of calculations are easier when you can always assume 0 as the origin. It's not a CPU optimisation, but a code-structure one. When something forces you to go in less-than-numSamples steps, like handling MIDI messages, you can just make views into your buffer that start at the start index, so that none of the processors you write have to deal with this anymore.

Changing that really did it, CPU usage went down a lot.

That's good to know, thanks.

I am not sure if I understand you correctly. What do you mean by making views into the buffer?

As a side note: I struggled a lot today to find out why my plugin has a higher CPU usage with a higher buffer size, but apparently Apple Silicon reports lower CPU usage at lower buffer sizes. In case anybody has the same problem of not finding the reason why ^^

Say you have a chain of effects: maybe a flanger, a modal filter and a granular pitch shifter. Very different stuff. Now if you had to work with a sub-buffer that runs from startIdx 5 to endIdx 12 or so, then all of those processors would need to be written to work with startIdx and endIdx, which can be error-prone. To make your life easier, you want all those processors to only work with numSamples, which would be 12-5=7 in this case. So you could do something like this:

auto samples = buffer.getArrayOfWritePointers();
std::array<float*, 2> subSamples = { &samples[0][startIdx], &samples[1][startIdx] };

and you have a nice pair of channel pointers that "view" into your other buffer, but starting at startIdx.
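
As a small addition, and only as a sketch: if you'd rather keep a JUCE type, AudioBuffer has a constructor that refers to existing channel data without copying it, so the same view can be wrapped like this (someProcessor is just a stand-in for whatever comes next in your chain):

auto samples = buffer.getArrayOfWritePointers();
int numSamples = endIdx - startIdx;

// This refers to the parent buffer's memory starting at startIdx; nothing is
// copied, and for a small channel count it shouldn't heap-allocate either.
juce::AudioBuffer<float> view (samples, buffer.getNumChannels(), startIdx, numSamples);

someProcessor.process (view); // sees samples 0..numSamples-1, origin at 0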

Oh yeah, that makes sense, thanks for the info :)