Multiple oscillators optimisation (CPU)

I'm experimenting with three wavetable oscillators by using three instances of the Synthesiser class. Even without playing any notes the CPU is around 15% and goes to 20% when I press a few notes. That's fine but still a bit high compared to other wavetable synthesisers.

Anyway, the real problem comes when I enable an LFO that ticks on every sample for all three oscillators. The CPU goes to 50-60% even without playing. I tried to limit the number of operations inside both the LFO and renderNextBlock, but the CPU is still quite high.

I managed to cut the CPU usage almost in half by using only one Synthesiser class and doing all the waveform sums in one place. That helped, but I'm not sure it's the way to go.

What's the best way of handling multiple oscillators while keeping CPU usage low, especially with modulation?


What's the reason for using multiple synthesiser objects..? There'll be a fixed level of overhead in each one for parsing the MIDI and messing around with buffers, etc., so wouldn't it make more sense to use one synth and multiple voices?

But as with all performance issues, the big question is: what does your profiler show as the hot-spots?

I followed this advice:

But I guess it's not the way to go. The problem was inside renderNextBlock in the oscillator voice where I had essentially this:

- get sample from wavetable
- tune wavetable if fineTune changes
- tune wavetable if vibrato is on (the LFO ticks every sample) <- that was almost doubling the CPU
- apply velocity, envelope, gain

Now I've changed it to use only one Synthesiser object, and inside a voice I sum multiple waveforms, so basically one voice plays my three waves/oscillators. It's much better, but it still uses more CPU than other synths with similar settings. The vibrato still raises the CPU by 5-10%.
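A minimal sketch of the "one voice, three oscillators" layout described above, in plain C++ rather than JUCE so it compiles on its own. WavetableOsc and ThreeOscVoice are hypothetical names; the tables are assumed to be precalculated and shared between voices, and the phase is measured in table samples.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical single-cycle wavetable oscillator; the phase is measured in table samples.
struct WavetableOsc
{
    const std::vector<float>* table = nullptr; // shared, precalculated table (not owned)
    double phase = 0.0;                         // current read position
    double delta = 0.0;                         // table samples to advance per output sample
    float gain = 1.0f;

    float next()
    {
        const std::vector<float>& t = *table;
        const std::size_t i0 = static_cast<std::size_t>(phase);
        const std::size_t i1 = (i0 + 1) % t.size(); // wrap to the start of the cycle
        const float frac = static_cast<float>(phase - static_cast<double>(i0));
        const float out = t[i0] + frac * (t[i1] - t[i0]); // linear interpolation

        phase += delta;
        while (phase >= static_cast<double>(t.size()))
            phase -= static_cast<double>(t.size());

        return gain * out;
    }
};

// Hypothetical voice that renders its three oscillators in one pass and sums them,
// so per-voice work (envelope, gain, buffer writes) is done once, not once per oscillator.
struct ThreeOscVoice
{
    std::array<WavetableOsc, 3> oscs;

    void renderBlock(float* out, int numSamples)
    {
        for (int n = 0; n < numSamples; ++n)
        {
            float sum = 0.0f;
            for (WavetableOsc& osc : oscs)
                sum += osc.next();
            out[n] += sum; // add, so several voices can share one output buffer
        }
    }
};
```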

I'm not sure what I should do to improve it further.

Are we talking debug code or optimized release code here?

Well, I am doing it as I advised, and I can play 64 voices with envelopes, filters and LFOs at about 6% CPU (which is not an accurate number, but let's just say it's fast enough).

As soon as you want different behaviour for the different sounds you need to go down this route.

And the overhead Jules is talking about should be really negligible in this case (handling the MidiBuffer twice should not be a problem for your scenario).

And I don't know what you mean by tuning the wavetables. You have an uptimeDelta variable that advances voiceUptime by the pitch factor, so tuning the wavetable only means adding, e.g., 1.14 instead of 1.0 every sample (you will need some interpolation then, of course). But if you recalculate the wavetable every time you change the fine tune, you have your problem right there.
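To illustrate the point above: a sketch, assuming fine tune is given in cents, where retuning only changes the increment and never touches the table. pitchFactorFromCents and PhaseAccumulator are hypothetical names. std::pow is relatively expensive, so the factor would be computed when the fine-tune control changes (or once per block for vibrato), not every sample.

```cpp
#include <cmath>

// Hypothetical helper: converts a fine-tune amount in cents into a pitch factor
// (1200 cents = one octave = factor 2).
inline double pitchFactorFromCents(double cents)
{
    return std::pow(2.0, cents / 1200.0);
}

// Hypothetical phase accumulator mirroring voiceUptime / uptimeDelta from the thread.
// Retuning only changes how far the read position moves per sample; the table is untouched.
struct PhaseAccumulator
{
    double uptime = 0.0;      // read position in table samples
    double uptimeDelta = 1.0; // base increment for the note being played
    double tableSize = 2048.0;

    // Returns the position to read now, then advances by the note increment
    // scaled by the current pitch factor (fine tune, vibrato, ...).
    double advance(double pitchFactor)
    {
        const double position = uptime;
        uptime += uptimeDelta * pitchFactor;
        while (uptime >= tableSize)
            uptime -= tableSize; // keep the position bounded so precision does not degrade
        return position;
    }
};
```

With a fine tune of +25 cents, for example, the factor is about 1.0145, and only the increment changes.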

This is really basic stuff, so in fact it should barely use any CPU at all (I assume you use a computer built after 1996).


It’s release code of course.

Then I think there’s something wrong with the wavetable class I’m using. I think it chooses a different wavetable depending on the octave being played to reduce aliasing, but I don’t think it recalculates the wavetable every time (I’ll check when I get back).
Can you suggest any other oscillator/wavetable classes that are stable, alias-free and CPU friendly?

It's completely pointless to speculate about what might be your problem without running a profiler and actually measuring where the bottlenecks are!

Having a precalculated lookup table for every octave should not be a performance problem.
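A sketch of what "a precalculated lookup table for every octave" can look like, under assumptions that are not in the thread: a sawtooth built additively, one table per octave above a hypothetical base frequency, with each table keeping only the harmonics that stay below Nyquist at the top of its octave. buildSawOctaveTables and octaveTableIndex are hypothetical names. All the expensive std::sin calls happen once at startup; playback only reads the tables.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Table k is meant for fundamentals in [baseFreq * 2^k, baseFreq * 2^(k+1)).
std::vector<std::vector<float>> buildSawOctaveTables(int numOctaves, std::size_t tableSize,
                                                     double baseFreq, double sampleRate)
{
    const double pi = 3.14159265358979323846;
    std::vector<std::vector<float>> tables(numOctaves, std::vector<float>(tableSize, 0.0f));

    for (int k = 0; k < numOctaves; ++k)
    {
        // Highest fundamental this table will be played at.
        const double topFreq = baseFreq * std::pow(2.0, k + 1);
        int numHarmonics = static_cast<int>((sampleRate * 0.5) / topFreq); // stay below Nyquist
        const int maxForTable = static_cast<int>(tableSize / 2) - 1;        // table resolution limit
        if (numHarmonics > maxForTable) numHarmonics = maxForTable;
        if (numHarmonics < 1) numHarmonics = 1;

        for (std::size_t i = 0; i < tableSize; ++i)
        {
            const double x = 2.0 * pi * static_cast<double>(i) / static_cast<double>(tableSize);
            double s = 0.0;
            for (int h = 1; h <= numHarmonics; ++h)
                s += std::sin(h * x) / h; // Fourier series of a (falling) sawtooth
            tables[k][i] = static_cast<float>(s * 2.0 / pi); // scale to roughly -1..1
        }
    }
    return tables;
}

// Picks the octave table for a fundamental frequency, clamped to the available range.
// Frequencies above the top table's range may alias.
int octaveTableIndex(double freq, double baseFreq, int numOctaves)
{
    int k = static_cast<int>(std::floor(std::log2(freq / baseFreq)));
    if (k < 0) k = 0;
    if (k >= numOctaves) k = numOctaves - 1;
    return k;
}
```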

I don't know if it's of any help, but I have implemented a sine wavetable oscillator (calling sin(x) every sample is slower than using this method).

At startup I fill the array:

for(int i = 0; i < 2048; i++)
    sinTable[i] = sinf(i * float_Pi / 1024.0f);

and in my SynthesiserVoice's renderNextBlock() method I do the following:

while (--numSamples >= 0)
{
    const int index = (int) voiceUptime;

    // Bitmask instead of modulo: works because the table length (2048) is a power of two
    const float v1 = sinTable[index & 2047];
    const float v2 = sinTable[(index + 1) & 2047];

    // Basic linear interpolation (fraction taken in double precision before converting)
    const float alpha = (float) (voiceUptime - index);
    const float invAlpha = 1.0f - alpha;
    const float currentSample = invAlpha * v1 + alpha * v2;

    // Fill the buffer
    voiceBuffer.setSample (0, startSample, currentSample);
    voiceBuffer.setSample (1, startSample, currentSample);

    // Advance the read position by the note increment times the per-sample pitch factor
    jassert (voicePitchValues[startSample] > 0.0f);
    voiceUptime += uptimeDelta * voicePitchValues[startSample];

    // Keep the position bounded so the int cast cannot overflow on long notes
    if (voiceUptime >= 2048.0)
        voiceUptime -= 2048.0;

    ++startSample;
}

The voicePitchBuffer contains precalculated pitch values for the current audio buffer (it is calculated just before this method is called).
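The table lookup from the loop above can be packaged as a standalone class and checked against std::sin. This is a sketch with hypothetical names (SineTable, lookup) that fills the table with the same values as the sinf loop above, sin(2π·i/2048), and uses the same bitmask wrap and linear interpolation.

```cpp
#include <cmath>
#include <vector>

// One sine cycle in 2048 samples. Because 2048 is a power of two,
// "index & mask" wraps the index without a modulo.
class SineTable
{
public:
    static constexpr int size = 2048;
    static constexpr int mask = size - 1;

    SineTable() : table(size)
    {
        const double pi = 3.14159265358979323846;
        for (int i = 0; i < size; ++i)
            table[i] = static_cast<float>(std::sin(2.0 * pi * i / size));
    }

    // position is measured in table samples (0..2048 is one cycle) and must be non-negative.
    float lookup(double position) const
    {
        const int index = static_cast<int>(position);
        const float alpha = static_cast<float>(position - index);
        const float v1 = table[index & mask];
        const float v2 = table[(index + 1) & mask];
        return v1 + alpha * (v2 - v1); // linear interpolation between neighbours
    }

private:
    std::vector<float> table;
};
```

With 2048 points per cycle, the linear-interpolation error is on the order of 1e-6, far below what a float output buffer can make audible.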

This is working quite well and is the basic concept of my modular design (I can add as many modulators / envelopes / filters as I want). I know this introduces some overhead in exchange for the flexibility, but I think this is how most modular systems are done.

But I also strongly recommend using a profiler (VerySleepy does the job). I have squeezed out another 20% - 30% performance gain by doing so.

There are some other general tips I picked up somewhere in the interweb:

  • don't use virtual functions for a per-sample function (like tick() or calculateNextSample()). The cost of the indirect call (and of the compiler not being able to inline it) is sometimes noticeable (and sometimes not, but in my case it was).
  • try to avoid calling complex mathematical functions on a per-sample basis (like exp, sin, sqrt, pow). Of course sometimes they are needed, but in most cases there is a clever guy who came up with a solution for this using only multiplication and addition. is a great place to search for this stuff.
  • Try to vectorize as much as possible (the FloatVectorOperations come in handy for this)
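As one example of the second tip, here is a sine oscillator that needs only a multiply and a subtract per sample, based on the recurrence y[n] = 2cos(w)·y[n-1] - y[n-2]. This is a well-known technique swapped in for illustration, not necessarily the one the poster had in mind, and RecursiveSine is a hypothetical name. std::sin/std::cos are only called in the constructor. The frequency is fixed at construction: changing it mid-stream also changes the amplitude unless the state is adjusted, so it suits fixed-rate LFOs better than a heavily modulated oscillator.

```cpp
#include <cmath>

class RecursiveSine
{
public:
    RecursiveSine(double frequency, double sampleRate)
    {
        const double pi = 3.14159265358979323846;
        const double w = 2.0 * pi * frequency / sampleRate; // radians per sample
        k = 2.0 * std::cos(w);
        y1 = std::sin(-w);       // y[-1]
        y2 = std::sin(-2.0 * w); // y[-2], so the first output is sin(0) = 0
    }

    // Returns sin(n * w) for n = 0, 1, 2, ... using one multiply and one subtract.
    double next()
    {
        const double y0 = k * y1 - y2;
        y2 = y1;
        y1 = y0;
        return y0;
    }

private:
    double k = 0.0, y1 = 0.0, y2 = 0.0;
};
```

Double precision is used because rounding errors accumulate slowly over long runs.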

I don't know if I am preaching to the choir here, but maybe it is of some help.


Thanks a lot for the replies and help. I'll check your code tomorrow. Just one question: are you filling that pitch buffer with both the modulation and the actual pitch values? Does it really matter whether you calculate them before the loop or inside it?

OK, so I tried the profiler, and what shows up as "high" is the getOutput function (around 16%), which calculates the wavetable to play according to the frequency, interpolates, and then returns the sample. I guess that while it sounds quite good, it's not really CPU friendly. In fact, if I switch the oscillator to a "simple" noise function it is much, much faster.

Thanks for all the tips, I'll try to work on the code tomorrow and see what I can do.

The pitch values are modulation values in the range 0.5 to 2.0 (± one octave). You could calculate them in place, but since I have different types of modulators it is much faster to calculate them blockwise (e.g. a velocity modulator has a fixed amount that is calculated at voice start, so I can use FloatVectorOperations::multiply on the whole buffer instead of adding another per-sample multiplication to the inner loop).
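A plain C++ sketch of the blockwise idea: each modulator fills or scales the pitch buffer for the whole block in its own simple loop, so the render loop only reads one precomputed factor per sample. computePitchBlock and its parameters are hypothetical; in JUCE, the constant fill and the constant multiply could be done with FloatVectorOperations::fill and FloatVectorOperations::multiply.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Builds the per-sample pitch factors for one block.
// lfoFactor must hold at least pitch.size() values (e.g. an LFO already rendered for this block).
void computePitchBlock(std::vector<float>& pitch, float velocityFactor,
                       const std::vector<float>& lfoFactor)
{
    const std::size_t n = pitch.size();

    // Start neutral: a factor of 1.0 means "no pitch change".
    std::fill(pitch.begin(), pitch.end(), 1.0f);

    // Constant modulator (amount fixed at voice start): a simple loop that is easy to vectorise.
    for (std::size_t i = 0; i < n; ++i)
        pitch[i] *= velocityFactor;

    // Time-varying modulator: element-wise multiply with its block of values.
    for (std::size_t i = 0; i < n; ++i)
        pitch[i] *= lfoFactor[i];
}
```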

The pitch (the actual note frequency) is calculated at voice start and saved as uptimeDelta.

This would also be the place to select the wavetable with the right octave for antialiasing (use a member pointer that is assigned to the wavetable data).
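A sketch of that voice-start step, with hypothetical names and no JUCE dependency. The note frequency uses the same formula as MidiMessage::getMidiNoteInHertz (A4 = note 69 = 440 Hz), the octave table is selected once and stored as a pointer, and uptimeDelta is scaled by the selected table's length. The tables are assumed to be ordered by octave, starting at lowestTableFreq.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct WavetableVoiceState
{
    const std::vector<float>* table = nullptr; // points into shared, precalculated tables
    double uptime = 0.0;
    double uptimeDelta = 0.0;

    // Everything that depends only on the note is done here, once, not in the render loop.
    void startNote(int midiNote, double sampleRate,
                   const std::vector<std::vector<float>>& octaveTables, double lowestTableFreq)
    {
        const double hz = 440.0 * std::pow(2.0, (midiNote - 69) / 12.0);

        int k = static_cast<int>(std::floor(std::log2(hz / lowestTableFreq)));
        const int last = static_cast<int>(octaveTables.size()) - 1;
        k = k < 0 ? 0 : (k > last ? last : k);

        table = &octaveTables[static_cast<std::size_t>(k)];
        uptime = 0.0;
        uptimeDelta = hz / sampleRate * static_cast<double>(table->size());
    }
};
```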

I really think you are doing something weird with the wavetable calculation. As I said, you should not calculate the wavetables at all in the render process (the only calculations there should be the interpolation and the gain multiplication).



I'm having problems tuning your sine wave. I assume that on note start I should have something like uptimeDelta = cyclesPerSample * 2.0 * double_Pi; right?

Anyway, after tweaking it a bit I managed to get CPU usage similar to Massive with the same settings, though still not close to what you said ("I can play 64 voices with envelopes, filters and LFOs with about 6% CPU"): I'm averaging 10% CPU when I play 5 notes with an envelope and an LFO on pitch.

I'll run the profiler again and see if there's something else I can do, but anyway much better now.

Well, since the wavetables are 2048 samples long, you will have to take this in account when calculating the uptimeDelta:

const double cyclesPerSecond = MidiMessage::getMidiNoteInHertz (midiNoteNumber);
const double cyclesPerSample = cyclesPerSecond / getSampleRate();
uptimeDelta = cyclesPerSample * 2048.0;

(it is the same formula, but instead of 2 * pi, which is the length of one sine cycle you simply take the wavetable length)
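A worked example of the formula above, assuming A4 (MIDI note 69, 440 Hz) at a 44.1 kHz sample rate and the 2048-sample table:

$$
\text{uptimeDelta} = \frac{f}{f_s}\cdot N = \frac{440}{44100}\cdot 2048 \approx 0.009977 \cdot 2048 \approx 20.43
$$

One cycle lasts 44100 / 440 ≈ 100.2 output samples, and 100.2 × 20.43 ≈ 2048, i.e. exactly one pass through the table per cycle. The 2π version gives 2π × 0.009977 ≈ 0.0627 rad per sample: the same fraction of a cycle, measured in radians instead of table samples.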

Well, Massive is a real CPU-hog (at least my old version, I don't know if they improved it), so it should not be your role model regarding performance.

And as I said, statements like "6% on my CPU" are not comparable between machines. I wrote something until it worked, then profiled it and optimized it until it met my requirements (I could keep optimizing, but the code would get ugly pretty quickly).

Wonderful, thanks. I managed to make other tables like that and it seems much better.

I would love to read more tips on optimization, although I know that at this stage it's better to build something that works than to worry too much about CPU usage.

Reviving this thread: I’ve hit a problem in my wavetable synth when implementing unison. I have 8 voices, and each one has 8-voice unison for detune spread, for a total of 64 voices * N oscillators.

Thanks to the profiler I optimized everything I could, but 65% of the total usage comes from the array access that fetches the sample. Since it's implemented with plain arrays, I don’t think I can go simpler than that, which is strange because array access should be dirt cheap (even if the access isn’t sequential because of the phase).

I’ve swapped loops and declared things to improve memory locality, which gave me an extra 25%, but it still consumes a lot (around 60% of my total CPU with 144 voices). How have you guys approached this efficiently? I have a single synthesizer and handle unison there (as extra oscillators), rendering only the ones that are active. I’ve also tried having just one oscillator render its output once per unison voice for each sample, but it didn’t make much difference performance-wise.
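One layout worth trying, sketched here as an assumption rather than a known fix: keep the unison phases and increments in contiguous arrays (structure of arrays) and render one unison voice across the whole block before moving to the next, so each voice's phase and increment stay in registers for the inner loop. UnisonOsc is a hypothetical name, and the detune spread is assumed to be symmetric in cents. The trade-off is that the output block is read and written once per unison voice, so only the profiler can tell whether this beats per-sample interleaving on a given machine.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct UnisonOsc
{
    std::vector<double> phase; // read positions in table samples, one per unison voice
    std::vector<double> delta; // detuned increments, one per unison voice

    UnisonOsc(int numVoices, double baseDelta, double spreadCents)
        : phase(numVoices, 0.0), delta(numVoices, baseDelta)
    {
        for (int v = 0; v < numVoices; ++v)
        {
            // Spread detune evenly over [-spreadCents, +spreadCents].
            const double t = numVoices > 1 ? (2.0 * v / (numVoices - 1) - 1.0) : 0.0;
            delta[v] = baseDelta * std::pow(2.0, t * spreadCents / 1200.0);
        }
    }

    // Adds the unison mix (normalised by the voice count) into out[0..numSamples).
    void render(const std::vector<float>& table, float* out, int numSamples)
    {
        const double size = static_cast<double>(table.size());
        const float gain = 1.0f / static_cast<float>(phase.size());

        for (std::size_t v = 0; v < phase.size(); ++v)
        {
            double p = phase[v];       // local copies stay in registers
            const double d = delta[v];
            for (int n = 0; n < numSamples; ++n)
            {
                const std::size_t i0 = static_cast<std::size_t>(p);
                const std::size_t i1 = (i0 + 1 == table.size()) ? 0 : i0 + 1;
                const float frac = static_cast<float>(p - static_cast<double>(i0));
                out[n] += gain * (table[i0] + frac * (table[i1] - table[i0]));
                p += d;
                if (p >= size) p -= size; // assumes d < table size
            }
            phase[v] = p;
        }
    }
};
```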