Optimizing DSP code (beginner)

Hi
I don't have any experience with this. How can we use SSE features to improve performance when processing both channels separately? Should I use FloatVectorOperations?
I couldn't find any examples of FloatVectorOperations in action. How would you use it in the following example, for instance?

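```cpp
// inside processBlock()
const int numSamples = buffer.getNumSamples();

for (int channel = 0; channel < buffer.getNumChannels(); ++channel)
{
    float* channelData = buffer.getWritePointer (channel);

    for (int i = 0; i < numSamples; ++i)
    {
        // processing...
        channelData[i] = GainParam * channelData[i];
        const float temp = std::abs (channelData[i]); // rectified sample for further processing
    }
}
```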

Maybe this will help.
This gets separate pointers to the left and right channels, and then you can do your processing inside the while loop. I'm writing this from my phone, so the formatting may be a bit off.

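```cpp
const int totalNumChannels = getTotalNumInputChannels();

if (totalNumChannels == 2)
{
    float* leftSamples  = buffer.getWritePointer (0);
    float* rightSamples = buffer.getWritePointer (1);

    int numSamples = buffer.getNumSamples();

    while (numSamples > 0)
    {
        float sampleValueLeft  = *leftSamples;
        float sampleValueRight = *rightSamples;

        // ... process sampleValueLeft and sampleValueRight here ...

        // write the results back and advance both pointers to the next sample
        *leftSamples++  = sampleValueLeft;
        *rightSamples++ = sampleValueRight;
        --numSamples;
    }
}
```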

I seem to remember that to use the SSE features really effectively it helps to have your channel data in an interleaved format.
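A minimal sketch of the interleaved idea, in plain standard C++ rather than JUCE so it stands alone: the two channel arrays are packed as L0 R0 L1 R1 ..., so one loop walks a single contiguous buffer and left and right samples sit next to each other, which is the layout SIMD code likes. The helpers `interleaveStereo`, `deinterleaveStereo` and `applyGainInterleaved` are hypothetical names for illustration, not JUCE API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: pack two separate channel arrays into one
// interleaved buffer laid out as L0 R0 L1 R1 ...
std::vector<float> interleaveStereo (const float* left, const float* right, std::size_t numSamples)
{
    std::vector<float> out (2 * numSamples);
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        out[2 * i]     = left[i];
        out[2 * i + 1] = right[i];
    }
    return out;
}

// Hypothetical helper: split an interleaved buffer back into two channels.
void deinterleaveStereo (const std::vector<float>& in, float* left, float* right)
{
    const std::size_t numSamples = in.size() / 2;
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        left[i]  = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}

// Apply a separate gain to each channel in one pass over the interleaved data:
// even indices are left samples, odd indices are right samples.
void applyGainInterleaved (std::vector<float>& data, float gainLeft, float gainRight)
{
    for (std::size_t i = 0; i + 1 < data.size(); i += 2)
    {
        data[i]     *= gainLeft;
        data[i + 1] *= gainRight;
    }
}
```

Packing and unpacking each cost an extra pass over the data, so this layout only pays off when enough processing happens in between.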

You can process a single channel effectively with FloatVectorOperations, but not if you have time-varying parameters that you'd like to apply smoothly. And unfortunately, smoothly varying parameters are usually what you want.

Anyway, for what you have there, you can apply the gain to each channel with:

```cpp
FloatVectorOperations::multiply (buffer.getWritePointer (channel), GainParam, buffer.getNumSamples());
```
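For comparison, a plain C++ sketch of the work that single call replaces: scaling `numSamples` floats in place by one constant. `multiplyInPlace` is a hypothetical name; JUCE's `FloatVectorOperations::multiply` uses SIMD instructions where available, and a simple loop like this one (constant multiplier, no dependency between iterations) is also the kind of loop optimising compilers can often vectorise on their own in release builds.

```cpp
// Hypothetical scalar stand-in for a vector "multiply by constant":
// every sample is scaled by the same gain and no iteration depends on
// another, which is what makes this loop easy to vectorise.
void multiplyInPlace (float* data, float gain, int numSamples)
{
    for (int i = 0; i < numSamples; ++i)
        data[i] *= gain;
}
```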

But as soon as you try to smooth the gain changes you’ll see what the problem is.
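A sketch of where smoothing changes things, assuming a simple linear ramp from the previous gain to the new one over a block (a common scheme, not the only one): the multiplier now differs on every sample, so a single "multiply by one constant" call no longer fits. JUCE's `AudioBuffer::applyGainRamp` covers this linear case; `applyLinearGainRamp` below is a hypothetical plain C++ version for illustration.

```cpp
// Hypothetical helper: ramp the gain linearly from startGain towards endGain
// across the block so a gain change doesn't click. The gain is advanced by a
// fixed step after every sample, so each sample gets a different multiplier.
void applyLinearGainRamp (float* data, int numSamples, float startGain, float endGain)
{
    if (numSamples <= 0)
        return;

    const float step = (endGain - startGain) / static_cast<float> (numSamples);
    float gain = startGain;

    for (int i = 0; i < numSamples; ++i)
    {
        data[i] *= gain;
        gain += step;
    }
}
```

The last sample of the block gets one step less than `endGain`; if the next block starts at `endGain`, consecutive blocks join up without a jump.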

Also, if the next operation is temp = std::abs (channelData[i]), I'd expect a performance hit here, because you'd need to reload channelData[i] from the cache in a second pass, whereas in your existing loop it will already be in a processor register from the previous line of code.
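To make the register point concrete, here are two hypothetical versions of the original loop body: one makes two passes over the buffer (scale everything, then read it all back for the abs), the other does both in a single pass while each scaled sample is still at hand. Tracking the peak of the rectified signal is an assumed stand-in for whatever the real code does with `temp`. Both give the same result; which is faster on a given machine is something to measure rather than assume.

```cpp
#include <algorithm>
#include <cmath>

// Two passes: scale every sample, then walk the buffer again and read each
// scaled sample back from memory to take its absolute value.
float gainThenPeakTwoPass (float* data, int numSamples, float gain)
{
    for (int i = 0; i < numSamples; ++i)
        data[i] *= gain;

    float peak = 0.0f;
    for (int i = 0; i < numSamples; ++i)
        peak = std::max (peak, std::abs (data[i]));

    return peak;
}

// One pass: the scaled sample is used for the abs straight away,
// so it does not have to be loaded a second time.
float gainThenPeakOnePass (float* data, int numSamples, float gain)
{
    float peak = 0.0f;
    for (int i = 0; i < numSamples; ++i)
    {
        const float scaled = data[i] * gain;
        data[i] = scaled;
        peak = std::max (peak, std::abs (scaled));
    }
    return peak;
}
```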

I don't get it. Why can't we do something like:

```
StereoData = Gain * BothChannels[i]; // up to 4 channels with SSE
```

and have JUCE take care of the rest, i.e. process each channel simultaneously with SSE? This is how it's done in FlowStone.
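Something close to that can be written by hand. The sketch below assumes a small hypothetical value type, `Float4`, holding one sample from each of up to four channels; arithmetic on it processes all lanes together, which matches the shape of an SSE register (four floats). It is plain portable C++, not a JUCE class, and leaves it to the compiler to map the four-lane loop onto SSE; a hand-written version would use intrinsics such as `_mm_mul_ps`, and JUCE's dsp module offers a related building block in `dsp::SIMDRegister`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane value type: lane c holds the current sample of channel c.
// operator* works on all four lanes, mirroring a single SSE multiply.
struct Float4
{
    std::array<float, 4> lane {};

    Float4 operator* (float scalar) const
    {
        Float4 r;
        for (std::size_t c = 0; c < 4; ++c)
            r.lane[c] = lane[c] * scalar;
        return r;
    }
};

// Gather sample i of up to four channels into one Float4, apply the gain to
// all of them at once, and scatter the results back. Unused lanes stay 0.
void applyGainAcrossChannels (float* const* channels, int numChannels,
                              int numSamples, float gain)
{
    for (int i = 0; i < numSamples; ++i)
    {
        Float4 frame;
        for (int c = 0; c < numChannels && c < 4; ++c)
            frame.lane[static_cast<std::size_t> (c)] = channels[c][i];

        const Float4 out = frame * gain; // "StereoData = Gain * BothChannels[i]"

        for (int c = 0; c < numChannels && c < 4; ++c)
            channels[c][i] = out.lane[static_cast<std::size_t> (c)];
    }
}
```

For a plain gain, this gather/scatter gains nothing over a per-channel vector multiply. The channels-in-lanes layout pays off for recursive per-sample processing, where each output depends on the previous one so consecutive samples can't be computed together, but independent channels can. In a JUCE `processBlock`, the `channels` argument could come from `buffer.getArrayOfWritePointers()`.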

I have a simple compressor: FL Studio shows 4% CPU for it, while the same code built with FlowStone uses only 2%. I didn't expect that. I created this topic to get some ideas about how you write DSP code to keep it CPU-friendly.
In my case I have an envelope follower and a few more lines that are processed for each sample in the buffer.
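Since the actual code isn't shown, here is a sketch of one common form of envelope follower, assumed for illustration: a one-pole smoother on the rectified signal with separate attack and release coefficients. `EnvelopeFollower` and its members are hypothetical names. The key property for performance is that each output depends on the previous envelope value, so unlike the gain multiply it can't be vectorised across consecutive samples.

```cpp
#include <cmath>

// Hypothetical peak envelope follower: a one-pole smoother on |x| with
// separate attack and release time constants. The envelope carries state
// from one sample to the next, so samples must be processed in order.
struct EnvelopeFollower
{
    float attackCoeff  = 0.0f;
    float releaseCoeff = 0.0f;
    float envelope     = 0.0f;

    void prepare (double sampleRate, double attackSeconds, double releaseSeconds)
    {
        // coefficient = exp(-1 / (time * sampleRate)); closer to 1 means slower
        attackCoeff  = static_cast<float> (std::exp (-1.0 / (attackSeconds  * sampleRate)));
        releaseCoeff = static_cast<float> (std::exp (-1.0 / (releaseSeconds * sampleRate)));
        envelope = 0.0f;
    }

    float processSample (float x)
    {
        const float rectified = std::abs (x);
        // rising input uses the attack coefficient, falling input the release one
        const float coeff = rectified > envelope ? attackCoeff : releaseCoeff;
        envelope = coeff * envelope + (1.0f - coeff) * rectified;
        return envelope;
    }
};
```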
Waiting for some insights. :slight_smile:

I assume you’re in release mode with the optimisations on?

Have you tried running a profiler on your code?

Thanks for the reply. Yes, I'm testing the release build; CPU usage is much lower than with the debug build. I haven't used the profiler. I'm guessing it helps with finding the CPU-hungry parts? Let me see how it works before asking stupid questions.

Definitely get into the profiler. It takes a tiny bit of getting used to, but it's the number one tool for figuring out performance problems!
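A profiler is the right tool for finding where the time goes. For a quick A/B comparison of two versions of one loop, a crude timing harness with `std::chrono` can also help. The sketch below is hypothetical and only gives a rough average: it is sensitive to caches, CPU frequency scaling and what the optimiser decides to remove, so treat its numbers as indicative, not as a substitute for profiling the real plug-in.

```cpp
#include <chrono>
#include <vector>

// Hypothetical micro-benchmark: runs `process` on the same block `iterations`
// times and returns the average wall-clock time per call in microseconds.
template <typename Fn>
double averageMicrosecondsPerBlock (Fn&& process, std::vector<float>& block, int iterations)
{
    using Clock = std::chrono::steady_clock;

    const auto start = Clock::now();
    for (int n = 0; n < iterations; ++n)
        process (block.data(), static_cast<int> (block.size()));
    const auto end = Clock::now();

    const std::chrono::duration<double, std::micro> elapsed = end - start;
    return iterations > 0 ? elapsed.count() / iterations : 0.0;
}
```

In a real comparison, use the processed output afterwards (for example, sum it and print the result) so the optimiser can't discard the work being timed.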
