Optimizing DSP codes (beginner)

erteash · November 6, 2016, 2:36pm

Hi,
I don't have any experience with this. How can we use SSE features to improve performance when processing both channels separately? With FloatVectorOperations?
I couldn't find any examples of FloatVectorOperations in action. How would you do it in, for instance, the following example?

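// processBlock
const int numSamples = buffer.getNumSamples();

for (int channel = 0; channel < buffer.getNumChannels(); ++channel) // don't assume exactly 2 channels
{
    float* channelData = buffer.getWritePointer (channel);

    for (int i = 0; i < numSamples; ++i)
    {
        // processing…
        channelData[i] = GainParam * channelData[i];
        float temp = std::abs (channelData[i]); // std::abs selects the float overload
        // …
    }
}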

cocell · November 6, 2016, 5:11pm

Maybe this will help.
This gets a separate pointer to each of the two channels, and then you can do your processing inside the while loop. I'm typing this on my phone, so the formatting may be off.

const int totalNumChannels = getTotalNumInputChannels();

if (totalNumChannels == 2)
{
    float* leftSamples  = buffer.getWritePointer (0);
    float* rightSamples = buffer.getWritePointer (1);
    int numSamples = buffer.getNumSamples();

    while (numSamples > 0)
    {
        float sampleValueLeft  = *leftSamples;
        float sampleValueRight = *rightSamples;

        // process both samples here

        *leftSamples++  = sampleValueLeft;   // write back, then advance to the next sample
        *rightSamples++ = sampleValueRight;
        --numSamples;
    }
}

jimc · November 6, 2016, 6:39pm

I seem to remember that to use the SSE features really effectively it helps to have your channel data in an interleaved format.
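For context: JUCE's AudioBuffer stores each channel in its own contiguous array (planar layout), so "interleaved" means rearranging the samples frame by frame (L0 R0 L1 R1 ...). A minimal plain-C++ sketch of that conversion, with no JUCE involved; `interleave` and `deinterleave` are hypothetical helper names, not JUCE API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copy planar channel arrays (one array per channel)
// into one interleaved array laid out frame by frame: ch0 ch1 ch0 ch1 ...
std::vector<float> interleave (const std::vector<std::vector<float>>& planar)
{
    const std::size_t numChannels = planar.size();
    const std::size_t numSamples  = numChannels > 0 ? planar[0].size() : 0;
    std::vector<float> out (numChannels * numSamples);

    for (std::size_t i = 0; i < numSamples; ++i)
        for (std::size_t ch = 0; ch < numChannels; ++ch)
            out[i * numChannels + ch] = planar[ch][i];

    return out;
}

// Hypothetical helper: the inverse, splitting interleaved frames back into
// one array per channel.
std::vector<std::vector<float>> deinterleave (const std::vector<float>& interleaved,
                                              std::size_t numChannels)
{
    const std::size_t numSamples = numChannels > 0 ? interleaved.size() / numChannels : 0;
    std::vector<std::vector<float>> planar (numChannels, std::vector<float> (numSamples));

    for (std::size_t i = 0; i < numSamples; ++i)
        for (std::size_t ch = 0; ch < numChannels; ++ch)
            planar[ch][i] = interleaved[i * numChannels + ch];

    return planar;
}
```

The conversion itself costs a pass over the data, so it only pays off if enough processing happens on the interleaved frames in between.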

You can process a single channel efficiently with FloatVectorOperations, but not if you have time-varying parameters you'd like to apply smoothly. And unfortunately, time-varying parameters are usually what you want.

Anyway, for what you have there, you can apply the gain with:
FloatVectorOperations::multiply (buffer.getWritePointer (channel), GainParam, buffer.getNumSamples());

But as soon as you try to smooth the gain changes, you'll see what the problem is.
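To make the problem concrete, here is a plain-C++ sketch (no JUCE) contrasting a constant gain, which is one scalar multiplied across the whole block, with a smoothed gain. The one-pole smoother, its coefficient and the function names are illustrative assumptions. The smoothed version needs a different multiplier for every sample, and each gain value depends on the previous one, so it can't be expressed as a single "multiply block by a number" call.

```cpp
#include <cstddef>

// Constant gain: every sample uses the same multiplier, which is exactly what
// a vectorised "multiply block by scalar" call does.
void applyConstantGain (float* data, std::size_t numSamples, float gain)
{
    for (std::size_t i = 0; i < numSamples; ++i)
        data[i] *= gain;
}

// Smoothed gain (illustrative one-pole smoother): on every sample the gain moves
// a fraction 'coeff' of the way towards the target. Each sample gets a different
// multiplier, and each gain value depends on the previous one.
// Returns the gain reached at the end of the block so it can carry over.
float applySmoothedGain (float* data, std::size_t numSamples,
                         float currentGain, float targetGain, float coeff)
{
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        currentGain += coeff * (targetGain - currentGain);
        data[i] *= currentGain;
    }
    return currentGain;
}
```

If a straight-line change across one block is good enough, AudioBuffer::applyGainRamp may cover it without a hand-written loop.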

Also, if the next operation is temp = abs (channelData[i]), I'd expect a performance hit, because after the block-wide multiply every channelData[i] has to be reloaded from memory (hopefully from the cache), whereas in your existing loop it will already be in a processor register from the previous line of code.
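The difference described above can be sketched in plain C++ as two versions of the original loop: one making two passes over the block (as you would with one vector call per operation) and one doing both operations per sample. Tracking the peak of |x| is a hypothetical stand-in for the elided code that uses temp. Both give identical results; which is faster depends on block size, cache behaviour and what the compiler vectorises, so it is worth measuring.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Two passes: multiply the whole block, then read it all back again to take
// the absolute value. Each sample is loaded from memory twice.
float gainThenPeakTwoPass (float* data, std::size_t n, float gain)
{
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= gain;

    float peak = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        peak = std::max (peak, std::abs (data[i]));
    return peak;
}

// Fused: both operations per sample, so the scaled value can stay in a
// register when abs() is taken.
float gainThenPeakFused (float* data, std::size_t n, float gain)
{
    float peak = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
    {
        data[i] *= gain;
        peak = std::max (peak, std::abs (data[i]));
    }
    return peak;
}
```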

erteash · November 7, 2016, 11:22am

I don't get it. Why can't we do something like:

StereoData = Gain * BothChannels[i]; // up to 4 channels with SSE

and have JUCE take care of the rest, i.e. process each channel simultaneously with SSE? This is how it's done in FlowStone.
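The "up to 4 channels per SSE register" idea can be written by hand once the data is interleaved, so that one 128-bit register holds one sample from each of four channels. A minimal sketch using the standard SSE intrinsics from <xmmintrin.h> (x86 only, with a scalar fallback elsewhere). The function name, the four-channel frame layout and the one-pole gain smoother are assumptions for illustration; this is not something JUCE does for you automatically. It also shows why interleaving helps with time-varying parameters: the smoothed gain is updated once per frame and then applied to all four channels in one multiply.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAVE_SSE 1
#else
 #define SKETCH_HAVE_SSE 0
#endif

// Multiplies 4 interleaved channels (ch0 ch1 ch2 ch3 ch0 ch1 ...) by one gain
// that is smoothed towards 'targetGain' once per frame.
// Returns the gain reached at the end so it can carry into the next block.
float applySmoothedGain4Interleaved (float* frames, std::size_t numFrames,
                                     float currentGain, float targetGain, float coeff)
{
    for (std::size_t i = 0; i < numFrames; ++i)
    {
        // The smoothing itself stays scalar: one update per frame, not per channel.
        currentGain += coeff * (targetGain - currentGain);
        float* frame = frames + 4 * i;

#if SKETCH_HAVE_SSE
        const __m128 g = _mm_set1_ps (currentGain);                  // same gain in all 4 lanes
        _mm_storeu_ps (frame, _mm_mul_ps (_mm_loadu_ps (frame), g)); // 4 channels in one multiply
#else
        for (std::size_t ch = 0; ch < 4; ++ch)                       // scalar fallback
            frame[ch] *= currentGain;
#endif
    }
    return currentGain;
}
```

With only two channels, half of each register would be unused, which is one reason the gain from this approach on plain stereo is limited.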

erteash · December 13, 2016, 8:26pm

I have a simple compressor. FL Studio shows 4% CPU for it, while the same code built with FlowStone uses only 2%. I didn't expect that. I created this topic to get some ideas about how you write DSP code to keep it CPU-friendly.
In my case I have an envelope follower and a few more lines that are processed for each sample in the buffer.
Looking forward to some insights.
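Since an envelope follower came up: a typical peak follower is recursive (each output depends on the previous one), so it can't be split across samples with a single vector call, although both channels can run inside one loop. A minimal plain-C++ sketch, assuming a simple attack/release one-pole design; EnvelopeFollower, coeffForTime and followStereo are hypothetical names, not the poster's code or JUCE API.

```cpp
#include <cmath>
#include <cstddef>

// Illustrative attack/release peak follower state for one channel.
struct EnvelopeFollower
{
    float attackCoeff  = 0.0f; // closer to 1 = slower attack
    float releaseCoeff = 0.0f; // closer to 1 = slower release
    float envelope     = 0.0f;

    // Converts a time constant in seconds to a one-pole coefficient.
    // Calls std::exp, so compute it when parameters change, not per sample.
    static float coeffForTime (float seconds, float sampleRate)
    {
        return std::exp (-1.0f / (seconds * sampleRate));
    }

    float process (float x)
    {
        const float level = std::abs (x);
        const float c = level > envelope ? attackCoeff : releaseCoeff;
        envelope = c * envelope + (1.0f - c) * level; // depends on the previous value
        return envelope;
    }
};

// Runs one follower per channel over a stereo pair in a single pass.
void followStereo (const float* left, const float* right, std::size_t n,
                   EnvelopeFollower& envL, EnvelopeFollower& envR,
                   float* envOutL, float* envOutR)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        envOutL[i] = envL.process (left[i]);
        envOutR[i] = envR.process (right[i]);
    }
}
```

A cheap first win in code like this is usually moving expensive calls such as std::exp or std::pow out of the per-sample loop (for example into prepareToPlay or a parameter-change handler), but a profiler is the way to confirm where the time actually goes.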

jimc · December 13, 2016, 8:31pm

I assume you’re in release mode with the optimisations on?

Have you tried putting the profiler on your code?

erteash · December 15, 2016, 7:15pm

Thanks for the reply. Yes, I'm testing the release build; CPU usage is much lower than with the debug build. I haven't used the profiler yet. I'm guessing it helps with finding the CPU-hungry parts? Let me see how it works before asking stupid questions.

jimc · December 16, 2016, 11:43am

Definitely get into the profiler … takes a tiny bit of getting used to but is the number one tool for figuring out performance stuff!
