Optimizing DSP code (beginner)


#1

Hi,
I don’t have any experience with this: how can we use SSE features to improve performance by processing both channels separately? Is FloatVectorOperations the way to do it?
I couldn’t find any examples of FloatVectorOperations in action. How would you use it in, for instance, the following example?

// process block
for (int channel = 0; channel < 2; ++channel)
{
    float* channelData = buffer.getWritePointer (channel);
    for (int i = 0; i < buffer.getNumSamples(); ++i)
    {
        // processing…
        channelData[i] = GainParam * channelData[i];
        float temp = std::abs (channelData[i]); // std::abs selects the float overload
    }
}


#2

Maybe this will help.
This splits out both channels, and then you can do your processing in the while loop. I’m writing this from my phone, so the formatting could be off.

const int totalNumChannels = getTotalNumInputChannels();

if (totalNumChannels == 2)
{
    float* leftSamples  = buffer.getWritePointer (0);
    float* rightSamples = buffer.getWritePointer (1);
    int numSamples = buffer.getNumSamples();

    while (numSamples > 0)
    {
        float sampleValuesLeft  = *leftSamples;
        float sampleValuesRight = *rightSamples;

        // process the two samples here, then move on to the next pair
        ++leftSamples;
        ++rightSamples;
        --numSamples;
    }
}


#3

I seem to remember that to use the SSE features really effectively it helps to have your channel data in an interleaved format.

You can process a single channel effectively with FloatVectorOperations but not if you have time-varying parameters you’d like to apply smoothly. And unfortunately time-varying parameters are usually wanted.

Anyway - for what you have there you can apply the gain with:
FloatVectorOperations::multiply (buffer.getWritePointer (channel), GainParam, buffer.getNumSamples());

But as soon as you try to smooth the gain changes you’ll see what the problem is.

Also, if the next operation is temp = abs(channelData[i]), I expect you’ll take a performance hit there, because you’ll need to reload channelData[i] from the cache, whereas in your existing loop it will already be in a processor register from the previous line of code.
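
To make both points concrete, here is a minimal sketch in plain C++ (no JUCE calls) of a loop that ramps the gain across the block and takes the absolute value in the same pass, so each sample is only touched once. The function name applyRampedGainAndPeak and the startGain/endGain parameters are made up for illustration and are not from JUCE or from the code above. Because the multiplier changes on every sample, this does not map onto a single vector multiply with one constant multiplier.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not a JUCE function): ramps the gain linearly from
// startGain towards endGain over the block, writes the result back in place,
// and returns the peak absolute value of the processed samples.
float applyRampedGainAndPeak (float* data, int numSamples, float startGain, float endGain)
{
    assert (numSamples >= 0);
    if (numSamples == 0)
        return 0.0f;

    const float step = (endGain - startGain) / (float) numSamples;
    float gain = startGain;
    float peak = 0.0f;

    for (int i = 0; i < numSamples; ++i)
    {
        const float out = gain * data[i];        // gain differs for every sample
        data[i] = out;
        peak = std::max (peak, std::abs (out));  // reuse the value just computed
        gain += step;
    }

    return peak;
}
```

The gain multiply and the abs share one loop body here, which is the "already in a register" situation described above.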


#4

I don’t get it. Why can’t we do something like:

StereoData = Gain*BothChannels[i]; //Up to 4 channels with SSE

and have JUCE take care of the rest, i.e. process each channel simultaneously with SSE? This is how it’s done in FlowStone.
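
For reference, doing this by hand with raw SSE intrinsics looks roughly like the sketch below. It assumes the samples are interleaved (L R L R ...) and that the target is x86; where SSE is not available it falls back to a plain scalar loop. applyGainInterleaved and SKETCH_HAS_SSE are made-up names for illustration, not part of JUCE or FlowStone.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: multiplies interleaved stereo samples (L R L R ...)
// by a single gain. numFrames is the number of stereo frames, so the array
// must hold 2 * numFrames floats.
void applyGainInterleaved (float* interleaved, int numFrames, float gain)
{
    assert (numFrames >= 0);
    const int total = 2 * numFrames;
    int i = 0;

#if SKETCH_HAS_SSE
    const __m128 g = _mm_set1_ps (gain);               // same gain in all four lanes

    for (; i + 4 <= total; i += 4)                     // four floats = two stereo frames
    {
        const __m128 v = _mm_loadu_ps (interleaved + i);   // unaligned load: L0 R0 L1 R1
        _mm_storeu_ps (interleaved + i, _mm_mul_ps (v, g));
    }
#endif

    for (; i < total; ++i)                             // leftover samples, or everything without SSE
        interleaved[i] *= gain;
}
```

With one gain in every lane this is no different from running over a single contiguous channel four samples at a time. The interleaved layout only starts to matter when each lane carries its own per-channel value, for example a separate gain for left and right.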


#5

I have some simple compressor code: FL Studio shows 4% CPU for it, while the same code built with FlowStone shows only 2%. I didn’t expect that. I created this topic to get some ideas about how you write DSP code to keep it CPU-friendly.
In my case I have an envelope follower and a few more lines which are processed for each sample in the buffer.
Waiting for some insights. :)
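
For context, per-sample code of this kind often has roughly the shape below. This is a generic one-pole envelope follower written as a sketch, not the actual code from this project; the struct name, the prepare/processSample split and the attack/release parameters are all assumptions made for illustration. The one thing it tries to show is keeping the expensive std::exp calls out of the per-sample path.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical one-pole envelope follower (illustrative only).
struct EnvelopeFollower
{
    float attackCoeff  = 0.0f;
    float releaseCoeff = 0.0f;
    float envelope     = 0.0f;

    // Call when the sample rate or the times change, not once per sample:
    // the coefficients are constant in between.
    void prepare (double sampleRate, double attackMs, double releaseMs)
    {
        assert (sampleRate > 0.0 && attackMs > 0.0 && releaseMs > 0.0);
        attackCoeff  = (float) std::exp (-1.0 / (0.001 * attackMs  * sampleRate));
        releaseCoeff = (float) std::exp (-1.0 / (0.001 * releaseMs * sampleRate));
    }

    // Per-sample work: one abs, one compare, one multiply-add.
    float processSample (float input)
    {
        const float level = std::abs (input);
        const float coeff = level > envelope ? attackCoeff : releaseCoeff;
        envelope = level + coeff * (envelope - level);
        return envelope;
    }
};
```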


#6

I assume you’re in release mode with the optimisations on?

Have you tried putting the profiler on your code?


#7

Thanks for the reply. Yes, I’m testing the release build; CPU usage is much lower than in the debug build. I haven’t used the profiler. I’m guessing it helps with finding the CPU-hungry parts? Let me see how it works before asking stupid questions.


#8

Definitely get into the profiler. It takes a bit of getting used to, but it’s the number one tool for figuring out performance stuff!