SSE optimization

Hi guys, is there anyone out there who knows of any good resources on SSE optimization in the context of DSP? I’m trying to do some SSE operations on 16-byte-aligned audio buffers, but I was wondering whether there are some generally accepted approaches, instead of trying to reinvent the wheel.
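For reference, the basic pattern for SSE on a 16-byte-aligned float buffer looks roughly like this. It is a minimal sketch, assuming x86/x86-64, a buffer that really is 16-byte aligned, and a sample count that is a multiple of 4. The function name `applyGainSSE` is hypothetical. It applies a constant gain four samples per instruction using the standard `<xmmintrin.h>` intrinsics.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_set1_ps, _mm_load_ps, _mm_mul_ps, _mm_store_ps
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: multiply every sample in place by a constant gain, 4 floats at a time.
// Assumptions of this sketch: data is 16-byte aligned and numSamples is a multiple of 4.
void applyGainSSE(float* data, std::size_t numSamples, float gain)
{
    assert(reinterpret_cast<std::uintptr_t>(data) % 16 == 0);  // _mm_load_ps needs 16-byte alignment
    assert(numSamples % 4 == 0);                               // no scalar tail in this version

    const __m128 g = _mm_set1_ps(gain);  // broadcast the gain into all 4 lanes

    for (std::size_t i = 0; i < numSamples; i += 4)
    {
        __m128 x = _mm_load_ps(data + i);  // aligned load of 4 consecutive samples
        x = _mm_mul_ps(x, g);              // 4 multiplies in a single instruction
        _mm_store_ps(data + i, x);         // aligned store back into the same buffer
    }
}
```

Most SSE DSP code is a variation on this: broadcast constants once outside the loop, then stream through the buffer in 4-float chunks.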

Have you looked at the JUCE FloatVectorOperations class?

I just made some changes to my plugin in the hope of speeding up its execution. My code was originally written to push every sample, one at a time, through all the routines it needed to run. I changed that to more of a block style: do everything on a whole buffer at a time instead of a sample at a time. That let me use FloatVectorOperations (which uses whatever SSE code it can on each platform, I believe) in quite a few places. After doing that, my plugin was almost 3x faster.
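To illustrate the restructuring described above, here is a minimal sketch with a hypothetical two-stage chain (gain, then hard clip). All function names are made up for illustration. The per-sample version calls every stage for each sample; the block version lets each stage sweep the whole buffer in a simple loop over contiguous memory. Loops of that shape are what a compiler can auto-vectorize, or what you would hand to a JUCE call such as `FloatVectorOperations::multiply`.

```cpp
#include <cassert>
#include <cstddef>

// --- Sample-at-a-time style: every stage runs once per sample ---
float gainStage(float x, float gain) { return x * gain; }
float clipStage(float x) { return x > 1.0f ? 1.0f : (x < -1.0f ? -1.0f : x); }

void processPerSample(float* buf, std::size_t n, float gain)
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = clipStage(gainStage(buf[i], gain));
}

// --- Block style: each stage processes the whole buffer before the next one runs ---
void gainBlock(float* buf, std::size_t n, float gain)
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] *= gain;  // a candidate for a vectorized multiply
}

void clipBlock(float* buf, std::size_t n, float lo, float hi)
{
    assert(lo <= hi);
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = buf[i] > hi ? hi : (buf[i] < lo ? lo : buf[i]);
}

void processBlock(float* buf, std::size_t n, float gain)
{
    gainBlock(buf, n, gain);
    clipBlock(buf, n, -1.0f, 1.0f);
}
```

Both versions produce identical output. The block version simply exposes each stage as a tight loop that SIMD code can replace.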


Great, I didn’t even know JUCE had these, amazing. So if I understand correctly, it’s best to call these functions with multiples of 4 floats? Are there also helpers to allocate aligned memory?
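On the "multiples of 4" question: a common way SIMD routines cope with arbitrary lengths is to run the 4-wide loop as far as it goes and finish the last 0 to 3 samples with plain scalar code. Using unaligned loads/stores (`_mm_loadu_ps` / `_mm_storeu_ps`) also removes the alignment requirement. The sketch below shows that pattern for a hypothetical `addBuffers` helper. It is an illustration of the general technique, not a description of how FloatVectorOperations is implemented internally.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#include <cassert>
#include <cstddef>

// Hypothetical helper: dest[i] += src[i] for any n (not just multiples of 4) and any alignment.
void addBuffers(float* dest, const float* src, std::size_t n)
{
    assert(n == 0 || (dest != nullptr && src != nullptr));

    std::size_t i = 0;

    // Vector part: whole groups of 4 floats.
    for (; i + 4 <= n; i += 4)
    {
        __m128 d = _mm_loadu_ps(dest + i);  // unaligned load, valid for any address
        __m128 s = _mm_loadu_ps(src + i);
        _mm_storeu_ps(dest + i, _mm_add_ps(d, s));
    }

    // Scalar tail: the remaining 0 to 3 samples.
    for (; i < n; ++i)
        dest[i] += src[i];
}
```

Passing multiples of 4 (on aligned buffers) just means the tail loop never runs, which is why it tends to be slightly faster, but other lengths still work.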

If I’m not mistaken, Jules already thought of that. I believe all of the memory allocation functions in JUCE already take alignment into consideration.
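If you ever need to allocate an aligned buffer yourself outside of JUCE, standard C++17 provides `std::aligned_alloc`. Below is a minimal sketch, assuming a C++17 compiler and standard library that ship it. `allocateAligned` and `AlignedFloatBuffer` are hypothetical names. Note that `aligned_alloc` expects the byte count to be a multiple of the alignment, so the size is rounded up, and the memory must be released with `std::free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::aligned_alloc, std::free (C++17)
#include <memory>    // std::unique_ptr
#include <new>       // std::bad_alloc

// Deleter so the buffer is released with std::free, matching std::aligned_alloc.
struct FreeDeleter
{
    void operator()(float* p) const noexcept { std::free(p); }
};

using AlignedFloatBuffer = std::unique_ptr<float[], FreeDeleter>;

// Hypothetical helper: allocate numFloats floats starting on an `alignment`-byte boundary.
AlignedFloatBuffer allocateAligned(std::size_t numFloats, std::size_t alignment = 16)
{
    // Alignment must be a power of two and at least alignof(float).
    assert(alignment >= alignof(float) && (alignment & (alignment - 1)) == 0);

    // Round the byte count up to a multiple of the alignment, as aligned_alloc expects.
    std::size_t bytes = numFloats * sizeof(float);
    bytes = (bytes + alignment - 1) / alignment * alignment;
    if (bytes == 0)
        bytes = alignment;

    float* p = static_cast<float*>(std::aligned_alloc(alignment, bytes));
    if (p == nullptr)
        throw std::bad_alloc();

    return AlignedFloatBuffer(p);
}
```

One caveat: MSVC does not provide `std::aligned_alloc`; on Windows the usual substitute is `_aligned_malloc` paired with `_aligned_free`.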