Doubles outperforming floats and other happy accidents

When I started playing around with audio programming, I set up a whole collection of DSP classes and gave them all the same entry points and the same way of processing. While benchmarking, I found that using doubles throughout actually improved speed by 20-30%, which didn’t seem right to me, since doubles are twice the size of floats. I was aware of SIMD but thought it was purely a compiler optimisation. After some deeper reading into how the compiler takes opportunities to run two-for-the-price-of-one operations on certain floating-point code, it seems it has as much to do with the way you code as with the intelligence of the compiler.

So are there any hard and fast rules to increase the chance for SIMD opportunities?

I now know that the idea is to fill the 128-bit SIMD register with 4 x floats or 2 x doubles (which I now know is where my performance boost came from) and also to access memory in a unit-stride way (which I know is good practice anyway on any platform).
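To make the register arithmetic and the unit-stride point concrete, here is a minimal sketch. The function names (lanesPerRegister, applyGain, applyGainToChannel) are made up for illustration, and it assumes the usual 4-byte float and 8-byte double.

```cpp
#include <cassert>
#include <cstddef>

// Width of one SSE/NEON-style SIMD register, in bits.
constexpr std::size_t kRegisterBits = 128;

// How many values of type T fit in one 128-bit register.
// Assumes 8-bit bytes, as on all mainstream desktop and mobile targets.
template <typename T>
constexpr std::size_t lanesPerRegister()
{
    return kRegisterBits / (sizeof(T) * 8);
}

static_assert(lanesPerRegister<float>() == 4, "4 x float per 128-bit register");
static_assert(lanesPerRegister<double>() == 2, "2 x double per 128-bit register");

// Unit stride: each iteration touches the next element in memory, so a
// vectorizing compiler can load a full register of neighbours at once.
void applyGain(float* samples, std::size_t numSamples, float gain)
{
    for (std::size_t i = 0; i < numSamples; ++i)
        samples[i] *= gain;
}

// Non-unit stride: one channel of an interleaved buffer. The elements are
// numChannels apart, which is much harder to pack into a register.
void applyGainToChannel(float* interleaved, std::size_t numFrames,
                        std::size_t numChannels, std::size_t channel, float gain)
{
    assert(channel < numChannels);
    for (std::size_t frame = 0; frame < numFrames; ++frame)
        interleaved[frame * numChannels + channel] *= gain;
}
```

Both loops compute the same kind of result; the difference is only in how the memory is walked, which is the part the compiler cares about when deciding whether it can vectorize.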

The only good way to increase SIMD opportunities is to manually write SIMD code. You should never rely on the compiler to generate SIMD code, because obviously it can vary wildly across architectures, compilers, and your code’s branching/accumulators, etc.

There are, however, some guidelines you can follow to increase the likelihood of your code being vectorized at compile time. Check out http://www.agner.org/optimize/optimizing_cpp.pdf section 12.3 “Automatic vectorization”.
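As a rough illustration of the kind of loop those guidelines favour, here is a sketch with a simple counted loop, contiguous access, no branches in the body, and a promise that the buffers do not overlap. The function name mixAndClip is hypothetical, and __restrict is a compiler extension (GCC, Clang and MSVC accept it), not standard C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Mixes two buffers, applies a gain, and clips to [-1, 1].
// __restrict tells the compiler that out, a and b never alias, which removes
// one common reason for it to give up on vectorizing.
void mixAndClip(float* __restrict out,
                const float* __restrict a,
                const float* __restrict b,
                std::size_t n, float gain)
{
    // __restrict is a promise, not a check; this only catches the obvious case.
    assert(out != a && out != b);

    for (std::size_t i = 0; i < n; ++i)
    {
        const float mixed = (a[i] + b[i]) * gain;
        // min/max instead of if/else keeps the loop body branch-free.
        out[i] = std::min(1.0f, std::max(-1.0f, mixed));
    }
}
```

Whether this actually gets vectorized still depends on the compiler and flags. With optimisation enabled, GCC's -fopt-info-vec or Clang's -Rpass=loop-vectorize should report what happened to the loop.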

I agree with Jonathon here.
In my limited experience: yes, the compiler does SIMD optimizations, but my hand-coded SIMD code is still faster than what the compiler produces.

About doubles: Hmm… not sure why they were faster for you. If you do proper manual SIMD optimizations, floats should be faster.
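For reference, a minimal sketch of what hand-written SIMD looks like with SSE2 intrinsics, assuming an x86 target (other targets fall back to the plain loop). The name applyGainSimd and the EXAMPLE_HAS_SSE2 macro are made up for this example. The float version handles 4 samples per instruction and the double version only 2, which is why floats should come out ahead once both are vectorized.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>   // SSE2 intrinsics (also pulls in the SSE float ones)
  #define EXAMPLE_HAS_SSE2 1
#else
  #define EXAMPLE_HAS_SSE2 0
#endif

// Float version: 4 samples per 128-bit register.
void applyGainSimd(float* samples, std::size_t n, float gain)
{
    assert(samples != nullptr || n == 0);
    std::size_t i = 0;
#if EXAMPLE_HAS_SSE2
    const __m128 g = _mm_set1_ps(gain);              // gain in all 4 lanes
    for (; i + 4 <= n; i += 4)
    {
        const __m128 v = _mm_loadu_ps(samples + i);  // unaligned load of 4 floats
        _mm_storeu_ps(samples + i, _mm_mul_ps(v, g));
    }
#endif
    for (; i < n; ++i)                               // leftover samples
        samples[i] *= gain;
}

// Double version: only 2 samples per 128-bit register.
void applyGainSimd(double* samples, std::size_t n, double gain)
{
    assert(samples != nullptr || n == 0);
    std::size_t i = 0;
#if EXAMPLE_HAS_SSE2
    const __m128d g = _mm_set1_pd(gain);             // gain in both lanes
    for (; i + 2 <= n; i += 2)
    {
        const __m128d v = _mm_loadu_pd(samples + i); // unaligned load of 2 doubles
        _mm_storeu_pd(samples + i, _mm_mul_pd(v, g));
    }
#endif
    for (; i < n; ++i)                               // leftover sample
        samples[i] *= gain;
}
```

The unaligned loads keep the sketch safe for any buffer; real DSP code would typically align its buffers and use the aligned variants.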

Looking at my FIR implementation, it would line up 2 x doubles, which fill a 128-bit register, whereas with floats it would be 2 x floats, which only consume 64 bits and so would not qualify for SIMD vectorisation. So a 20-30% performance increase would make sense.

Thank you both for confirming that I really need to focus on the operations instead of just relying on the compiler. I do prefer to be explicit where possible.