Are there any plans to support SIMD-style data types, e.g. float2 or float4? For example, we could write code that reads from two mono sources (left and right) into a float2, and all mathematical operations on it would then automatically use SIMD where the hardware allows it.
I think this would help the compiler a lot, as it's basically spoon-fed a possible optimization.
If the hardware doesn’t support SIMD-style operations, then you can simply do them one at a time. No loss.
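To illustrate what I mean, here is a rough C++ sketch of such a type. The names float2 and applyStereoGain are made up for this post and are not an existing API. It assumes SSE on x86 and falls back to plain scalar code everywhere else, so the calling code stays the same either way.

```cpp
#include <cstddef>

// Use SSE where the compiler reports it; otherwise fall back to scalar code.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  #define FLOAT2_USE_SSE 1
#else
  #define FLOAT2_USE_SSE 0
#endif

// Hypothetical two-lane float type: lane 0 holds left, lane 1 holds right.
struct float2 {
#if FLOAT2_USE_SSE
    __m128 v; // only the two lowest lanes are meaningful

    float2(float l, float r) : v(_mm_set_ps(0.0f, 0.0f, r, l)) {}
    explicit float2(__m128 raw) : v(raw) {}

    float left() const { return _mm_cvtss_f32(v); }
    float right() const {
        // Broadcast lane 1, then read the lowest lane.
        return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)));
    }

    friend float2 operator+(float2 a, float2 b) { return float2(_mm_add_ps(a.v, b.v)); }
    friend float2 operator*(float2 a, float2 b) { return float2(_mm_mul_ps(a.v, b.v)); }
#else
    float l, r; // scalar fallback: one operation per channel

    float2(float l_, float r_) : l(l_), r(r_) {}

    float left() const { return l; }
    float right() const { return r; }

    friend float2 operator+(float2 a, float2 b) { return float2(a.l + b.l, a.r + b.r); }
    friend float2 operator*(float2 a, float2 b) { return float2(a.l * b.l, a.r * b.r); }
#endif
};

// Reads two mono sources into a float2 per sample and applies a per-channel gain.
void applyStereoGain(const float* leftIn, const float* rightIn,
                     float* leftOut, float* rightOut,
                     std::size_t numSamples, float2 gain)
{
    for (std::size_t i = 0; i < numSamples; ++i) {
        const float2 in(leftIn[i], rightIn[i]);
        const float2 out = in * gain; // both channels in one operation where SSE is available
        leftOut[i] = out.left();
        rightOut[i] = out.right();
    }
}
```

The DSP code only ever sees float2 and its operators, so whether the math ends up as one SIMD instruction or two scalar ones is decided in one place. A built-in type could do the same thing without the hand-written intrinsics.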
Even the best compilers can't always detect the intent and automatically generate fast code. We created our own SIMD class with those data types and rewrote small portions of some DSP code, and the result ran up to 70% faster overall than when we trusted the compiler. We measured hotspots and optimized only the worst offenders this way. The rest still uses regular floats and processes left and right separately, because in those cases we are memory-latency bound.
Even in cases where we thought it wouldn't matter (e.g. high-quality sample playback), we measured very nice improvements.