Thanks. I didn’t realize that a SIMDRegister like that holds 4 floats. Based on your tip, I found a simple example of how to use it here:
I will try that soon.
The funny thing about all this is that it requires you to:
- create a new (aligned) array of the primitives (floats)
- put that array into a new class to get it into the SIMD format
- do this twice if you need to multiply two sets of data
- multiply the SIMD types (the only step that actually saves operations)
- create another (aligned) array to copy the results into, or call a get function for each element
It seems surprising to me that this is any better than just multiplying the floats directly, but I guess multiplying is still much more expensive than shuffling all these primitives around.
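For reference, here is a minimal sketch of that round trip as I understand it, written with raw SSE intrinsics instead of SIMDRegister so it compiles without JUCE. The name multiplyBuffers, the SKETCH_HAS_SSE macro and the scalar fallback are my own inventions for illustration, not anything from JUCE. On a long buffer each load and store moves four floats at once, so the conversion cost is spread over the whole block rather than paid per float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: writes a[i] * b[i] into out[i], four floats per iteration.
// Assumes all three pointers are 16-byte aligned and n is a multiple of 4.
inline void multiplyBuffers (const float* a, const float* b, float* out, std::size_t n)
{
    assert (n % 4 == 0);
    assert (reinterpret_cast<std::uintptr_t> (a) % 16 == 0);
    assert (reinterpret_cast<std::uintptr_t> (b) % 16 == 0);
    assert (reinterpret_cast<std::uintptr_t> (out) % 16 == 0);

    for (std::size_t i = 0; i < n; i += 4)
    {
       #if SKETCH_HAS_SSE
        const __m128 va = _mm_load_ps (a + i);          // one aligned load of 4 floats
        const __m128 vb = _mm_load_ps (b + i);          // second operand
        _mm_store_ps (out + i, _mm_mul_ps (va, vb));    // one multiply, one aligned store
       #else
        for (std::size_t k = 0; k < 4; ++k)             // scalar fallback on non-SSE targets
            out[i + k] = a[i + k] * b[i + k];
       #endif
    }
}
```

If the data already lives in aligned buffers, there is no extra copy at all; the loads read straight from the source arrays.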
Also, in the example given, is the alignas (16) necessary? E.g.:
alignas (16) float eraw[4];
u.copyToRawArray (eraw);
What would happen if you didn’t have the alignas? Would it just fail?
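My current understanding, which may be wrong: a plain float eraw[4] is only guaranteed the alignment of a float (typically 4 bytes), so it can land on a 16-byte boundary by luck and appear to work. An aligned SSE store (movaps) to an address that is not a multiple of 16 raises a fault, so if copyToRawArray does an aligned store (I haven't confirmed that it does), leaving out the alignas would be a crash waiting to happen. The sketch below only shows the alignment check itself; isAligned16, requireAligned16 and Buffers are hypothetical names of mine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool isAligned16 (const void* p)
{
    return reinterpret_cast<std::uintptr_t> (p) % 16 == 0;
}

// Hypothetical guard to call before handing a pointer to an aligned store.
inline float* requireAligned16 (float* p)
{
    assert (isAligned16 (p));
    return p;
}

struct Buffers
{
    alignas (16) float aligned[8];   // guaranteed to start on a 16-byte boundary
};
```

With a Buffers object b, b.aligned and b.aligned + 4 pass the check, while b.aligned + 1 is 4 bytes past a boundary and fails it.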
I notice Jules’ code skips all this completely. Is it likely to be less efficient, more efficient, or about the same to use:
float val0 = u.get(0);
float val1 = u.get(1);
float val2 = u.get(2);
float val3 = u.get(3);
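To make the comparison concrete, here is a rough sketch of the two ways of getting the floats back out, again with raw SSE intrinsics rather than JUCE. How get() is actually implemented in SIMDRegister is not something I know, so the per-lane shuffle below is only one plausible way to do it. getLane and extractBothWays are hypothetical names. Both paths produce the same values; which one is faster would need measuring.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

#if SKETCH_HAS_SSE
// Hypothetical helper: read one lane by shuffling it into lane 0 (one shuffle per call).
template <int Lane>
inline float getLane (__m128 v)
{
    static_assert (Lane >= 0 && Lane < 4, "an SSE register holds 4 floats");
    return _mm_cvtss_f32 (_mm_shuffle_ps (v, v, _MM_SHUFFLE (Lane, Lane, Lane, Lane)));
}
#endif

// Reads 4 floats from 'in', then extracts them lane by lane and via a single store.
inline void extractBothWays (const float* in, float* viaGet, float* viaStore)
{
   #if SKETCH_HAS_SSE
    const __m128 u = _mm_loadu_ps (in);   // unaligned load, so 'in' needs no alignas

    viaGet[0] = getLane<0> (u);           // four separate lane reads
    viaGet[1] = getLane<1> (u);
    viaGet[2] = getLane<2> (u);
    viaGet[3] = getLane<3> (u);

    alignas (16) float eraw[4];
    _mm_store_ps (eraw, u);               // one aligned store for all four lanes
    for (int i = 0; i < 4; ++i)
        viaStore[i] = eraw[i];
   #else
    for (int i = 0; i < 4; ++i)           // scalar fallback on non-SSE targets
        viaGet[i] = viaStore[i] = in[i];
   #endif
}
```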
Thanks for any further thoughts.
