FloatVectorOperations optimal size


I’ve seen that FloatVectorOperations uses SIMD, which is a good thing, but I was wondering about the best way to use it.

I have a big chunk of data to process and I want to find the maximum value as fast as possible.

1. Should I call FloatVectorOperations::findMaximum once with all the data?
2. Or should I divide my data into small chunks (of a size still to be determined), create a thread pool, find the maximum of every chunk, and then find the maximum of all the “local” maxima?

Obviously, the second solution sounds better as it uses multiple cores, but I was wondering what the best size for the data chunks is.

Should I create many threads with small chunks (ideally the size of a SIMD float vector, but I have no idea where to find that), or should I create as many threads as there are cores, with bigger chunks of data?

What’s the best way to balance the benefit of SIMD against the cost of creating threads and finding the maximum of all the local maxima?

Generally, working with bigger chunks and using one thread per core is best for churning through a lot of data.
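For illustration, here is a minimal sketch of the "one big chunk per core" approach in plain C++. It is only a sketch under some assumptions: `parallelMax` and `chunkMax` are hypothetical names, `chunkMax` is a stand-in that uses `std::max_element` so the example compiles without JUCE, and in a JUCE project you would presumably call `FloatVectorOperations::findMaximum` on each chunk instead. It also spawns fresh threads on every call, whereas a real application would more likely reuse a thread pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the per-chunk search so the sketch builds without JUCE.
// In a JUCE project this is where FloatVectorOperations::findMaximum would go.
static float chunkMax (const float* src, std::size_t numValues)
{
    return *std::max_element (src, src + numValues);
}

// Splits the data into one large contiguous chunk per thread, finds each
// chunk's maximum in parallel, then reduces the local maxima on the caller.
// numThreads == 0 means "use the number of hardware threads".
float parallelMax (const float* data, std::size_t numValues, unsigned numThreads = 0)
{
    assert (data != nullptr && numValues > 0);

    if (numThreads == 0)
        numThreads = std::max (1u, std::thread::hardware_concurrency());

    // Chunk size rounded up, then the number of non-empty chunks it yields.
    const std::size_t chunkSize = (numValues + numThreads - 1) / numThreads;
    const std::size_t numChunks = (numValues + chunkSize - 1) / chunkSize;

    std::vector<float> localMax (numChunks);
    std::vector<std::thread> workers;
    workers.reserve (numChunks);

    for (std::size_t i = 0; i < numChunks; ++i)
    {
        const std::size_t begin = i * chunkSize;
        const std::size_t count = std::min (chunkSize, numValues - begin);

        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back ([&localMax, data, i, begin, count]
        {
            localMax[i] = chunkMax (data + begin, count);
        });
    }

    for (auto& w : workers)
        w.join();

    // Final reduction over a handful of values: negligible cost.
    return *std::max_element (localMax.begin(), localMax.end());
}
```

Calling `parallelMax (buffer.data(), buffer.size())` on a `std::vector<float>` would use one chunk per hardware thread. For small buffers the thread start-up cost will likely outweigh any gain, so it is worth measuring against a single call on the whole buffer before committing to this.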

Thanks! So FloatVectorOperations takes care of the SIMD optimisation itself?

Yes, that’s its main purpose!