I've been trying to do some DSP calculations on Windows using FloatVectorOperations::add. On macOS it uses the Accelerate framework internally and works like a charm. But on Windows the performance is so much worse that the code is almost unusable: it's something like 5-10 times slower than on the Mac.
I dug into the FloatVectorOperations code and found that it uses _mm* functions, so I wrote my own code using intrinsics, and its performance is quite comparable to the Mac's. What's strange is that, with all the macros and functions inlined, the FloatVectorOperations code is almost as simple as mine, yet it's still several times slower than the straight _mm* solution.
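For illustration, here is a minimal sketch of what a straight _mm* add over aligned buffers could look like. It is not the exact code that was benchmarked: the function name addAligned is hypothetical, and it assumes both pointers are 16-byte aligned and that the target is x86 with SSE available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h> // SSE: _mm_load_ps, _mm_add_ps, _mm_store_ps

// Hypothetical helper (not part of JUCE): dest[i] += src[i] for i in [0, num).
// Assumption: dest and src are both 16-byte aligned.
inline void addAligned (float* dest, const float* src, std::size_t num)
{
    assert (reinterpret_cast<std::uintptr_t> (dest) % 16 == 0);
    assert (reinterpret_cast<std::uintptr_t> (src)  % 16 == 0);

    std::size_t i = 0;

    // Main loop: four floats per iteration using aligned loads and stores.
    for (; i + 4 <= num; i += 4)
    {
        const __m128 d = _mm_load_ps (dest + i);
        const __m128 s = _mm_load_ps (src + i);
        _mm_store_ps (dest + i, _mm_add_ps (d, s));
    }

    // Scalar tail for lengths that are not a multiple of four.
    for (; i < num; ++i)
        dest[i] += src[i];
}
```

With 2048-float buffers the scalar tail never runs, so all the work happens in the three intrinsics inside the main loop.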
All tests use aligned memory, which is all I need. I've got JUCE_USE_SSE_INTRINSICS set properly and the SSE2 option enabled in the Visual Studio compiler settings. I'm testing a Release build with the fastest optimization setting.
When I checked a Debug build in the profiler, it seemed to spend about a third of its time in the "function body" and not in the intrinsic functions. My code spends nearly all its time in the _mm* functions, leaving < 0.1% for the "function body", whatever that means in my case.
Each test runs 1 million calculations back and forth on vectors of 2048 floats.
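A timing loop for this kind of test could be sketched roughly as follows. The helper name secondsFor is hypothetical, and the JUCE call in the comment assumes JUCE is included and that dest and src are float buffers of 2048 elements; this is only an assumed shape for the test, not the harness actually used.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls fn() `iterations` times and returns the
// elapsed wall-clock time in seconds, measured with a monotonic clock.
template <typename Fn>
double secondsFor (std::size_t iterations, Fn&& fn)
{
    assert (iterations > 0); // nothing to time otherwise

    const auto start = std::chrono::steady_clock::now();

    for (std::size_t i = 0; i < iterations; ++i)
        fn();

    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double> (end - start).count();
}

// Possible usage (assumes JUCE and two float buffers, dest and src):
//   const double t = secondsFor (1000000, [&]
//   {
//       juce::FloatVectorOperations::add (dest, src, 2048);
//   });
```

Timing both the FloatVectorOperations call and the hand-written loop through the same helper, in the same Release build, keeps the comparison like for like.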
That's the whole background. My question is: am I missing some JUCE or VS compiler settings needed to use FloatVectorOperations properly? Has anyone run into this problem and got the Windows version working as well as the Mac one? Maybe there are some settings on Windows I just don't know about.
Any help would be greatly appreciated.