Just trying to port a bit of DSP to the juce::SIMDRegister stuff to make it easier to do an ARM version.

There’s no _mm_set1_ps(...) equivalent in SIMDRegister for __m128-sized floats. Is this an oversight, or is _mm_load1_ps equally fast for some reason I don’t understand?
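For anyone wanting to check the two broadcasts against each other, here is a minimal sketch, not anything from JUCE itself. The helper name broadcastsMatch is made up for illustration, and the non-SSE branch is only there so the snippet still compiles on ARM. It only shows that both intrinsics fill the register with the same value. Whether they compile to the same instructions depends on the compiler and flags, so that is worth checking in the disassembly.

```cpp
#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
 #include <xmmintrin.h>   // SSE: _mm_set1_ps, _mm_load1_ps, _mm_store_ps
 #define HAVE_SSE 1
#else
 #define HAVE_SSE 0
#endif

// Hypothetical helper: returns true if _mm_set1_ps and _mm_load1_ps
// produce identical lanes for the given scalar.
bool broadcastsMatch (float value)
{
#if HAVE_SSE
    alignas (16) float fromSet[4];
    alignas (16) float fromLoad[4];

    _mm_store_ps (fromSet,  _mm_set1_ps (value));    // broadcast from a scalar argument
    _mm_store_ps (fromLoad, _mm_load1_ps (&value));  // broadcast from a memory location

    for (int i = 0; i < 4; ++i)
        if (fromSet[i] != fromLoad[i] || fromSet[i] != value)
            return false;

    return true;
#else
    (void) value;
    return true;  // no SSE available, so there is nothing to compare
#endif
}
```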
Can we have _mm_cvtepi32_ps() and its equivalents added as operations? I’m using it for some hackery (see below) to build fast log and exp functions.
Has anyone found a way to do CPU dispatching for Intel with the SIMDRegister stuff?

The documentation is completely missing for some relatively non-obvious functions, e.g. swapevenodd.

OMG, there’s no division at all! See: Divide by SIMDRegister, post #20 by kunz.
    #include <emmintrin.h> // SSE2: __m128i, _mm_cvtepi32_ps

    union Uni
    {
        __m128 asFloat;
        __m128i asInt; //< i don't think it matters that this is signed/unsigned
    };

    /**
     * Approximately calculate log2.
     *
     * This uses the approximation:
     *   union { float f; uint32_t i; } vx = { x };
     *   float y = vx.i;
     *   y *= 1.0 / (1 << 23);
     *   return y - 126.94269504f;
     */
    static __m128 log2 (__m128 x)
    {
        const static auto f = _mm_set1_ps (1.0f / float (1 << 23));
        Uni xu;
        xu.asFloat = x;
        __m128 y = _mm_cvtepi32_ps (xu.asInt); // reinterpret the float bits as ints, then convert to float
        y = _mm_mul_ps (y, f);                 // scale by 2^-23 so the exponent field becomes the integer part
        return _mm_sub_ps (y, _mm_set1_ps (126.94269504f));
    }
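For comparison, here is a scalar sketch of the same approximation that can be checked against std::log2 on any platform. The names approxLog2 and maxAbsError are made up for illustration, and std::memcpy stands in for the union as the bit cast. For positive inputs the sign bit is clear, so the signed/unsigned question in the comment above shouldn't matter. Working the maths through, the absolute error should stay below roughly 0.06 for positive, normal floats.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar form of the approximation: read the float's bits as an integer,
// scale by 2^-23, then subtract the bias constant.
float approxLog2 (float x)
{
    std::uint32_t bits;
    std::memcpy (&bits, &x, sizeof (bits));  // well-defined bit cast

    float y = static_cast<float> (bits);
    y *= 1.0f / float (1 << 23);
    return y - 126.94269504f;
}

// Hypothetical helper: worst absolute error against std::log2,
// sampled at evenly spaced points in [lo, hi] (lo must be > 0).
float maxAbsError (float lo, float hi, int steps)
{
    float worst = 0.0f;

    for (int i = 0; i <= steps; ++i)
    {
        const float x = lo + (hi - lo) * float (i) / float (steps);
        worst = std::max (worst, std::fabs (approxLog2 (x) - std::log2 (x)));
    }

    return worst;
}
```

Running maxAbsError over the input range you actually care about is a cheap way to decide whether the approximation is accurate enough before porting it to SIMDRegister.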