SIMDRegister - feedback and questions

jimc · June 1, 2021, 6:06pm

Just trying to port a bit of DSP to the juce::SIMDRegister stuff to make it easier to do an ARM version.

There’s no _mm_set1_ps(...) in SIMDRegister for __m128 sized floats. Is this an oversight or is _mm_load1_ps equally fast for some reason I don’t understand?
Can we have _mm_cvtepi32_ps() and its equivalents added as operations? I’m using it for some hackery (see below) to get a fast log and exp function.
Has anyone found a way to do CPU dispatching for Intel with the SIMDRegister stuff?
The documentation is completely missing for some relatively non-obvious functions, e.g. swapevenodd.

OMG, there’s no division at all! See: Divide by SIMDRegister - #20 by kunz

#include <emmintrin.h> // SSE2: __m128, __m128i, _mm_cvtepi32_ps

union Uni {
    __m128 asFloat;
    __m128i asInt; ///< I don't think it matters whether this is signed or unsigned
};


/**
 * Approximately calculate log2.
 *
 * This uses the approximation:
 *   union { float f; uint32_t i; } vx = { x }; float y = vx.i;
 *   y *= 1.0 / (1 << 23); return y - 126.94269504f;
 */
static __m128 log2 (__m128 x)
{
    const static auto f = _mm_set1_ps (1.0f / float (1 << 23));

    Uni xu;
    xu.asFloat = x;
    __m128 y = _mm_cvtepi32_ps (xu.asInt);
    y = _mm_mul_ps (y, f);

    return _mm_sub_ps (y, _mm_set1_ps (126.94269504f));
}
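For anyone wanting to sanity-check the trick without SSE, here is a minimal scalar sketch of the same approximation from the comment above. The names approxLog2 and maxAbsError are made up for illustration, and std::memcpy stands in for the union as the type pun. It assumes positive, normal float inputs only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar form of the approximation in the doc comment (hypothetical name).
// The bits of a positive float, read as an integer, are roughly
// 2^23 * (exponent + 127 + mantissa fraction): a piecewise-linear log2.
inline float approxLog2 (float x)
{
    std::uint32_t bits;
    std::memcpy (&bits, &x, sizeof bits); // well-defined type pun

    float y = static_cast<float> (bits);
    y *= 1.0f / static_cast<float> (1 << 23);
    return y - 126.94269504f;
}

// Hypothetical helper: worst absolute error against std::log2,
// sampled at steps + 1 evenly spaced points in [lo, hi] (lo must be > 0).
inline float maxAbsError (float lo, float hi, int steps)
{
    float worst = 0.0f;

    for (int i = 0; i <= steps; ++i)
    {
        const float x = lo + (hi - lo) * static_cast<float> (i) / static_cast<float> (steps);
        worst = std::max (worst, std::fabs (approxLog2 (x) - std::log2 (x)));
    }

    return worst;
}
```

The error is largest at exact powers of two, where the result is high by about 0.057, so this is only good for things like rough level metering or seeding a refinement step.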

kunz · June 2, 2021, 6:56am

Same here. I was able to rewrite our DSP code by adding the division operator, but I haven’t been able to translate our SSE exp approximation with the JUCE SIMD classes so far.

Also, be careful when using the JUCE SIMD classes. You may have to use the clang compiler on Windows as well. The standard VS compiler was not able to strip the wrapper away in our case, and the result was much slower. There is also a forum post about this somewhere.

benvining · June 2, 2021, 7:00am

For SIMD, I don’t really think the JUCE SIMD register classes are necessarily the best tool available to us… if you’re on an Apple platform, there is little reason not to use the vDSP library. For everything else, I would recommend MIPP: GitHub - aff3ct/MIPP: MIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX and AVX-512.

It’s completely portable, and features a much richer instruction set than the Juce SIMD classes.

otristan · June 2, 2021, 7:26am

Do you know how it compares to xsimd?

jimc · June 2, 2021, 9:52am

Well, we need code that runs on both Windows and Mac, so that makes using vDSP awkward.

I think my port to SIMDRegister last night was probably a mistake. Although I did get it working, it’s a bit of a mess, and your note about Windows failing to inline things properly worries me.

This (GitHub - DLTcollab/sse2neon: A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation) looks useful, and very easy! I may try it first and see how it looks.
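A rough sketch of how that could look, assuming sse2neon.h has been dropped somewhere on the include path. The HAVE_SSE_API macro and the applyGain function are made up for illustration. The idea is that the same intrinsics source builds on both x86 and ARM, with a plain scalar loop as the fallback when neither header is available.

```cpp
#include <cstddef>

// Pick an implementation of the SSE intrinsics API at compile time.
// HAVE_SSE_API is a hypothetical macro name used only in this sketch.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>
  #define HAVE_SSE_API 1
#elif defined(__ARM_NEON) && defined(__has_include)
  #if __has_include("sse2neon.h")
    #include "sse2neon.h" // maps _mm_* calls onto NEON
    #define HAVE_SSE_API 1
  #endif
#endif

#ifndef HAVE_SSE_API
  #define HAVE_SSE_API 0
#endif

// Hypothetical example: multiply `count` floats in place by `gain`.
inline void applyGain (float* data, std::size_t count, float gain)
{
    std::size_t i = 0;

#if HAVE_SSE_API
    const __m128 g = _mm_set1_ps (gain);

    // Unaligned loads/stores so the caller's buffer needs no special alignment.
    for (; i + 4 <= count; i += 4)
        _mm_storeu_ps (data + i, _mm_mul_ps (_mm_loadu_ps (data + i), g));
#endif

    // Tail elements, or the whole buffer when no SIMD API was found.
    for (; i < count; ++i)
        data[i] *= gain;
}
```

This only covers the compile-time choice between SSE and NEON. It does not do any runtime CPU dispatching, and the registers stay 4 floats wide on both platforms.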

jimc · June 2, 2021, 12:30pm

With MIPP, it says the width of the registers depends on the architecture. I’m guessing, but I can’t find the documentation to confirm it, that on the M1 we will get a 128-bit register, 4 floats wide? If the build architecture supports 256-bit registers, will it still build code that expects 4 floats? It’s a bit unclear to me.
