Divide by SIMDRegister

finecutbodies · August 12, 2018, 6:49pm

Is there a way to divide by a SIMDRegister? For example, can I calculate 1/SIMDReg somehow, or does anybody know of a good workaround?

tytel · August 12, 2018, 6:57pm

There’s a div in SSE but not in NEON, so that’s probably why it’s not supported in SIMDRegister.
If you’re only building for SSE devices, then you could just add the / operators to SIMDRegister.
That, or this will work without changing JUCE code:

SIMDRegister<float> quotient = _mm_div_ps(numerator.value, denominator.value);

finecutbodies · August 13, 2018, 5:25am

thank you, that’s perfect and works nicely!

ncthom · August 13, 2018, 2:09pm

Genuine question (showing my naiveté): why can’t the SIMDRegister type implement an operator/= which, when built against SSE, uses _mm_div_ps, and when built against NEON or AVX has a workaround implementation that unpacks the register, performs the division for each value, packs the new values back into the register, then returns?

Obviously, I’m assuming, that would be a performance hit, but if you’re building against NEON, trying to use the SIMDRegister type, and need a 1/x division operation, wouldn’t you have to write something just like this anyway?
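For illustration, the unpack / divide / repack workaround described above could look roughly like the sketch below. `Float4` is a hypothetical stand-in type, not JUCE's SIMDRegister, and the loop is plain scalar code, so it shows the shape of the idea rather than how JUCE would implement it.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane wrapper used only for illustration; this is not JUCE's SIMDRegister.
struct Float4
{
    std::array<float, 4> lanes {};

    // Scalar workaround: read each lane, divide it, and write the result back.
    Float4& operator/= (const Float4& divisor) noexcept
    {
        for (std::size_t i = 0; i < lanes.size(); ++i)
            lanes[i] /= divisor.lanes[i];

        return *this;
    }
};

// Non-mutating form built on top of operator/=.
inline Float4 operator/ (Float4 numerator, const Float4& divisor) noexcept
{
    numerator /= divisor;
    return numerator;
}
```

With a real SIMDRegister the per-lane reads and writes would presumably go through its element accessors instead of a plain array, which is where the performance hit mentioned above comes from.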

holy-city · August 13, 2018, 8:19pm

imho the proper solution would be to support division on SSE platforms and have a compile error on NEON. Trusting a user to know and understand the differences between SIMD platforms kind of defeats the purpose of hiding it behind an API.

ncthom · August 13, 2018, 8:31pm

That’s fair; I would support that solution.

kunz · August 2, 2020, 8:35am

An SSE-only solution is no longer possible for us because of Apple’s switch to ARM. How can we proceed with rewriting the code without that fundamental operator?

In my opinion it should be there, and be computed without SIMD registers when the processor doesn’t support it.

Any workarounds or other solutions are welcome! Is it possible to calculate the division some other way, maybe with a 1 / x operation?

kamedin · August 2, 2020, 9:18am

A64 has FDIV and the vdiv intrinsics for floating-point.

kunz · August 2, 2020, 9:34am

How does this help? How can I use this?

kunz · August 2, 2020, 2:31pm

The whole SIMDRegister is useless for us without the division operator or a way to calculate the reciprocal (1/x), and it looks to me like adding this feature by modifying the module is not an easy or maintainable task.

Are there other ways to do this? For example, can the reciprocal be calculated with SIMDRegister and the JUCE helper functions? Or does someone have a completely different solution that works for different CPUs?
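One generic way to get a reciprocal using only multiplication and subtraction is Newton-Raphson refinement, r ← r · (2 − d · r), after which x / d becomes x · r. The sketch below is scalar, per-lane C++ and is not something proposed in this thread: `Lanes4`, `reciprocalEstimate`, `reciprocal` and `divide` are hypothetical names, and the bit-pattern starting guess is an assumption chosen for portability. On real hardware the starting guess would more likely come from an estimate instruction such as `_mm_rcp_ps` on SSE or `vrecpeq_f32` (refined with `vrecpsq_f32`) on NEON.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 4-lane stand-in for a SIMD register; not a JUCE type.
using Lanes4 = std::array<float, 4>;

// Crude starting guess for 1/d, obtained by subtracting the float's bit pattern
// from a constant. Assumes d is finite, normal and non-zero; the guess is off
// by at most about 12.5%, which is close enough for the refinement to converge.
inline float reciprocalEstimate (float d) noexcept
{
    std::uint32_t bits;
    std::memcpy (&bits, &d, sizeof bits);
    bits = 0x7F000000u - bits;
    float r;
    std::memcpy (&r, &bits, sizeof r);
    return r;
}

// Newton-Raphson refinement: r <- r * (2 - d * r).
// Each step roughly squares the relative error of the estimate.
inline Lanes4 reciprocal (const Lanes4& d, int steps = 3) noexcept
{
    Lanes4 r {};
    for (std::size_t i = 0; i < d.size(); ++i)
    {
        r[i] = reciprocalEstimate (d[i]);
        for (int s = 0; s < steps; ++s)
            r[i] = r[i] * (2.0f - d[i] * r[i]);
    }
    return r;
}

// Division expressed as multiplication by the refined reciprocal.
inline Lanes4 divide (const Lanes4& n, const Lanes4& d) noexcept
{
    const Lanes4 r = reciprocal (d);
    Lanes4 q {};
    for (std::size_t i = 0; i < q.size(); ++i)
        q[i] = n[i] * r[i];
    return q;
}
```

The result is close to, but not guaranteed to be bit-identical with, a true division, and zero, infinities, NaNs and denormals are not handled here.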

kunz · August 2, 2020, 2:38pm

Is this something that is already supported in most ARM CPUs?

johngalt91 · August 2, 2020, 2:48pm

ARM’s NEON has a floating-point division intrinsic, vdivq_f32.
You can look them up here. I don’t know, though, whether it’s ARMv8 only or whether ARMv7 also supports it.

kunz · August 2, 2020, 3:29pm

Thanks for the information. 64-bit division seems to be there too. So it looks like almost all processors support this.

@t0m: Can we have the division feature for SIMDRegister? You could throw a compiler error if it’s not supported for the CPU, as mentioned above.

kamedin · August 3, 2020, 12:12am

v8 only. It’s in the A64 instruction set.

I’d say yes, though juce_neon_SIMDNativeOps seems to use A32 intrinsics only. It’s weird that division is not implemented for SSE or AVX. I don’t know if the fallbacks are selected per operation or for the whole set. That may be a reason to exclude operations that some sets don’t have, like division in A32.

johngalt91 · August 3, 2020, 1:14am

Take it with a grain of salt, but I recall the reason was that NEON didn’t have division at the time they implemented it, so it didn’t make sense to add a SIMD division wrapper backed only by SSE/AVX intrinsics, since the purpose of these wrappers is to let you use them and forget about which platform you are coding for.

kamedin · August 3, 2020, 2:50am

It’s clear that the NEON wrapper was made for A32, but there’s a SIMDFallbackOps struct to handle these cases. There could be a SIMDNativeOps::div that calls SIMDFallbackOps::div for NEON. Many things are available in some sets only, like fma, or 256-bit vectors.
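As a rough sketch of that arrangement: a native `div` uses the hardware instruction where the target has one and forwards to the scalar fallback everywhere else. `Vec4`, `FallbackOps`, `NativeOps` and the `SKETCH_*` macros are hypothetical names that only mirror the idea; this is not JUCE's actual SIMDNativeOps or SIMDFallbackOps code.

```cpp
#include <array>
#include <cstddef>

#if defined (__SSE__) || defined (_M_X64) || defined (_M_AMD64)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#elif defined (__aarch64__) || defined (_M_ARM64)
 #include <arm_neon.h>
 #define SKETCH_HAS_A64_NEON 1
#endif

// Hypothetical 4-lane value type; not a JUCE type.
using Vec4 = std::array<float, 4>;

// Scalar implementation that works on every target.
struct FallbackOps
{
    static Vec4 div (const Vec4& a, const Vec4& b) noexcept
    {
        Vec4 result {};
        for (std::size_t i = 0; i < result.size(); ++i)
            result[i] = a[i] / b[i];
        return result;
    }
};

// Uses a hardware divide where one exists, otherwise forwards to the fallback.
struct NativeOps
{
    static Vec4 div (const Vec4& a, const Vec4& b) noexcept
    {
       #if defined (SKETCH_HAS_SSE)
        Vec4 result {};
        _mm_storeu_ps (result.data(),
                       _mm_div_ps (_mm_loadu_ps (a.data()), _mm_loadu_ps (b.data())));
        return result;
       #elif defined (SKETCH_HAS_A64_NEON)
        Vec4 result {};
        vst1q_f32 (result.data(),
                   vdivq_f32 (vld1q_f32 (a.data()), vld1q_f32 (b.data())));
        return result;
       #else
        // For example 32-bit ARM NEON, which has no vector divide.
        return FallbackOps::div (a, b);
       #endif
    }
};
```

Callers only ever see `NativeOps::div`, so which path was taken stays hidden behind the API, at the cost of a slower scalar path on targets without a divide instruction.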

kunz · August 3, 2020, 2:56pm

So, it’s time to add the division feature?

kunz · August 4, 2020, 7:50am

I was able to overload the division operator for the data types I needed without changing the JUCE library code.
This way I can use all the features of the SIMDRegister struct as well as division. I wasn’t able to test the ARM version, but we will see soon whether it works.

Here is the code:

#pragma once

#include "../JuceLibraryCode/JuceHeader.h"

using vec4 = juce::dsp::SIMDRegister<float>;
using vec2 = juce::dsp::SIMDRegister<double>;

#if defined(__i386__) || defined(__amd64__) || defined(_M_X64) || defined(_X86_) || defined(_M_IX86)
inline vec4 operator / (const vec4 &l, const vec4 &r)
{
    return _mm_div_ps(l.value, r.value);
}

inline vec2 operator / (const vec2 &l, const vec2 &r)
{
    return _mm_div_pd(l.value, r.value);
}

#elif defined(_M_ARM64) || defined (__arm64__) || defined (__aarch64__)
inline vec4 operator / (const vec4 &l, const vec4 &r)
{
    return vdivq_f32(l.value, r.value);
}

inline vec2 operator / (const vec2 &l, const vec2 &r)
{
    return vdivq_f64(l.value, r.value);
}

#else
 #error "SIMD register support not implemented for this platform"
#endif

I still hope that the division operator will be added at some point. Any input is welcome.

edit: fixed ARM specific code

kamedin · August 4, 2020, 9:19am

I think for ARM it should be vdivq_f32 and vdivq_f64. Plain vdivs work on 64 bits (float32x2, float64x1). Also, they work only for A64, so

#elif defined(_M_ARM64) || defined (__arm64__) || defined (__aarch64__)

kunz · August 5, 2020, 6:13am

Thanks for the fixes!
