Why won't this work?

    auto sample = simd::fromNative({(float)target->getSample(0, s), (float)target->getSample(1, s)});

// The above didn’t work on the 32-bit build, so I went with the below:

	alignas(16) float * asamples = new float[4];
	asamples[0] = (float)target->getSample(0, s);
	asamples[1] = (float)target->getSample(1, s);
	asamples[2] = 0;
	asamples[3] = 0;
	auto sample = simd::fromRawArray(asamples);

And this is tripping the assertion in fromRawArray that the pointer is unaligned on 32-bit builds. I think this is affecting a couple of my users.


using simd = dsp::SIMDRegister<float>;

Pretty sure the new there isn’t guaranteed to return 16-byte-aligned memory.

Can you use the stack, rather than the heap?

	alignas(16) float asamples[4];
	asamples[0] = (float)target->getSample(0, s);
	asamples[1] = (float)target->getSample(1, s);
	asamples[2] = 0;
	asamples[3] = 0;
	auto sample = simd::fromRawArray(asamples);

C++17 allows new to return over-aligned memory, but prior to that there are various workarounds, e.g.:

    // Wraps a fixed-size array so that its storage carries the requested alignment.
    template<typename Type,
             unsigned Size,
             unsigned Alignment = alignof(Type)>
    struct alignas (Alignment) AlignedArray
    {
        Type data[Size];
    };

    AlignedArray<float, 4, 16> array;
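For the C++17 route mentioned above, here is a minimal sketch using the aligned form of operator new[]. The names isAligned, AlignedFloatDeleter, AlignedFloats and makeAlignedFloats are made up for illustration and are not part of JUCE. The memory has to be released with the matching aligned operator delete[], which is why it is wrapped in a unique_ptr with a custom deleter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// True if ptr sits on a multiple of `alignment` bytes.
inline bool isAligned (const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t> (ptr) % alignment == 0;
}

// Deleter that matches the aligned operator new[] used below.
struct AlignedFloatDeleter
{
    void operator() (float* p) const noexcept
    {
        ::operator delete[] (p, std::align_val_t (16));
    }
};

using AlignedFloats = std::unique_ptr<float[], AlignedFloatDeleter>;

// Allocates `count` zeroed floats on a 16-byte boundary (needs C++17).
inline AlignedFloats makeAlignedFloats (std::size_t count)
{
    void* raw = ::operator new[] (count * sizeof (float), std::align_val_t (16));
    auto* data = static_cast<float*> (raw);

    for (std::size_t i = 0; i < count; ++i)
        data[i] = 0.0f;

    return AlignedFloats (data);
}
```

Something like auto buf = makeAlignedFloats (4); would then give a pointer (buf.get()) that should be safe to hand to an API that asserts on 16-byte alignment.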

Yeah, at that point stack allocation is fine. I didn’t know that alignas would only work with the stack, though. I found the snag, and again it was a left-hand/right-hand operator problem with JUCE SIMD. I do wish these were a little more complete!

Rant with example

using simd = dsp::SIMDRegister<float>;
simd value = simd::fromNative({0.2f,0.3f,0.4f,0.5f});
simd valid = value * 2.0f;
simd invalid = 2.0f * value;

In my case it was even worse, as it was a 1 / value.
Q) What’s the best way of doing 1 / simd with JUCE?

There are SSE and AVX instructions for reciprocal, so those could be useful. But that wouldn’t help for something like 2.0f / simd
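As a rough sketch of the general case (any scalar divided by a vector), SSE also has a full-precision divide, _mm_div_ps, which can be combined with a broadcast of the numerator. The helper name divideScalarBy4 and the SKETCH_HAS_SSE macro are made up for this example, and it works on plain float arrays rather than on the JUCE wrapper. The dedicated reciprocal instruction (_mm_rcp_ps) is only an approximation with roughly 12 bits of precision, so the sketch uses the divide instead.

```cpp
#include <cassert>
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

// Computes out[i] = numerator / in[i] for four floats.
// Unaligned loads/stores are used so the pointers need no special alignment.
inline void divideScalarBy4 (float numerator, const float* in, float* out)
{
#if SKETCH_HAS_SSE
    const __m128 num = _mm_set1_ps (numerator);    // broadcast to all four lanes
    const __m128 den = _mm_loadu_ps (in);          // unaligned load of four floats
    _mm_storeu_ps (out, _mm_div_ps (num, den));    // lane-wise full-precision divide
#else
    // Scalar fallback for targets without SSE.
    for (int i = 0; i < 4; ++i)
        out[i] = numerator / in[i];
#endif
}
```

With 16-byte-aligned buffers, _mm_load_ps and _mm_store_ps could be swapped in for the unaligned variants.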

I think you should be able to get around the left hand/right hand issue with explicit casting:

    dsp::SIMDRegister<float> values = dsp::SIMDRegister<float>::fromNative({0.2f,0.3f,0.4f,0.5f});
    dsp::SIMDRegister<float> ones = static_cast<dsp::SIMDRegister<float>>(1.0);

    // So this is fine:
    auto test = ones * values;

    // But this still fails...
    auto inv = ones / values;

I guess the thing that’s really missing is the division operator. I don’t know the underlying instruction sets well enough to know whether or not that implementation is feasible…
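One way to fill this kind of gap in user code is with free (non-member) operators that take the scalar on the left. The sketch below uses a stand-in struct, Float4, which is hypothetical and deliberately not the JUCE class, so that it compiles on its own. The same pattern could presumably be applied to a real SIMD wrapper, with the loop replaced by whatever lane-wise divide the platform offers.

```cpp
#include <cassert>

// Stand-in for a 4-lane SIMD wrapper. This is NOT dsp::SIMDRegister.
struct Float4
{
    float lane[4];

    // Member operator: only covers "vector * scalar".
    Float4 operator* (float s) const
    {
        return { { lane[0] * s, lane[1] * s, lane[2] * s, lane[3] * s } };
    }
};

// Free function: covers "scalar * vector" by forwarding to the member,
// which is valid because multiplication is commutative.
inline Float4 operator* (float s, const Float4& v)
{
    return v * s;
}

// Free function: "scalar / vector" is not commutative, so it cannot
// forward to anything and needs its own lane-wise implementation.
inline Float4 operator/ (float s, const Float4& v)
{
    Float4 result {};

    for (int i = 0; i < 4; ++i)
        result.lane[i] = s / v.lane[i];

    return result;
}
```

With these in scope, both 2.0f * value and 1.0f / value compile for the stand-in type.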

This doesn’t allocate the memory behind asamples as being aligned on 16 bytes; it only says that the pointer variable asamples itself should be. If the block returned by new is not aligned, then there are no valid assumptions about the alignment of the samples.
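A small sketch of that difference, assuming nothing beyond the standard library (isAligned16, AlignmentReport and checkAlignment are illustrative names): alignas on a pointer declaration aligns the pointer variable, while alignas on an array declaration aligns the array storage.

```cpp
#include <cassert>
#include <cstdint>

// True if p sits on a 16-byte boundary.
inline bool isAligned16 (const void* p)
{
    return reinterpret_cast<std::uintptr_t> (p) % 16 == 0;
}

struct AlignmentReport
{
    bool pointerVariableAligned;  // address of the pointer variable itself
    bool stackArrayAligned;       // address of the first array element
};

inline AlignmentReport checkAlignment()
{
    alignas (16) float* heapSamples = new float[4];  // aligns the pointer, not the floats
    alignas (16) float stackSamples[4] = {};         // aligns the array storage

    AlignmentReport report { isAligned16 (&heapSamples), isAligned16 (stackSamples) };

    // isAligned16 (heapSamples) may be true or false here: plain new only
    // guarantees alignment suitable for default-aligned types.
    delete[] heapSamples;
    return report;
}
```

Both flags in the report come out true, but neither says anything about where the heap block itself landed, which is what the assertion in the original code is complaining about.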

When I was playing around creating a little template framework to learn the Intel intrinsics, I seem to remember using one like _mm_div_pd, so I’m quite sure the division operation should be there. Or maybe that’s an AVX-only thing and that’s why it’s not part of JUCE.

Intrinsics are not the way to go; have a look at boost::simd or TR2 SIMD. That’s the proper approach (TR2 should probably replace JUCE-specific wrappers once compilers support it officially).

I have found XSIMD quite useful (it feels like a library done the way I would do it or want it done)… But I will check out TR2 now.

There are too many libraries IMHO (I’m using libsimdpp at the moment, but will probably migrate to TR2 as soon as possible; the blocker will probably be Xcode, as usual).

I prefer to develop on Xcode first but I haven’t braved the Xcode 9 update. Think I will skip altogether and go straight to 10