A SIMD Question

Hi Everyone,

I’m fairly new to JUCE and I’m trying to get my head around the SIMD capability.

I’m trying to do something similar to this:

```cpp
SIMDRegister<float>* WorkingBuffer = new SIMDRegister<float>[1024];
SIMDRegister<float> ValueOne  = SIMDRegister<float>::fromNative({2.0f, 2.0f, 2.0f, 2.0f});
SIMDRegister<float> ValueTwo  = SIMDRegister<float>::fromNative({0.0f, 0.0f, 0.0f, 0.0f});
SIMDRegister<float> TempValue = SIMDRegister<float>::fromNative({0.0f, 0.0f, 0.0f, 0.0f});

for (int i = 0; i < 1024; ++i)
{
    WorkingBuffer[i] = SIMDRegister<float>::fromNative({0.0f, 0.0f, 0.0f, 0.0f});
}

for (int i = 0; i < 1024; ++i)
{
    ValueTwo = TableLookUp[i]; // TableLookUp is a table of SIMDRegister<float>

    // THE FOLLOWING LINE GIVES AN UNEXPECTED RESULT FOR TempValue
    TempValue = WorkingBuffer[i] * ValueOne - WorkingBuffer[i] * ValueTwo;

    // ... followed by more code
}
```

I don’t understand why I’m getting odd values for the TempValue variable in the line marked above. The code works perfectly in the scalar version. I suspect I might be initialising the SIMDRegister<float> array named “WorkingBuffer” incorrectly. Can anyone point me in the direction of a solution? Thanks.
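One thing worth checking with this initialisation: a brace list such as `{2.0f, 2.0f, 2.0f, 2.0f}` hard-codes four lanes. If the build enables AVX, a `SIMDRegister<float>` may hold eight floats, and a four-element list would quietly leave the upper lanes at zero. JUCE’s `SIMDRegister<float>::expand(value)` broadcasts one value into every lane, and `SIMDRegister<float>::SIMDNumElements` gives the lane count. The sketch below models that difference in plain C++ so it runs without JUCE; `Reg<N>`, `fromList` and `makeZeroedBuffer` are hypothetical names, not JUCE’s API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for a SIMD register with N float lanes (not JUCE's SIMDRegister).
template <std::size_t N>
struct Reg {
    std::array<float, N> lane{};

    // Mimics brace-initialising a native vector: lanes not covered by the list stay zero.
    static Reg fromList(std::initializer_list<float> values) {
        Reg r;
        std::size_t i = 0;
        for (float v : values) {
            if (i == N) break;
            r.lane[i++] = v;
        }
        return r;
    }

    // Broadcasts one value into every lane, the role SIMDRegister<float>::expand plays in JUCE.
    static Reg expand(float v) {
        Reg r;
        r.lane.fill(v);
        return r;
    }
};

// Fills a buffer of registers with zeros in every lane, whatever the lane count N is.
template <std::size_t N>
std::vector<Reg<N>> makeZeroedBuffer(std::size_t count) {
    return std::vector<Reg<N>>(count, Reg<N>::expand(0.0f));
}
```

With `N = 8`, `Reg<8>::fromList({2.0f, 2.0f, 2.0f, 2.0f})` has lanes 4 to 7 equal to zero, while `Reg<8>::expand(2.0f)` holds 2.0f in every lane, so a broadcast stays correct whatever width the build selects.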

cixelsyd

Are you simply missing an index from the second WorkingBuffer in your last line?

Apologies. That was a typo.

Fixed my post. I’m wondering if the issue could be related to alignment.
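A quick way to rule alignment in or out is to check the addresses directly. The sketch below uses a hypothetical 32-byte-aligned `Wide` type (standing in for an AVX-width register, not JUCE’s class) and a small `isAligned` helper, also hypothetical. It assumes C++17, where `new T[n]` and `std::vector<T>` honour over-aligned types; before C++17 that guarantee did not exist, so a plain `new` of an over-aligned type could return an under-aligned pointer. For raw element pointers, JUCE’s `SIMDRegister` offers a similar check, `isSIMDAligned()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical register-sized type demanding 32-byte alignment, as an AVX register would.
struct alignas(32) Wide {
    float lane[8];
};

// True if p lies on a boundary that satisfies T's alignment requirement.
template <typename T>
bool isAligned(const T* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Allocates a heap array and checks every element's address.
// In C++17, array new for an over-aligned type uses the aligned operator new.
bool heapArrayIsAligned(std::size_t count) {
    std::unique_ptr<Wide[]> buffer(new Wide[count]);
    for (std::size_t i = 0; i < count; ++i)
        if (!isAligned(&buffer[i])) return false;
    return true;
}
```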

I thought that might have been too easy!

I would suggest making a much simpler example, reducing the lines of code and the size of the loops to the absolute minimum to reproduce the problem. By the end of this process I suspect you’ll have worked out what’s going on, but if not you’ll have a fully operational test case you can post here. Guess-debugging other people’s code is much, much harder than having something you can experiment with.
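Along those lines, a minimal test case can compare each SIMD lane with the scalar result of the same expression, so the first disagreeing lane points at the problem. This plain-C++ sketch assumes four float lanes and writes the “SIMD” side lane by lane so it runs without JUCE; in real code `simdExpr` would be the `SIMDRegister<float>` expression, with lanes read back via `get()` or `copyToRawArray()`. The names `scalarExpr`, `simdExpr` and `firstMismatch` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kLanes = 4; // assumed register width (e.g. SSE or NEON floats)

using Lanes = std::array<float, kLanes>;

// Scalar version of the expression from the question: w * one - w * two.
float scalarExpr(float w, float one, float two) { return w * one - w * two; }

// Stand-in for the SIMD version; replace the body with the SIMDRegister expression.
Lanes simdExpr(const Lanes& w, const Lanes& one, const Lanes& two) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) out[i] = w[i] * one[i] - w[i] * two[i];
    return out;
}

// Returns the first lane whose SIMD result differs from the scalar result by more
// than tol, or -1 if every lane agrees.
int firstMismatch(const Lanes& w, const Lanes& one, const Lanes& two, float tol = 1e-6f) {
    const Lanes got = simdExpr(w, one, two);
    for (std::size_t i = 0; i < kLanes; ++i)
        if (std::fabs(got[i] - scalarExpr(w[i], one[i], two[i])) > tol) return static_cast<int>(i);
    return -1;
}
```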

LOL. That was the simple version, Tom. I’m in the middle of coding a SIMD version of an FFT. It’s been driving me nuts and keeping me awake nights for days. 🙂

My SIMD FFT executes correctly for the first 2 passes but on the third pass there is some real weirdness. I just wanted to confirm I was declaring and initialising my SIMD array correctly.
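For a bug that only shows up from the third pass on, one option is to run a scalar reference FFT for the same number of passes and diff the buffers. The sketch below is a textbook iterative radix-2 decimation-in-time FFT (bit-reversal first, then butterfly passes with half-spans of 1, 2, 4, ...) that can stop after `passes` passes. It assumes the SIMD version uses the same ordering and twiddle convention; if it doesn’t, the intermediate buffers won’t line up. `bitReverse` and `fftPasses` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cpx = std::complex<double>;

// Bit-reversal permutation, applied before an in-place decimation-in-time FFT.
void bitReverse(std::vector<cpx>& a) {
    const std::size_t n = a.size();
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
}

// Runs the first `passes` butterfly passes of a radix-2 DIT FFT (size must be a power of two).
// passes == log2(size) gives the complete forward transform.
void fftPasses(std::vector<cpx>& a, int passes) {
    const std::size_t n = a.size();
    const double pi = std::acos(-1.0);
    bitReverse(a);
    int pass = 0;
    for (std::size_t len = 2; len <= n && pass < passes; len <<= 1, ++pass) {
        const cpx wLen = std::polar(1.0, -2.0 * pi / static_cast<double>(len));
        for (std::size_t start = 0; start < n; start += len) {
            cpx w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cpx u = a[start + k];
                const cpx v = a[start + k + len / 2] * w;
                a[start + k] = u + v;           // top output of the butterfly
                a[start + k + len / 2] = u - v; // bottom output of the butterfly
                w *= wLen;
            }
        }
    }
}
```

Copying the input, calling `fftPasses(copy, 3)`, and comparing against the SIMD buffer after its third pass narrows the fault down to a single pass.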

Thanks
James
Synthetech Sound

Hey t0m,

I found the error and it was indeed my own coding mistake. Thanks for the quick response.

James

Kind of avoiding the issue, but why not use a library FFT? It’s a waste of your time writing your own.

I would also suggest something like the free Intel Integrated Performance Primitives (IPP) or JUCE’s built-in support for Intel MKL.

Good suggestions. I may indeed go with the Intel one in the end. I wanted to understand more about the mathematics of the FFT. Now that I’ve done that and have a working SIMD version, I don’t feel like it was a waste of time. I suspect that once I start trying to optimise it, I’ll end up making the jump to IPP. I’m also trying to learn the JUCE framework as I go, so this little project helped me understand SIMD in JUCE. I quite like it: the SIMD version of my FFT is almost identical to the scalar version. Nice work, JUCE team.
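For reference, the mathematics an iterative radix-2 FFT rests on is the standard decimation-in-time split of the DFT: each length-N DFT is built from two length-N/2 DFTs of the even- and odd-indexed samples (E and O below), combined with one twiddle multiply per butterfly. Applied recursively, this gives log2(N) passes of N/2 butterflies each.

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \qquad W_N = e^{-2\pi i / N}

E[k] = \sum_{m=0}^{N/2-1} x[2m]\, W_{N/2}^{mk}, \qquad
O[k] = \sum_{m=0}^{N/2-1} x[2m+1]\, W_{N/2}^{mk}

X[k] = E[k] + W_N^{k}\, O[k], \qquad
X[k + N/2] = E[k] - W_N^{k}\, O[k], \qquad k = 0, \dots, N/2 - 1
```

The second line uses W_N^2 = W_{N/2}, and the sign flip in the last line comes from W_N^{k+N/2} = -W_N^k.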

Does the JUCE support for Intel MKL include SIMD capability? I got the impression it was scalar.

Yes, both the MKL version and the IPP version utilize different instruction sets depending on your target architecture. They use SSE2, SSE3, SSE4, AVX and AVX-512, depending on what is available.
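The general idea behind “depending on what is available” is runtime CPU dispatch: the library queries the processor and picks the widest code path it supports. MKL’s and IPP’s own dispatchers are internal and far more thorough; this is only a toy sketch of the query step, using the GCC/Clang builtin `__builtin_cpu_supports` on x86 with a fallback elsewhere. `widestSimd` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Returns the widest x86 SIMD extension the running CPU reports.
// Uses GCC/Clang builtins; other compilers or architectures get "unknown".
std::string widestSimd() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return "AVX-512";
    if (__builtin_cpu_supports("avx2")) return "AVX2";
    if (__builtin_cpu_supports("avx")) return "AVX";
    if (__builtin_cpu_supports("sse4.2")) return "SSE4.2";
    if (__builtin_cpu_supports("sse4.1")) return "SSE4.1";
    if (__builtin_cpu_supports("sse3")) return "SSE3";
    if (__builtin_cpu_supports("sse2")) return "SSE2";
    return "none";
#else
    return "unknown";
#endif
}
```

A dispatcher would then call an AVX2 kernel, an SSE2 kernel, or a scalar fallback based on this result, typically once at start-up.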

These are hand-optimized algorithms you will have a hard time beating.

As a learning exercise it makes sense to implement your own, for production I would opt for the Intel stuff every time.

There are other solutions available, such as FFTW, but their license restrictions may not suit your needs. We’ve inquired about a commercial closed-source license, and we could have bought a small car for their asking price.

I had the same experience checking into FFTW. Gonna check out the JUCE SIMD FFT tonight. Thanks for the recommendations.