An SIMD Question


#1

Hi Everyone,

I’m fairly new to JUCE and I’m trying to get my head around the SIMD capability.

I’m trying to do something similar to this:

SIMDRegister<float> * WorkingBuffer = new SIMDRegister<float>[1024];
SIMDRegister<float> ValueOne = SIMDRegister<float>::fromNative({2.0f,2.0f,2.0f,2.0f});
SIMDRegister<float> ValueTwo= SIMDRegister<float>::fromNative({0.0f,0.0f,0.0f,0.0f});
SIMDRegister<float> TempValue= SIMDRegister<float>::fromNative({0.0f,0.0f,0.0f,0.0f});

for(int i = 0; i < 1024; i++)
{
      WorkingBuffer[i] = SIMDRegister<float>::fromNative({0.0f,0.0f,0.0f,0.0f});
}

for(int i = 0; i < 1024; ++i)
{
    ValueTwo = TableLookUp[i]; // TableLookUp is a table of SIMDRegister<float>

    // THE FOLLOWING LINE GIVES AN UNEXPECTED RESULT FOR TempValue

    TempValue= WorkingBuffer[i] * ValueOne - WorkingBuffer[i] * ValueTwo;

…….followed by more code

I don’t understand why I’m getting odd values for the TempValue variable in the line marked above. The code works perfectly in the scalar version. I suspect I might be initialising the SIMDRegister<float> array named “WorkingBuffer” incorrectly. Can anyone point me in the direction of a solution? Thanks.

cixelsyd


#2

Are you simply missing an index from the second WorkingBuffer in your last line?


#3

Apologies. That was a typo.

Fixed my post. I’m wondering if the issue could be related to alignment.
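
For anyone who wants to rule alignment in or out, here is a minimal sketch of an alignment check. It assumes a 16-byte, 4-lane float register; `Float4`, `isAligned` and `allElementsAligned` are hypothetical names made up for illustration and are not part of JUCE, so the snippet compiles without the framework.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 4-lane float SIMD register.
// This is NOT JUCE's SIMDRegister; it only mimics its size and alignment.
struct alignas(16) Float4
{
    float lane[4];
};

// Returns true if ptr sits on a multiple of `alignment` bytes.
inline bool isAligned (const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t> (ptr) % alignment == 0;
}

// Checks every element of a Float4 array, the way one might
// sanity-check a working buffer before running SIMD code on it.
inline bool allElementsAligned (const Float4* buffer, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        if (! isAligned (&buffer[i], alignof (Float4)))
            return false;

    return true;
}
```

If a check like this passes for the working buffer and the lookup table, alignment is probably not the culprit and the bug is more likely in the indexing or the arithmetic.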


#4

I thought that might have been too easy!

I would suggest making a much simpler example, reducing the lines of code and the size of the loops to the absolute minimum to reproduce the problem. By the end of this process I suspect you’ll have worked out what’s going on, but if not you’ll have a fully operational test case you can post here. Guess-debugging other people’s code is much much harder than having something you can experiment with.
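
A reduced test case along these lines could look like the sketch below. It is only an illustration: `Float4`, `splat`, `scalarStep`, `vectorStep` and `matchesScalar` are hypothetical stand-ins rather than JUCE API, and the only thing taken from the original post is the expression `w * v1 - w * v2`. The idea is to run the same step through a scalar reference and a lane-wise version and compare the results lane by lane.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a 4-lane float register (not JUCE's SIMDRegister).
struct alignas(16) Float4
{
    float lane[4];
};

// Fills all four lanes with the same value.
inline Float4 splat (float v)
{
    return {{ v, v, v, v }};
}

// Lane-wise multiply.
inline Float4 operator* (const Float4& a, const Float4& b)
{
    Float4 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Lane-wise subtract.
inline Float4 operator- (const Float4& a, const Float4& b)
{
    Float4 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] - b.lane[i];
    return r;
}

// Scalar reference for the expression from the original post.
inline float scalarStep (float w, float v1, float v2)
{
    return w * v1 - w * v2;
}

// The same expression on the 4-lane type.
inline Float4 vectorStep (const Float4& w, const Float4& v1, const Float4& v2)
{
    return w * v1 - w * v2;
}

// Compares each lane of the vector result against the scalar reference.
inline bool matchesScalar (const Float4& w, const Float4& v1, const Float4& v2,
                           float tolerance = 1.0e-6f)
{
    const Float4 r = vectorStep (w, v1, v2);

    for (int i = 0; i < 4; ++i)
        if (std::fabs (r.lane[i] - scalarStep (w.lane[i], v1.lane[i], v2.lane[i])) > tolerance)
            return false;

    return true;
}
```

One thing a test like this makes obvious: if the working buffer really is all zeros, as in the posted code, the expression should come out as zero in every lane (assuming the table values are finite), so a non-zero result points at something other than this line.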


#5

LOL. That was the simple version, Tom. I’m in the middle of coding a SIMD version of an FFT. It’s been driving me nuts and keeping me awake nights for days. :)

My SIMD FFT executes correctly for the first 2 passes but on the third pass there is some real weirdness. I just wanted to confirm I was declaring and initialising my SIMD array correctly.

Thanks
James
Synthetech Sound


#6

Hey t0m,

I found the error and it was indeed my own coding mistake. Thanks for the quick response.

James


#7

Kind of avoiding the issue but why not use a library FFT? It’s a waste of your time writing your own.


#8

I would also suggest something like the free Intel Performance Primitives or the JUCE built-in support for the Intel MKL.


#9

Good suggestions. I may indeed go with the Intel one in the end. I wanted to understand more about the mathematics in the FFT. Now that I’ve done that and have a working SIMD version, I don’t feel like it was a waste of time. I suspect that once I get into trying to optimise it, I’ll end up making the jump to IPP. I’m also trying to learn the JUCE framework as I go, so this little project helped me to understand SIMD in JUCE. I quite like it. The SIMD version of my FFT is almost identical to the scalar version. Nice work, JUCE team.


#10

Does the JUCE support for Intel MKL include SIMD capability? I got the impression it was scalar.


#11

Yes, both the MKL version and the IPP version utilize different instruction sets depending on your target architecture. They use SSE2, SSE3, SSE4, AVX and AVX-512, depending on what is available.

These are hand-optimized algorithms you will have a hard time beating.

As a learning exercise it makes sense to implement your own; for production I would opt for the Intel stuff every time.

There are other solutions available (FFTW) but their license restrictions may not suit your needs. We’ve inquired about a commercial closed-source license, and we could have bought a small car for their asking price.


#12

I had the same experience checking into FFTW. Gonna check out the JUCE SIMD FFT tonight. Thanks for the recommendations.