FloatVectorOperation causing stack buffer overrun

roelV · December 18, 2024, 4:57pm

Hi, I am getting into optimising my algorithms with SIMD using JUCE's FloatVectorOperations. I want to use the addWithMultiply function to optimise an FIR filter, but my application keeps throwing an exception:

Exception: Exception 0xc0000409 encountered at address 0x7ffd2e1f992d

I did some searching and found that this error means STATUS_STACK_BUFFER_OVERRUN. I’m a bit confused about why this exception is thrown and how to fix it.

Here is my implementation:

void process(const float* input, float* output, const int N) noexcept
{
    float out = 0.0f;

    for (int n = 0; n < N; n++) {

        #if JUCE_USE_SIMD
            juce::FloatVectorOperations::addWithMultiply(&out, state, coefs, numCoefs);
        #else
            // Convolution
            for (int k = 0; k < numCoefs; k++) {
                out += state[k] * coefs[k];
            }
        #endif

        updateState();

        output[n] = out;
        out = 0.0f;
    }
}

Also, I aligned my FIR state and coefficients for SIMD:

    alignas(16) T* coefs { nullptr };
    alignas(16) T* state { nullptr };

Using JUCE 8.0.4 on Windows 10.
Processor: AMD Ryzen 7 5800H

zsliu98 · December 18, 2024, 6:59pm

addWithMultiply(FloatType *dest, const FloatType *src1, const FloatType *src2, CountType num)
Multiplies each source1 value by the corresponding source2 value, then adds it to the destination value.

It works as dest[i] = dest[i] + src1[i] * src2[i]. Therefore, in your case out should also be an array.
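To make the element-wise behaviour concrete, here is a minimal sketch. `addWithMultiplyRef` is a hypothetical plain C++ stand-in written for this example (it is not the JUCE implementation); it only mirrors the `dest[i] += src1[i] * src2[i]` behaviour described above. Since `num` elements of `dest` are written, passing the address of a single float together with `numCoefs` writes past that float, which is the likely cause of the overrun. One way to still get a dot product is to accumulate into a scratch array of `num` elements and sum it afterwards; `dotViaScratch` is likewise an assumed helper name.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in mirroring the element-wise behaviour described above:
// dest[i] += src1[i] * src2[i] for i in [0, num). Not the JUCE implementation.
inline void addWithMultiplyRef(float* dest, const float* src1,
                               const float* src2, std::size_t num)
{
    for (std::size_t i = 0; i < num; ++i)
        dest[i] += src1[i] * src2[i];
}

// Dot product built on the element-wise operation.
// The destination must hold num elements, not a single float.
inline float dotViaScratch(const float* state, const float* coefs, std::size_t num)
{
    std::vector<float> scratch(num, 0.0f);   // one slot per coefficient
    addWithMultiplyRef(scratch.data(), state, coefs, num);
    return std::accumulate(scratch.begin(), scratch.end(), 0.0f);
}
```

In real-time audio code the scratch buffer would normally be allocated once up front rather than inside the per-sample call; the vector is created locally here only to keep the sketch short.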

BTW, in your case, a single loop might be good enough. juce::dsp::FIR is also using a single loop. See:

roelV · December 18, 2024, 8:17pm

Thanks for your help!

That makes sense, though I feel the documentation could be clearer. The current description says that results will be added to the “destination value,” whereas other function descriptions specifically mention a “destination array.”

"BTW, in your case, a single loop might be sufficient. The juce::dsp::FIR implementation also uses a single loop."

I read the post, and I understand that using SIMD might not always lead to performance improvements. Could you clarify what you mean by a single loop?

I’m using a single loop for the convolution and, of course, an outer loop over the samples, which is what juce::dsp::FIR also does.

roelV · December 18, 2024, 8:51pm

BTW, for anyone interested: using FloatVectorOperations was about 2% slower than using a regular for loop.

zsliu98 · December 18, 2024, 10:17pm

Yes, I was talking about single loop per sample, which is exactly what you have done.
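For readers following along, the "single loop per sample" structure can be sketched as below. This is an illustrative direct-form FIR under assumed names (`SimpleFir`, `coefs`, `state`); it is not the code from the original post or from juce::dsp::FIR. The outer loop walks the samples, and one inner loop both shifts the delay line and accumulates the convolution sum.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative direct-form FIR: an outer loop over samples and a single
// inner loop over the taps. All names here are assumptions for the sketch.
class SimpleFir
{
public:
    explicit SimpleFir(std::vector<float> coefficients)
        : coefs(std::move(coefficients)), state(coefs.size(), 0.0f) {}

    void process(const float* input, float* output, int numSamples) noexcept
    {
        const std::size_t numTaps = coefs.size();

        for (int n = 0; n < numSamples; ++n)
        {
            float out = 0.0f;   // a scalar accumulator is fine in a plain loop

            if (numTaps > 0)
            {
                // Single inner loop: shift the delay line and accumulate.
                for (std::size_t k = numTaps - 1; k > 0; --k)
                {
                    state[k] = state[k - 1];
                    out += state[k] * coefs[k];
                }

                state[0] = input[n];          // newest sample goes in front
                out += state[0] * coefs[0];
            }

            output[n] = out;
        }
    }

private:
    std::vector<float> coefs;   // declared before state so its size is known
    std::vector<float> state;   // delay line, newest sample at index 0
};
```

Feeding a unit impulse through this sketch returns the coefficients in order, which is a quick sanity check for any FIR implementation.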
