Hi, I am getting into optimising my algorithms with SIMD using JUCE's FloatVectorOperations. I want to use the addWithMultiply function to optimise an FIR filter, but my application keeps throwing an exception:
Exception: Exception 0xc0000409 encountered at address 0x7ffd2e1f992d
I did some searching and found that this error means STATUS_STACK_BUFFER_OVERRUN. I’m a bit confused about why this exception is thrown and how to fix it.
Here is my implementation:
void process(const float* input, float* output, const int N) noexcept
{
    float out = 0.0f;
    for (int n = 0; n < N; n++) {
#if JUCE_USE_SIMD
        juce::FloatVectorOperations::addWithMultiply(&out, state, coefs, numCoefs);
#else
        // Convolution
        for (int k = 0; k < numCoefs; k++) {
            out += state[k] * coefs[k];
        }
#endif
        updateState();
        output[n] = out;
        out = 0.0f;
    }
}
Also, I aligned my FIR state and coefficients for SIMD:
addWithMultiply(FloatType *dest, const FloatType *src1, const FloatType *src2, CountType num)
Multiplies each source1 value by the corresponding source2 value, then adds it to the destination value.
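It works as dest[i] = dest[i] + src1[i] * src2[i] for every i from 0 to num - 1, so dest must point to at least num floats. In your case, out should therefore be an array of numCoefs elements. Passing &out makes the function write numCoefs values into the space of a single float on the stack, which is what triggers the stack buffer overrun.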
BTW, in your case, a single loop might be good enough. juce::dsp::FIR also uses a single loop. See:
That makes sense, though I feel the documentation could be clearer. The current description says that results will be added to the “destination value,” whereas other function descriptions specifically mention a “destination array.”
I read the post, and I understand that using SIMD might not always lead to performance improvements. Could you clarify what you mean by a single loop?
I’m using a single loop for the convolution and, of course, an outer loop over the samples, which is also what juce::dsp::FIR does.