Interleaved buffers for SIMD processing?

With the release of the dsp module I am considering rewriting one of my plugins.
It uses a number of parallel delays as resonators.
I imagine that it might greatly improve CPU cache performance as well as SIMD performance if the delay lines were interleaved rather than stored as separate parallel buffers.
Has anyone gone this route already? I’d be interested to know whether a measurable performance gain can be achieved. Does the JUCE buffer class support interleaved channels?
(My delay lines are rather small: depending on the sample rate I use a buffer size of 2k, 4k or 8k.)

If you have a different delay on each channel, I’m not sure you can do SIMD.
Usually anything laid out as an array of structures is slower and less SIMD-friendly than a structure of arrays (so interleaved is worse than non-interleaved).
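To make the two layouts concrete, here is a minimal sketch of how the same sample is addressed in each. The names (`numChannels`, `planarIndex`, `interleavedIndex`, `interleave`) are made up for illustration and are not JUCE API; four channels are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only (not part of JUCE).
constexpr std::size_t numChannels = 4;

// Structure of arrays (planar): each channel is one contiguous run of frames.
inline std::size_t planarIndex (std::size_t channel, std::size_t frame, std::size_t numFrames)
{
    return channel * numFrames + frame;
}

// Array of structures (interleaved): each frame holds one sample per channel.
inline std::size_t interleavedIndex (std::size_t channel, std::size_t frame)
{
    return frame * numChannels + channel;
}

// Copies a planar buffer into interleaved order.
std::vector<float> interleave (const std::vector<float>& planar, std::size_t numFrames)
{
    assert (planar.size() == numChannels * numFrames);

    std::vector<float> out (planar.size());

    for (std::size_t c = 0; c < numChannels; ++c)
        for (std::size_t n = 0; n < numFrames; ++n)
            out[interleavedIndex (c, n)] = planar[planarIndex (c, n, numFrames)];

    return out;
}
```

Which layout is faster depends on whether the SIMD lanes run along time within one channel (planar is contiguous) or across channels at one time step (interleaved is contiguous), so it is worth measuring both.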

If you have a look at the new DSPDemo example application, you’ll see a file called SIMDRegisterDemo.cpp which shows exactly this: the use of interleaved data for SIMD processing. You might find some of your answers there :wink:

While the delays have different lengths, the buffers are all the same size. Since the delay times are variable, I assume that either the buffer write or the buffer read can be synchronized as a SIMD operation, but not both. (The delay is interpolated, so I would vectorize the more costly buffer read.)
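A rough sketch of that trade-off, under some assumptions: four lanes, a power-of-two buffer length, and plain linear interpolation swapped in for brevity (not the allpass interpolation used in the plugin). `InterleavedDelay` and `kLanes` are hypothetical names, not JUCE classes. Here the write is the synchronized side: all lanes share one write position, so a frame is one contiguous block of `kLanes` floats, while each lane reads at its own offset.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4; // assumed number of parallel delay lines

// Hypothetical interleaved multi-lane delay line (illustration only).
struct InterleavedDelay
{
    explicit InterleavedDelay (std::size_t framesPow2)
        : mask (framesPow2 - 1), data (framesPow2 * kLanes, 0.0f)
    {
        // The length must be a power of two so that "& mask" wraps indices.
        assert (framesPow2 >= 2 && (framesPow2 & (framesPow2 - 1)) == 0);
    }

    // Synchronized write: every lane stores at the same frame, so the
    // kLanes samples are contiguous and could become one vector store.
    void write (const std::array<float, kLanes>& in)
    {
        float* frame = &data[writePos * kLanes];

        for (std::size_t lane = 0; lane < kLanes; ++lane)
            frame[lane] = in[lane];

        writePos = (writePos + 1) & mask;
    }

    // Per-lane read with linear interpolation. A delay of 1.0 returns the
    // most recently written frame. The read positions differ per lane, so
    // this side is a gather rather than a single contiguous load.
    std::array<float, kLanes> read (const std::array<float, kLanes>& delaySamples) const
    {
        std::array<float, kLanes> out {};

        for (std::size_t lane = 0; lane < kLanes; ++lane)
        {
            const float d = delaySamples[lane];
            assert (d >= 1.0f && d < static_cast<float> (mask));

            const std::size_t whole = static_cast<std::size_t> (d);
            const float frac = d - static_cast<float> (whole);

            // Unsigned subtraction wraps correctly with a power-of-two mask.
            const std::size_t i0 = (writePos - whole) & mask; // newer sample
            const std::size_t i1 = (i0 - 1) & mask;           // one frame older

            const float a = data[i0 * kLanes + lane];
            const float b = data[i1 * kLanes + lane];
            out[lane] = a + frac * (b - a);
        }

        return out;
    }

    std::size_t mask;
    std::vector<float> data;
    std::size_t writePos = 0;
};
```

Typical use would be one `write` followed by one `read` per sample. Swapping the roles (a shared read position with per-lane write offsets) is the other option mentioned above.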

Thanks, I’ll check that out!

Hi Andreas,

this paper might be of interest:

fixed code examples



Hi Oli,
Thanks for the link! That paper and the code look very interesting indeed.
I have already developed my own algorithm for a time-variant, allpass-interpolated delay.
Still, I am very interested to see how Vesa approached the task. He has also implemented a vectorized solution, while mine is scalar.