Vectorization is a huge topic in DSP and plug-in development. It is also tricky to get right: significant speed improvements sometimes require a lot of work from the developer, even though the auto-vectorization of modern compilers and a few basic rules followed during development already go a long way towards good performance in audio algorithms.
I have to confess I don’t know the topic very well myself, and it’s only thanks to @fabian and @jules that I was able to write a fast convolution algorithm in the dsp::Convolution class, using the base code in JUCE that already helps developers vectorize their code where possible. For example, I used the functions provided by the FloatVectorOperations class, and I made good use of the AudioBuffer<float> class, which automatically aligns the audio data to the register size to allow vectorization. I also had to reorganize some FFT bins on the fly to get the “four times speed-up”.
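To give an idea of why bin layout matters, here is a minimal sketch in plain C++. The post does not say how the bins were reorganized, so everything here is an assumption for illustration: real and imaginary parts are kept in separate arrays (the hypothetical `SplitComplex`), which turns the frequency-domain multiply-accumulate of a convolution into four simple element-wise passes. The helpers `addProduct` and `subtractProduct` are hypothetical plain loops standing in for the kind of element-wise operation a vector-operations class offers; they are not JUCE functions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: dest[i] += a[i] * b[i]. A plain loop that compilers
// can usually auto-vectorize; it stands in for a library vector operation.
inline void addProduct (float* dest, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dest[i] += a[i] * b[i];
}

// Hypothetical helper: dest[i] -= a[i] * b[i].
inline void subtractProduct (float* dest, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dest[i] -= a[i] * b[i];
}

// Assumed layout: real and imaginary parts stored in separate arrays
// instead of interleaved (re, im) pairs.
struct SplitComplex
{
    std::vector<float> re, im;
};

// y += x * h for every bin, written as four element-wise passes:
//   y.re += x.re * h.re - x.im * h.im
//   y.im += x.re * h.im + x.im * h.re
inline void multiplyAccumulate (SplitComplex& y, const SplitComplex& x, const SplitComplex& h)
{
    const std::size_t n = y.re.size();

    addProduct      (y.re.data(), x.re.data(), h.re.data(), n);
    subtractProduct (y.re.data(), x.im.data(), h.im.data(), n);
    addProduct      (y.im.data(), x.re.data(), h.im.data(), n);
    addProduct      (y.im.data(), x.im.data(), h.re.data(), n);
}
```

With interleaved (re, im) pairs the same product needs shuffles between neighbouring values, which is harder for both the compiler and a generic vector-operations helper to handle.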
But thanks to @fabian, there is also a new way to handle vectorization and to optimize your code with SSE and AVX operations when they are available: the new SIMDRegister class, available since JUCE 5.1 in the DSP module.
The beauty of this class is that it is a type, in the same sense that float and double are types. That means that, when the context is compatible with SIMDRegister, your usual operations (addition, multiplication, etc.) automatically use the vectorized registers. And since it is templated, you can put floats, doubles, or even complex numbers inside it, provided the target machine supports the right instruction set.
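To make the "it is a type" idea concrete, here is a minimal sketch in plain C++. `Float4` is a hypothetical stand-in, not juce::dsp::SIMDRegister: it uses a plain array and loops rather than real SSE/AVX instructions, and only shows how operator overloading lets generic code treat a group of four values like a single scalar. The `crossfade` function is also made up for the example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD register type. This is NOT
// juce::dsp::SIMDRegister: it only mimics the idea of a "type" holding
// four floats whose operators work on every lane at once.
struct Float4
{
    std::array<float, 4> v {};

    Float4() = default;
    Float4 (float x) { v.fill (x); }   // broadcast one value to all lanes
    Float4 (float a, float b, float c, float d) : v {{ a, b, c, d }} {}

    friend Float4 operator+ (Float4 a, const Float4& b)
    {
        for (std::size_t i = 0; i < 4; ++i) a.v[i] += b.v[i];
        return a;
    }

    friend Float4 operator- (Float4 a, const Float4& b)
    {
        for (std::size_t i = 0; i < 4; ++i) a.v[i] -= b.v[i];
        return a;
    }

    friend Float4 operator* (Float4 a, const Float4& b)
    {
        for (std::size_t i = 0; i < 4; ++i) a.v[i] *= b.v[i];
        return a;
    }
};

// Written once, usable with float, double or Float4: the arithmetic
// only relies on the operators the sample type provides.
template <typename SampleType>
SampleType crossfade (SampleType dry, SampleType wet, SampleType amount)
{
    return dry + (wet - dry) * amount;
}
```

Calling `crossfade (1.0f, 3.0f, 0.5f)` mixes one value, while calling it with three `Float4` arguments mixes four values with exactly the same source code.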
For more information about how to use it, the best thing to do is to look at the demo applications, such as the DSP module demo app, or at some of the new classes in the DSP module. You’ll see that the filtering classes are compatible with SIMDRegister, which means you can process four channels through a filter for the cost of one, thanks to that class!
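Here is a rough sketch of what "four channels for the cost of one" means, again in plain C++ and under stated assumptions. `Float4` is a hypothetical stand-in for a SIMD type and `OnePoleLowPass` is a made-up filter, not one of the JUCE filter classes. The point is only that a filter templated on its sample type can be instantiated with a four-lane type, so one `processSample` call advances four channels.

```cpp
#include <array>
#include <cstddef>

// Hypothetical four-lane type (not juce::dsp::SIMDRegister).
struct Float4
{
    std::array<float, 4> v {};

    Float4() = default;
    Float4 (float x) { v.fill (x); }
    Float4 (float a, float b, float c, float d) : v {{ a, b, c, d }} {}

    friend Float4 operator+ (Float4 a, const Float4& b)
    {
        for (std::size_t i = 0; i < 4; ++i) a.v[i] += b.v[i];
        return a;
    }

    friend Float4 operator- (Float4 a, const Float4& b)
    {
        for (std::size_t i = 0; i < 4; ++i) a.v[i] -= b.v[i];
        return a;
    }

    friend Float4 operator* (Float4 a, const Float4& b)
    {
        for (std::size_t i = 0; i < 4; ++i) a.v[i] *= b.v[i];
        return a;
    }
};

// A deliberately simple one-pole low-pass, templated on its sample type.
template <typename SampleType>
struct OnePoleLowPass
{
    SampleType coeff { 0.5f };
    SampleType state { 0.0f };

    SampleType processSample (SampleType x)
    {
        state = state + coeff * (x - state);
        return state;
    }
};

// Runs ONE filter object over four channels: each frame packs one sample
// from every channel into a Float4, so a single processSample call
// advances all four channels. Results are written back in place.
inline void processFourChannels (OnePoleLowPass<Float4>& filter,
                                 std::array<float*, 4> channels,
                                 std::size_t numSamples)
{
    for (std::size_t n = 0; n < numSamples; ++n)
    {
        const Float4 frame (channels[0][n], channels[1][n], channels[2][n], channels[3][n]);
        const Float4 out = filter.processSample (frame);

        for (std::size_t ch = 0; ch < 4; ++ch)
            channels[ch][n] = out.v[ch];
    }
}
```

The packing and unpacking of frames is overhead that the stand-in makes visible; in a real setup the actual saving depends on how the data is laid out and on the instructions the target machine supports.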
And I think you can’t imagine how difficult it was to get the IIR, FIR and StateVariable filters right during development, since we wanted to provide their new functionality in the DSP module while also using everything around them (the new Processor API, the wrapper and context concepts, the Duplicator concept, templating everything, and of course the SIMDRegister class). It was especially difficult for @fabian and @jules, since they were the ones who applied the new Processor concepts (more about those later).
But anyway, once you understand how SIMDRegister works, you can use it to optimize your code with SIMD acceleration quite quickly, but of course only when the context allows it, that is, only when you have operations that can be performed in parallel. For example, I recently wrote a very simple filter class from scratch, and thanks to SIMDRegister I was able to run 32 of them in parallel, processed in blocks of four. At the end of the day, it’s one of the classes from the DSP module that I use the most nowadays, with AudioBlock still being the first.
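As an illustration of the "32 filters in blocks of four" layout, here is a minimal sketch. It is not the filter class mentioned above, which isn't shown in this post: `FilterBank32` is hypothetical, uses a simple one-pole low-pass, feeds every filter the same input, and relies on plain aligned arrays that the compiler may auto-vectorize instead of SIMDRegister. What it shows is the data layout: states and coefficients are stored as 8 groups of 4 lanes, so each group can map onto one four-float register.

```cpp
#include <cstddef>

// Hypothetical bank of 32 one-pole low-pass filters fed by the same input.
// Storage is organised as 8 groups of 4 lanes so that the inner loop over
// one group works on 4 contiguous, 16-byte-aligned floats.
struct FilterBank32
{
    static constexpr std::size_t lanes = 4;
    static constexpr std::size_t numGroups = 8;   // 8 * 4 = 32 filters

    alignas (16) float coeff[numGroups][lanes] = {};
    alignas (16) float state[numGroups][lanes] = {};

    // Sets the smoothing coefficient of one of the 32 filters.
    void setCoefficient (std::size_t filterIndex, float c)
    {
        coeff[filterIndex / lanes][filterIndex % lanes] = c;
    }

    // Returns the latest output of one of the 32 filters.
    float getOutput (std::size_t filterIndex) const
    {
        return state[filterIndex / lanes][filterIndex % lanes];
    }

    // Advances all 32 filters by one input sample, one block of four at a time.
    void processSample (float x)
    {
        for (std::size_t g = 0; g < numGroups; ++g)
            for (std::size_t i = 0; i < lanes; ++i)
                state[g][i] += coeff[g][i] * (x - state[g][i]);
    }
};
```

This only works because the 32 filters are independent of each other: no lane needs the result of another lane within the same sample, which is exactly the "operations that you can perform in parallel" condition.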
Tell me what you think of that class, and whether you have already used it!