Any tricks to speed up debug builds that make heavy use of SIMDRegister?

The JUCE SIMDRegister is great: it makes it super easy to write SIMD code and to use the JUCE processors with SIMD types instead of floats. However, after reworking big parts of our oversampled signal chain (which has some FIR filters in the oversampled path) to use SIMD, we see a great speedup in release builds and a massive slowdown in debug builds. The profiler shows that the hotspot is SIMDRegister's multiplication operator, which is not inlined as it is in release builds but instead causes three nested function calls for each multiplication. This is expected, I know, but the slowdown is so heavy that the plugin no longer meets its realtime constraints in a debug build, which makes development pretty hard.

So I’m wondering if there are any tricks to guide the compiler to always fully optimise the calls to the SIMDRegister operators, no matter which optimisation level is chosen. Some fancy attributes wrapped around the SIMDRegister implementation? I’m also thinking about solutions like moving e.g. FIR::Filter::processSingleSample into a separate TU and setting compiler flags on that file so it is always compiled with full optimisation.
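For what it’s worth, here is a rough sketch of the attribute idea, assuming GCC (Clang largely ignores the optimize attribute, and MSVC would need #pragma optimize or per-file flags instead); the macro and function names are just placeholders, not anything from juce_dsp:

```cpp
#include <juce_dsp/juce_dsp.h>

// GCC-only: raise the optimisation level for a single hot function even in a
// debug (-O0) build, so the SIMDRegister operators called inside it get inlined.
#if defined (__GNUC__) && ! defined (__clang__)
 #define FORCE_OPTIMISE __attribute__ ((optimize ("O3")))
#else
 #define FORCE_OPTIMISE
#endif

using Vec = juce::dsp::SIMDRegister<float>;

FORCE_OPTIMISE
static void multiplyAccumulate (Vec* acc, const Vec* src, const float* coeffs, size_t num)
{
    for (size_t i = 0; i < num; ++i)
        acc[i] += src[i] * coeffs[i];   // hot loop: one SIMD multiply-add per tap
}
```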

I already asked for help on Stack Overflow, but I’d be happy to hear your thoughts on this too; I guess I’m not the first one in the JUCE universe facing this problem :wink:

I had some time to try a few things today, and what really helped as a first proof of concept was to add a second cpp file to juce_dsp, specify -O3 for that file, and move all the internals of the process call, along with the processSingleSample function, into that TU. With these changes the plugin is able to play back in realtime again, even in debug builds.
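In case it helps anyone reading along, this is roughly what such a split could look like. The names are made up and the body is a hand-rolled direct-form FIR, not the actual juce_dsp internals; the .cpp is the file that gets the per-file -O3 flag:

```cpp
// OptimisedFir.h -- declaration visible to the unoptimised debug code
#pragma once
#include <juce_dsp/juce_dsp.h>

juce::dsp::SIMDRegister<float> processFirSample (juce::dsp::SIMDRegister<float> input,
                                                 const float* coefficients,
                                                 juce::dsp::SIMDRegister<float>* state,
                                                 size_t order);

// OptimisedFir.cpp -- compiled with -O3 even in debug configurations, so the
// SIMDRegister operators below are fully inlined
#include "OptimisedFir.h"

juce::dsp::SIMDRegister<float> processFirSample (juce::dsp::SIMDRegister<float> input,
                                                 const float* coefficients,
                                                 juce::dsp::SIMDRegister<float>* state,
                                                 size_t order)
{
    auto acc = input * coefficients[0];

    // accumulate the delayed samples and shift the delay line in one pass
    for (size_t i = order; i > 0; --i)
    {
        acc += state[i - 1] * coefficients[i];

        if (i > 1)
            state[i - 1] = state[i - 2];
    }

    state[0] = input;
    return acc;
}
```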

Would you consider adding a solution like that to the dsp module?


I also had some problems a few years back where even the standard Microsoft compiler could not remove the SIMDRegister wrapper in release builds, and it ran a lot slower on Windows. I switched to clang/LLVM on Windows because of this. Is this still a problem today?

Edit: These days I don’t optimize at this level anymore and let the compiler do its work.

MSVC 2019 has no problems optimising the SIMDRegister calls as expected in a release build. We used it quite a bit in our latest release, which does some heavy processing on oversampled audio, and switching to SIMD made a noticeable impact on CPU usage, especially when processing 5.1 audio.

This is indeed the kind of optimisation that would be pretty much impossible for the compiler to figure out on its own, given that you need to re-order the samples from the usual per-channel representation into the interleaved SIMD representation. So I’m happy that the SIMDRegister class is there, especially with Apple M1 machines out there, since it makes it fairly easy to maintain a cross-architecture compatible implementation.
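For anyone wondering what that re-ordering looks like, here is a minimal sketch (not taken from our code base) that packs one sample from each channel into the lanes of a SIMDRegister, assuming the channel count fits into the register width:

```cpp
#include <juce_dsp/juce_dsp.h>

using Vec = juce::dsp::SIMDRegister<float>;

// Interleave a per-channel AudioBuffer into SIMD registers: register n holds
// sample n of every channel, one channel per lane. Unused lanes stay zero.
static void interleaveToSimd (const juce::AudioBuffer<float>& in, Vec* out)
{
    jassert ((size_t) in.getNumChannels() <= Vec::SIMDNumElements);

    for (int n = 0; n < in.getNumSamples(); ++n)
    {
        alignas (alignof (Vec)) float lanes[Vec::SIMDNumElements] = {};

        for (int ch = 0; ch < in.getNumChannels(); ++ch)
            lanes[(size_t) ch] = in.getSample (ch, n);

        out[n] = Vec::fromRawArray (lanes);   // requires the aligned buffer above
    }
}
```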
