Projucer should have a denormal support flag

The current state of the art for denormal prevention is to use ScopedNoDenormals to disable denormals during processBlock.

This way a developer can code their DSP normally without worrying about denormals.

However, JUCE now has a DSP module, and it also avoids denormals by calling snapToZero. This is an unnecessary waste of CPU when ScopedNoDenormals is already in use.

To avoid wasting CPU, I suggest that the Projucer could have a flag for disabling denormals, so that with a preprocessor #if the snapToZero calls would be compiled in only when denormals are enabled.
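A minimal sketch of what I mean, with a placeholder name for the flag the Projucer would define:

```cpp
// Hypothetical project-wide flag (the name here is only illustrative):
// 0 = denormals are disabled globally, so the snap-to-zero checks can be skipped.
#ifndef MYPLUGIN_DENORMALS_ENABLED
 #define MYPLUGIN_DENORMALS_ENABLED 1
#endif

inline void updateFilterState (float& state, float input) noexcept
{
    state = 0.99f * state + 0.01f * input;  // feedback path that can go denormal

   #if MYPLUGIN_DENORMALS_ENABLED
    // Only pay for this check when denormals haven't been disabled globally.
    if (! (state < -1.0e-8f || state > 1.0e-8f))
        state = 0.0f;
   #endif
}
```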

Another, simpler solution is to just always disable denormals for plugin processBlock calls (and also disable them in the main of non-plugins), as probably virtually all developers who rely on denormals in audio DSP do so by mistake.

6 Likes

Yes, I like that idea. I’ll see if we can add this to JUCE.

1 Like

Sorry for taking so long to get back to you on this. I completely agree that we need a way to disable the snapToZero code in the dsp module (and elsewhere) - maybe by having a preprocessor #if.

However, after doing a lot of experimentation I’ve found that none of the hosts I have installed have denormals enabled on the audio thread. So I’m really not sure an RAII approach is necessary (as Jules writes on the other post). Allegedly, it’s important to restore the denormals flag in ProTools, but I can’t see why this would be necessary - the denormals flag is already disabled.

I know for sure that there were reasonable setups where denormals were enabled. I vividly remember that early versions of our products were extremely slow on these setups before we disabled denormals… It might have been mostly on 32-bit Windows, maybe it was in REAPER.

OK but even then - is it a bad thing to disable denormals and not revert the setting?

I’d return the setting to what it was to be careful. I wouldn’t want a plugin to change global state…

2 Likes

OK I’ve added the ScopedNoDenormals class to JUCE on develop with commit e2c8e30. I’ve also modified the audio plugin template in the Projucer to use this class in the processBlock method.

Additionally, this commit adds a flag called JUCE_DSP_ENABLE_SNAP_TO_ZERO to the dsp module, which is enabled by default but can be disabled if you wish to remove all the denormal-prevention code used in the dsp module’s filters and algorithms.
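For reference, a minimal sketch of what the updated template does in a plugin (MyAudioProcessor is just a placeholder for your AudioProcessor subclass):

```cpp
// Assumes a standard JUCE audio plugin project.
#include <JuceHeader.h>

void MyAudioProcessor::processBlock (juce::AudioBuffer<float>& buffer,
                                     juce::MidiBuffer& midiMessages)
{
    // Disables denormals (FTZ/DAZ on x86) for the duration of this call
    // and restores the previous flags when noDenormals goes out of scope.
    juce::ScopedNoDenormals noDenormals;

    // ... the usual DSP code ...
    juce::ignoreUnused (buffer, midiMessages);
}
```

And if you rely on this everywhere, defining JUCE_DSP_ENABLE_SNAP_TO_ZERO=0 in your preprocessor definitions strips the snapToZero calls from the dsp module.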

2 Likes

@fabian,
just chiming in about hosts and denormals.
The last time (a few months ago) we noticed we needed to turn off denormals in our code, it was due to users reporting plug-ins suddenly consuming a lot of CPU while idling.

With the JUCE wrappers (as of then… 4.3.1), Audio Unit & AAX seemed to work as expected (and it might also be the case for VST3).

However, with VST2 under REAPER (at least with the builds back then), it was easily reproducible: the software CPU meters (and Activity Monitor / Task Manager) showed spiked usage when idle (processBlock keeps running but receives small float values close to zero).

We also saw that on some other hosts, but not all of them.

2 Likes

OK, let’s talk about denormals once and for all :slight_smile:

What’s that beast? When some DSP code uses any IIR filter in the broad sense, you have to deal with equations like this (for sample number n):

output[n] = gain1 * input[n] + gain2 * input[n-1] + … + gainA * output[n-1] + gainB * output[n-2] + …

Examples: the JUCE classes IIRFilter/dsp::IIR/dsp::StateVariableFilter, any envelope follower (compressors, synths), anything where the current output is a function of any of the last output values. That means that FIR filters and plain delay lines don’t have the problem. Add feedback to your delay line and you get it.
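As a minimal sketch of that feedback (the coefficient value is just illustrative), here is a one-pole lowpass: once the input goes silent, the state keeps being multiplied by the feedback coefficient and decays towards zero without ever quite reaching it.

```cpp
// One-pole lowpass: y[n] = (1 - a) * x[n] + a * y[n-1].
// With x[n] == 0, the state decays geometrically and eventually lands
// in the denormal range instead of becoming exactly zero.
struct OnePole
{
    float a = 0.999f;   // feedback coefficient (sets the time constant)
    float y = 0.0f;     // state variable involved in the feedback

    float process (float x) noexcept
    {
        y = (1.0f - a) * x + a * y;
        return y;
    }
};
```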

What problem? Imagine you have some DSP code processing input samples != 0. Then you stop the transport bar in your DAW, and the DSP code gets only input samples = 0. You expect the output to become zero then.

Since you have feedback in the signal path, and given the time constants in your filter, the output is a function of the last output values, and it moves slowly from a given non-zero value towards zero. At some point, that output is going to be very close to zero (below about 1e-38, the smallest normal value for a float). That’s where the accuracy of the float type is not enough to deal with calculations using such values. So what should be done here?

By default, the Intel processor enters a computation mode which is a lot slower than the normal mode to deal with these very low values, called denormals. This way there is no loss of accuracy, and the processing code can still calculate the output value with the right precision. And this way, the output is very precise, and goes down to 1e-20, 1e-30, 1e-40 and below, and… wait! That’s not what we want. Below a given threshold, what we want in audio is zero, not -600 dB! What could we do with values like that?
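For reference, the exact boundaries for single precision can be printed directly; the slow path kicks in once values fall below the smallest normal float:

```cpp
#include <cstdio>
#include <limits>

int main()
{
    // Below min(), float values become denormal; below denorm_min(),
    // they flush to zero even with full denormal support.
    std::printf ("smallest normal float:   %g\n", std::numeric_limits<float>::min());         // ~1.17549e-38
    std::printf ("smallest denormal float: %g\n", std::numeric_limits<float>::denorm_min());  // ~1.4013e-45
}
```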

Now here is the second problem. If you have a plug-in with some processing that has feedback in the signal path, not handling the first problem means that when you click “stop” on the DAW transport bar, the CPU is going to enter “denormals mode” at some point to return these nice but useless outputs at -600 dB, and the CPU load is going to increase a lot at that moment! That’s not good at all. And the CPU load increase can last a very long time if the time constants of the processing are long.

So how do we deal with that? Two solutions (both sketched after this list):

  • the snapToZero method, which is basically a check for denormals on some state variables, replacing values very close to zero with zero. It’s like saying “I don’t care about signals below -300 dB in amplitude; if I see one, I’ll replace it with zero”. That’s what I use most of the time in my DSP code.
  • the SSE flags set to the right value to disable the handling of denormals. This should be done at the beginning of the main processing code, every time the process function is called, and everything that has been changed should be restored to its initial value after the processing code. The new class that Fabian has added does exactly that!
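A rough sketch of both, assuming x86/SSE (the threshold and bit masks are illustrative, in the spirit of what the dsp module and ScopedNoDenormals do):

```cpp
#include <xmmintrin.h>   // SSE control/status register intrinsics

// Solution 1: snap-to-zero. Anything with an absolute value below an
// arbitrary threshold (~ -160 dB here) is treated as silence.
inline void snapToZero (float& value) noexcept
{
    if (! (value < -1.0e-8f || value > 1.0e-8f))
        value = 0.0f;
}

// Solution 2: set the flush-to-zero (FTZ) and denormals-are-zero (DAZ)
// bits of the SSE control register, and restore the previous state
// afterwards (RAII) - essentially what ScopedNoDenormals does on x86.
struct ScopedFlushToZero
{
    unsigned int previousCSR;

    ScopedFlushToZero() noexcept  : previousCSR (_mm_getcsr())
    {
        _mm_setcsr (previousCSR | 0x8040);   // FTZ (0x8000) | DAZ (0x0040)
    }

    ~ScopedFlushToZero() noexcept   { _mm_setcsr (previousCSR); }
};
```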

So now, why have I used the first solution in the DSP module code? The reason is simple: it’s compatible with everything (32-bit CPUs without SSE, ARM, etc.). And the snapToZero functions are called once per processing block, not once for each sample processed! So the overhead of the method is not significant at all, and no one can say this method is slower than the other one…

And I’d like to say that systematically using the second method is also a very good way to deal with denormals without understanding what they are, which is not good practice for DSP engineers in my opinion. I like to use the first method because it forces me to look for the locations in my code where there is feedback, and to put the snapToZero calls there. If I forget one, I get the slowdown when the input is zero…

3 Likes

One other reason in favor of the snapToZero approach: what about envelope followers being used for audio visualization purposes, like meters for example?

You can enable flush-to-zero on ARM as well (and it is probably already enabled by default on most platforms).

If you do this then you don’t fully solve the problem.
The samples that you process during the rest of the block, after the state has become denormal, will still incur the major slowdown.

1 Like

Of course it does solve the problem! The state variables involved in the feedback are set to zero after one block is processed, so there aren’t any denormals anymore and the CPU load returns to normal almost instantly, without any downside on the audio / CPU etc. after the user clicks “stop” on the transport bar in the DAW.

That would only be the case if, by chance, the last sample processed was the one that caused the state to become denormal.

But the denormals could appear at any iteration of the loop, and all the iterations after that one would incur the slowdown.

Well, some additional context is needed here to answer that question properly, since we seem to be talking about different things.

I was talking about the general case of an IIR filter. The denormal issues there happen at one moment: when the user has been processing samples in the DAW and then sends only zeroes for quite a long period of time (for example when the stop button of the transport bar is clicked). In that specific case, we get a high increase in CPU load, and simply adding a snapToZero at the end of the processing block prevents the state variables involved in the feedback topology of the IIR filter from becoming denormal numbers, and our issues are gone.

In other contexts, for example with a delay line + feedback, doing the snapToZero only once per block might not be enough, because the denormal numbers will still be put into the delay line memory and recirculated by the feedback, and we get a mess of denormals :slight_smile: Here, the correct solution would be either snapToZero on every sample or the register changes (which would be more efficient). A sketch of the delay case is below.
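A minimal sketch of that feedback-delay case (the feedback coefficient and threshold are illustrative): every output sample is written back into the delay memory, so a denormal state keeps circulating unless it is snapped per sample (or FTZ/DAZ is active).

```cpp
#include <vector>

struct FeedbackDelay
{
    std::vector<float> memory;
    size_t writePos = 0;
    float feedback = 0.7f;

    explicit FeedbackDelay (size_t length)  : memory (length, 0.0f) {}

    float process (float input) noexcept
    {
        const float delayed = memory[writePos];
        float out = input + feedback * delayed;

        // Per-sample snap: without this (or with denormals disabled via
        // the CPU flags), tiny values keep recirculating through the
        // whole delay memory, block after block.
        if (! (out < -1.0e-8f || out > 1.0e-8f))
            out = 0.0f;

        memory[writePos] = out;
        writePos = (writePos + 1) % memory.size();
        return out;
    }
};
```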

Yes, I’m talking about the same case.

Yes, the issues are gone after the processing of this block. But what happens during this block’s processing?

Suppose someone works with large buffers so they are processing 1024 samples, and let’s say they have 6 surround channels, and luck has it that the filter’s state became denormal around sample 24 of 1024. For simplicity’s sake, let’s say it happened at the same time on all channels. In this case the block would process (1024 - 24) × 6 = 6000 samples with denormals in its state before you snap the state to zero. The question is just whether you consider this a problem or decide that it’s negligible.

I’d add that even if you consider processing a few thousand samples more slowly to be negligible, I can present another example:

Suppose there’s some plugin inserted before your plugin that just outputs “simulated noise”, which with some parameters happens to be mostly zero samples with sporadic very small float values (which could happen precisely because that plugin snaps its own denormals to zero).

When your plugin processes this signal it happens to hit denormals on almost every block, and even though you snap to zero at the end of the block, most of your processing is still done with denormals, causing a huge CPU usage burst, while snapping to zero after every sample would have solved it.

Indeed, most of the time it’s a matter of what is considered negligible or not. I don’t think I’d like to see a DAW session at 100% CPU load, even for 1024 samples, every time I hit stop on the transport bar; I’m sure it could cause glitches in a session with 45 more or less complex plug-ins all using the snapToZero approach!

So maybe everybody should use the registers approach in the audio processing, using the new class made by Fabian, to solve that issue properly, mostly because computers without SSE2 instructions are not used by potential customers anymore in 2017.

However, I’m still against it in the context of audio visualisation, because it wouldn’t be acceptable for a meter or spectrum analyser to mess with the register values at any random moment.

2 Likes