Something like this should be part of the JUCE library; it makes all other denormal workarounds unnecessary.
It activates the Flush To Zero (FTZ) and Denormals Are Zero (DAZ) modes on x86 & x64.
At least, the code should be generated with SSE2 instructions for floating-point operations (which is standard in VS2012 & Xcode).
I'm not sure whether there is something similar for ARM.
I did performance tests with 32/64-bit Xcode & VS2012 and it works! (At least on my Intel processors.)
Just add "ScopedNoDenormals nsd;" at the top of the processBlock() in your audio plugin.
#include <xmmintrin.h>
class ScopedNoDenormals
{
public:
    ScopedNoDenormals()
    {
        // There is also a C99 way of doing this, but it's not widely supported: fesetenv(...)
        oldMXCSR = _mm_getcsr();            // read the old MXCSR setting
        _mm_setcsr (oldMXCSR | 0x8040);     // set DAZ (bit 6) and FZ (bit 15), write back to MXCSR
    }
    ~ScopedNoDenormals()
    {
        _mm_setcsr (oldMXCSR);              // restore the MXCSR setting we found
    }
private:
    unsigned int oldMXCSR;                  // _mm_getcsr() returns unsigned int
};
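A quick way to see the effect, assuming x86/x64 and a compiler that uses SSE for float math: FLT_MIN * 0.5f is a denormal under IEEE 754, but inside the scope FZ flushes it to zero. The helper names are made up for this sketch, and the volatiles are only there to stop the compiler from folding the multiplication at compile time, where the MXCSR flags would not apply.

```cpp
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr (x86/x64 SSE only)
#include <cfloat>       // FLT_MIN

// Equivalent of the ScopedNoDenormals class above, repeated so this compiles on its own:
// sets FZ (bit 15) and DAZ (bit 6) on construction, restores MXCSR on destruction.
class ScopedNoDenormals
{
public:
    ScopedNoDenormals() : oldMXCSR (_mm_getcsr()) { _mm_setcsr (oldMXCSR | 0x8040u); }
    ~ScopedNoDenormals() { _mm_setcsr (oldMXCSR); }

    ScopedNoDenormals (const ScopedNoDenormals&) = delete;
    ScopedNoDenormals& operator= (const ScopedNoDenormals&) = delete;

private:
    unsigned int oldMXCSR;
};

// FLT_MIN is the smallest normal float, so FLT_MIN * 0.5f is a denormal in IEEE 754.
// The volatiles force the multiply to happen at run time, under the current MXCSR.
float halveSmallestNormal()
{
    volatile float x = FLT_MIN;
    volatile float result = x * 0.5f;
    return result;
}

// Same computation with FZ/DAZ enabled for the duration of the scope.
float halveSmallestNormalWithoutDenormals()
{
    volatile float result = 0.0f;
    {
        ScopedNoDenormals noDenormals;
        result = halveSmallestNormal();   // FZ flushes the denormal result to 0.0f
    }
    return result;
}
```

Outside the scope the same call returns a nonzero denormal (unless something else already turned FZ on for the thread).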
FYI I've added a new function for setting these flags: FloatVectorOperations::disableDenormalisedNumberSupport()
(Normally I love RAII stuff, but in this case I can't see much advantage in doing that, since the use case for this is an audio thread, where you never really want to re-enable the flags.)
In the context of changing the DAZ+FZ policy for Pro Tools audio render threads, Avid calls out problems caused by existing plug-ins that alter the denormal behavior and don't set it back the way they found it. They specifically offer a RAII implementation to encourage developers to leave the processor flags in the same state their render call received them in.
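To illustrate that point, here's a sketch (x86/x64, hypothetical names throughout) of a host that runs its render thread with FZ on and DAZ off, then checks whether a plug-in's callback hands MXCSR back unchanged. The RAII version saves and restores the exact value it found, so the host's own choice survives; a "leaky" version that just ORs in the bits does not.

```cpp
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr (x86/x64 only)

namespace mxcsr
{
    constexpr unsigned int daz = 0x0040u;  // bit 6: Denormals Are Zero
    constexpr unsigned int ftz = 0x8000u;  // bit 15: Flush To Zero
}

// Restores whatever MXCSR value it found, including flags the host chose to set or clear.
class ScopedNoDenormals
{
public:
    ScopedNoDenormals() : saved (_mm_getcsr()) { _mm_setcsr (saved | mxcsr::daz | mxcsr::ftz); }
    ~ScopedNoDenormals() { _mm_setcsr (saved); }

    ScopedNoDenormals (const ScopedNoDenormals&) = delete;
    ScopedNoDenormals& operator= (const ScopedNoDenormals&) = delete;

private:
    unsigned int saved;
};

// Hypothetical plug-in render callbacks, for illustration only.
void leakyRender()       { _mm_setcsr (_mm_getcsr() | mxcsr::daz | mxcsr::ftz); /* ...DSP... */ }
void wellBehavedRender() { ScopedNoDenormals noDenormals; /* ...DSP... */ }

// Simulates a host whose render thread runs with FZ on and DAZ off: it calls the
// callback and reports whether MXCSR comes back exactly as the host set it.
bool hostStateSurvives (void (*render)())
{
    const unsigned int original  = _mm_getcsr();
    const unsigned int hostState = (original | mxcsr::ftz) & ~mxcsr::daz;

    _mm_setcsr (hostState);
    render();
    const bool unchanged = (_mm_getcsr() == hostState);

    _mm_setcsr (original);  // put this thread back the way we found it
    return unchanged;
}
```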
Just because *many* audio use cases don't want these flags re-enabled doesn't mean that none do. For example, consider this warning from the OS X SDK:
CAUTION: The math library currently is not architected to do the right thing in the face of DAZ + FZ mode. For example, ceil( +denormal) might return +denormal rather than 1.0 in some versions of MacOS X. In some circumstances this may lead to unexpected application behavior. Use at your own risk.
Setting those flags should be a one-time thing, done at the start of the audio thread.
That way, all code that runs on that thread will have the special flags.
This is something for the host to do, not the plugin!
It's nasty for plugins to use, especially for hosts that do offline rendering for analysis and other stuff.
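A host-side sketch of that idea, assuming x86/x64 and C++11 (std::thread may need -pthread on older toolchains). AudioThread is a hypothetical class: it sets FZ and DAZ once, when its thread starts, and the render callbacks it runs never touch MXCSR. Because MXCSR is part of each thread's saved state, this doesn't change the mode of other threads that are already running, such as one doing offline analysis.

```cpp
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr (x86/x64 only)
#include <functional>
#include <thread>
#include <utility>

// Hypothetical host-side audio thread: flags are set once, at thread start.
class AudioThread
{
public:
    explicit AudioThread (std::function<void()> callback) : renderCallback (std::move (callback)) {}

    // Runs the callback numBlocks times on a fresh thread and returns the MXCSR
    // value observed on that thread after the last block.
    unsigned int run (int numBlocks)
    {
        unsigned int observed = 0;

        std::thread worker ([this, numBlocks, &observed]
        {
            _mm_setcsr (_mm_getcsr() | 0x8040u);   // one-time setup: FZ (bit 15) + DAZ (bit 6)

            for (int i = 0; i < numBlocks; ++i)
                renderCallback();                   // plug-in code inherits the flags

            observed = _mm_getcsr();
        });

        worker.join();
        return observed;
    }

private:
    std::function<void()> renderCallback;
};
```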
I got a couple of warnings in disableDenormalisedNumberSupport() (plus another one in juce_ZipFile):
juce_FloatVectorOperations.cpp:993:23: Implicit conversion changes signedness: 'unsigned int' to 'const int'
juce_FloatVectorOperations.cpp:994:23: Implicit conversion changes signedness: 'int' to 'unsigned int'
juce_ZipFile.cpp:63:43: Implicit conversion changes signedness: 'unsigned int' to 'const int'
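For reference, a sketch of the same flag-setting with every value kept as unsigned int, which is what _mm_getcsr() returns and _mm_setcsr() takes. This is a hypothetical helper, not the actual JUCE source at those line numbers; the point is only that no signed/unsigned conversion is left for -Wsign-conversion to flag.

```cpp
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr (x86/x64 only)

// MXCSR bit masks, spelled as unsigned literals.
constexpr unsigned int denormalsAreZeroBit = 0x0040u;  // DAZ, bit 6
constexpr unsigned int flushToZeroBit      = 0x8000u;  // FZ, bit 15

// Hypothetical helper: all values stay unsigned int, matching the intrinsic
// signatures. Returns the previous MXCSR value so a caller can restore it.
unsigned int enableFlushToZeroAndDaz()
{
    const unsigned int previous = _mm_getcsr();
    _mm_setcsr (previous | denormalsAreZeroBit | flushToZeroBit);
    return previous;
}
```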
Very interesting stuff. I'm not an expert on this topic, but while doing some research I found out that there is no guarantee this flag is available on 32-bit CPUs; see https://software.intel.com/en-us/node/513376
More problematically, writing to this flag when it isn't available raises a general-protection exception, which, if I understand correctly, will crash your program. According to http://softpixel.com/~cwright/programming/simd/sse.php, availability can be checked with the FXSAVE instruction.
Does anyone have experience with checking whether the flag is available, and does anyone know whether CPUs that don't support it are still around?
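On checking availability: per Intel's manuals, FXSAVE writes a 512-byte, 16-byte-aligned save area whose bytes 28-31 hold MXCSR_MASK, the set of MXCSR bits the CPU lets you write. If the mask reads as zero, the CPU predates it and the default 0x0000FFBF applies, which has the DAZ bit (bit 6) clear. A sketch assuming GCC or Clang on x86/x64 (it uses inline assembly); the function names are hypothetical.

```cpp
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr
#include <cstdint>
#include <cstring>

// Reads MXCSR_MASK via FXSAVE (GCC/Clang inline assembly, x86/x64 only).
bool cpuSupportsDenormalsAreZero()
{
    struct alignas (16) FxsaveArea { unsigned char bytes[512]; } area {};
    __asm__ __volatile__ ("fxsave %0" : "=m" (area));

    std::uint32_t mxcsrMask = 0;
    std::memcpy (&mxcsrMask, area.bytes + 28, sizeof (mxcsrMask));  // bytes 28-31

    if (mxcsrMask == 0)
        mxcsrMask = 0x0000FFBFu;   // default mask for CPUs that don't report one: DAZ not writable

    return (mxcsrMask & 0x0040u) != 0u;   // bit 6 = DAZ
}

// Sets FZ unconditionally and DAZ only when the CPU reports it as writable,
// since writing a reserved MXCSR bit raises a general-protection exception.
void disableDenormalsIfSupported()
{
    unsigned int csr = _mm_getcsr() | 0x8000u;   // FZ (bit 15), available since the first SSE CPUs
    if (cpuSupportsDenormalsAreZero())
        csr |= 0x0040u;                           // DAZ (bit 6)
    _mm_setcsr (csr);
}
```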
The flags are part of SSE2, so one could probably check SystemStats::hasSSE2() and set the flags accordingly (one of them, FZ, was already available in SSE).
From a quick read of Wikipedia, it seems to be supported by CPUs starting from (and including) the Pentium 4/Pentium M and the Athlon 64. All Intel Macs support it, and I doubt that running current plugins on a machine too old to support SSE2 would be any fun.
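If you'd rather not depend on JUCE for the check, the SSE/SSE2 feature bits come from CPUID leaf 1 (EDX bit 25 = SSE, bit 26 = SSE2). This sketch assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid, bit_SSE and bit_SSE2 (MSVC has __cpuid in <intrin.h> instead). It follows the rule of thumb above, FZ with SSE and DAZ with SSE2; note that the MXCSR_MASK reported by FXSAVE is the stricter test for DAZ. The struct and function names are hypothetical.

```cpp
#include <cpuid.h>  // __get_cpuid, bit_SSE, bit_SSE2 (GCC/Clang, x86/x64 only)

struct SimdSupport { bool sse; bool sse2; };

// Queries CPUID leaf 1; EDX carries the SSE and SSE2 feature bits.
SimdSupport querySimdSupport()
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (! __get_cpuid (1, &eax, &ebx, &ecx, &edx))
        return { false, false };   // CPUID leaf 1 unavailable

    return { (edx & bit_SSE) != 0, (edx & bit_SSE2) != 0 };
}

// MXCSR bits to OR in, following the rule of thumb: FZ needs SSE, DAZ assumed with SSE2.
unsigned int denormalFlagsFor (SimdSupport s)
{
    unsigned int flags = 0;
    if (s.sse)  flags |= 0x8000u;   // FZ, bit 15
    if (s.sse2) flags |= 0x0040u;   // DAZ, bit 6
    return flags;
}
```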
At least, the code should be generated with SSE2 instructions for floating-point operations (which is standard in VS2012 & Xcode).
Does anyone have sources to back this up? Because now it seems that it's the compiler that can save us from denormals, or still give them to us, by using regular (x87) float instructions instead of SSE.
/arch:SSE2 Enables the use of SSE2 instructions. This is the default instruction on x86 platforms if no /arch option is specified.
The optimizer chooses when and how to use the SSE and SSE2 instructions when /arch is specified. It uses SSE and SSE2 instructions for some scalar floating-point computations when it determines that it is faster to use the SSE/SSE2 instructions and registers instead of the x87 floating-point register stack. As a result, your code will actually use a mixture of both x87 and SSE/SSE2 for floating-point computations. Also, with /arch:SSE2, SSE2 instructions can be used for some 64-bit integer operations.
It looks like there is no guarantee that the compiler chooses SSE2 operations for 32-bit code (64-bit always uses SSE because, IMHO, it has no x87 FP set). But I checked the assembly in my case and found a lot of SSE instructions in it.
(And it solved my denormal issues)
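Since MXCSR (and with it FZ/DAZ) only affects SSE instructions, not x87 ones, it can help to have the build tell you what it targets. A sketch using predefined macros: GCC and Clang define __SSE2_MATH__ when scalar float math uses SSE2 (the default on x86-64), MSVC sets _M_IX86_FP to 2 for /arch:SSE2 on 32-bit x86 and always uses SSE2 on x64, and FLT_EVAL_METHOD from <cfloat> is 0 when float expressions are evaluated in float precision. As the MSVC documentation quoted above says, 32-bit code may still mix in x87, so treat this as a hint rather than a guarantee; the function names are hypothetical.

```cpp
#include <cfloat>  // FLT_EVAL_METHOD (C99 / C++11)

// True when the compiler was told to use SSE2 for scalar float math.
constexpr bool compilerTargetsSseFloatMath()
{
#if defined(__SSE2_MATH__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return true;
#else
    return false;
#endif
}

// FLT_EVAL_METHOD == 0: float expressions evaluated in float precision, as SSE scalar
// code does; x87 builds typically report 2 (evaluation in long double precision).
constexpr bool floatEvaluatedAtFloatPrecision()
{
#if defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD == 0
    return true;
#else
    return false;
#endif
}
```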
I suggest that @chkn’s ScopedNoDenormals or an equivalent should be part of JUCE and should also be called in the processBlock method of Projucer’s template for audio plugins (extras/Projucer/Source/BinaryData/jucer_AudioPluginFilterTemplate.cpp).
Otherwise, what I believe currently happens is that every new plugin developer discovers the issue of denormals when, for example, a user complains that in REAPER the plugin uses more CPU when playback is stopped, and it takes them time and effort to investigate… The issue may also bite them again if they forget about it when developing new plugins, hence the need to include this call in the plugin template.
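A sketch of what that looks like in practice, assuming x86/x64. OnePoleProcessor is a hypothetical stand-in, not the Projucer template: a feedback filter whose state decays while it's fed silence, which is exactly when the stopped-transport CPU spike shows up. With feedback 0.99f and no FZ/DAZ, the state ends up stuck on a small denormal (for a denormal k * 2^-149 with small k, multiplying by 0.99 rounds back to the same value); with ScopedNoDenormals as the first line of processBlock(), it is flushed to exactly zero.

```cpp
#include <xmmintrin.h>  // _mm_getcsr / _mm_setcsr (x86/x64 only)

// Equivalent of the ScopedNoDenormals class from the thread, repeated so this compiles alone.
class ScopedNoDenormals
{
public:
    ScopedNoDenormals() : oldMXCSR (_mm_getcsr()) { _mm_setcsr (oldMXCSR | 0x8040u); }  // FZ + DAZ
    ~ScopedNoDenormals() { _mm_setcsr (oldMXCSR); }                                     // restore

    ScopedNoDenormals (const ScopedNoDenormals&) = delete;
    ScopedNoDenormals& operator= (const ScopedNoDenormals&) = delete;

private:
    unsigned int oldMXCSR;
};

// Hypothetical processor: y[n] = x[n] + feedback * y[n-1].
class OnePoleProcessor
{
public:
    void processBlock (float* samples, int numSamples)
    {
        ScopedNoDenormals noDenormals;      // first line of processBlock()

        const float fb = feedback;          // volatile read keeps the loop a run-time computation
        for (int i = 0; i < numSamples; ++i)
        {
            state = samples[i] + fb * state;
            samples[i] = state;
        }
    }

    float getState() const { return state; }

private:
    volatile float feedback = 0.99f;
    float state = 1.0f;                     // e.g. the tail of a note that has just ended
};
```

Feeding it about 20,000 samples of silence is enough: the state drops below FLT_MIN after roughly 8,700 samples, at which point FZ turns it into 0.0f for good.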