FloatVectorOperations crash



Hey Jules -


I love the new FloatVectorOperations class.  I've been using it everywhere I can lately.

However, I've just noticed that in very particular circumstances they break.


Specifically - Win32 Release mode plugins.  

Seriously - they operate fine on Mac, on 64-bit Windows, in debug mode, or in any standalone program ... but not in Release mode, 32-bit, Windows plugins.


You can replicate this just by calling AudioSampleBuffer::addFrom() in the processBlock of the Juce plugin example (the FloatVectorOperations::add inside there craps out).  copyFrom works fine (FloatVectorOperations::copy seems ok).


It's just FloatVectorOperations::add and FloatVectorOperations::multiply that have issues, so really I think it must be something about the _mm_add_ps call in:


  JUCE_PERFORM_SSE_OP_SRC_DEST (dest[i] += src[i],
                                  _mm_add_ps (d, s),
                                  JUCE_LOAD_SRC_DEST, JUCE_INCREMENT_SRC_DEST)



Just for reference ... here is the code I'm using in the plugin example:


void JuceDemoPluginAudioProcessor::processBlock (AudioSampleBuffer& buffer, MidiBuffer& midiMessages)
{
    const int numSamples = buffer.getNumSamples();
    int channel, dp = 0;

    // Go through the incoming data, and apply our gain to it...
    for (channel = 0; channel < getNumInputChannels(); ++channel)
        buffer.applyGain (channel, 0, buffer.getNumSamples(), gain);

    // Scratch buffer with the same layout as the incoming one (its contents are not cleared)
    AudioSampleBuffer temp (buffer.getNumChannels(), numSamples);

    // CAUSES a CRASH .... in win32 release mode
    for (short chan = 0; chan < buffer.getNumChannels(); chan++)
        buffer.addFrom (chan, 0, temp.getSampleData (chan), numSamples);
}


Ok - take a look when you get a minute.

Presumably this could only be because the host is changing some kind of CPU floating point mode setting that breaks some SSE operations... My knowledge of floating point modes is pretty shallow - anybody know what this might be?
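One way to test the "host changed the floating point mode" theory is to read the SSE control register (MXCSR) at the top of processBlock and compare it with the power-on default of 0x1F80. The sketch below is only an illustration: the helper names are hypothetical and not part of JUCE, and it assumes an x86 target where `_mm_getcsr()` from `<xmmintrin.h>` is available (on anything else it just reports the default).

```cpp
#include <cstdint>

#if defined (__SSE__) || defined (_M_X64) || (defined (_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

// MXCSR bit layout (names are hypothetical, values come from the x86 architecture)
const std::uint32_t exceptionMaskBits   = 0x1F80u; // bits 7-12: one mask bit per SSE exception
const std::uint32_t denormalsAreZeroBit = 0x0040u; // bit 6
const std::uint32_t flushToZeroBit      = 0x8000u; // bit 15

// Reads the current MXCSR value, or the power-on default when SSE is unavailable.
std::uint32_t readMxcsr()
{
#if SKETCH_HAS_SSE
    return static_cast<std::uint32_t> (_mm_getcsr());
#else
    return exceptionMaskBits;
#endif
}

// True when every SSE floating point exception is masked (the default state).
bool allExceptionsMasked (std::uint32_t mxcsr)
{
    return (mxcsr & exceptionMaskBits) == exceptionMaskBits;
}

bool flushToZeroEnabled (std::uint32_t mxcsr)      { return (mxcsr & flushToZeroBit) != 0; }
bool denormalsAreZeroEnabled (std::uint32_t mxcsr) { return (mxcsr & denormalsAreZeroBit) != 0; }
```

Logging `readMxcsr()` once inside the plugin and once in a standalone build would show whether the host really hands over a different mode. If `allExceptionsMasked()` came back false inside the host, that would be worth chasing; if the values match, the floating point mode is probably not the culprit.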

Probably not related, but I see that _mm_empty() is used. It is only needed when you're mixing MMX instructions with FPU instructions. Since you are requiring SSE2 instructions, MMX isn't used (it was only relevant before SSE2), so there is no point in calling _mm_empty().

Ah! I didn't know that - thanks, I'll remove it!


Jules - 

I just checked in on this one and it still seems to be an issue.

This is no trivial matter, as I'm sure most plugins out there use addFrom() or clear() even if they don't use FloatVectorOperations::multiply directly.


It should be fairly easy to fix though (just putting a preprocessor switch in there that uses a plain scalar loop for any win32 plugin should do the trick, right?).
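For concreteness, the kind of switch being suggested might look roughly like the sketch below. It is not how JUCE is written: `SKETCH_BUILDING_PLUGIN`, `SKETCH_USE_SSE` and `multiplyVectors` are hypothetical names, and the project would have to define `SKETCH_BUILDING_PLUGIN` itself since no standard macro says "this is a plugin build". It only papers over the symptom by falling back to a scalar loop on 32-bit Windows plugin builds.

```cpp
// _WIN32 is defined for both 32- and 64-bit Windows targets, _WIN64 only for 64-bit ones.
#if defined (_WIN32) && ! defined (_WIN64) && defined (SKETCH_BUILDING_PLUGIN)
 #define SKETCH_USE_SSE 0   // 32-bit Windows plugin: take the scalar path
#elif defined (__SSE__) || defined (_M_X64) || (defined (_M_IX86_FP) && _M_IX86_FP >= 1)
 #define SKETCH_USE_SSE 1
#else
 #define SKETCH_USE_SSE 0
#endif

#if SKETCH_USE_SSE
 #include <xmmintrin.h>
#endif

// dest[i] *= src[i] for i in [0, num)
void multiplyVectors (float* dest, const float* src, int num)
{
    int i = 0;

#if SKETCH_USE_SSE
    // Four floats per iteration, using unaligned loads and stores so any pointer is acceptable.
    for (; i + 4 <= num; i += 4)
        _mm_storeu_ps (dest + i, _mm_mul_ps (_mm_loadu_ps (dest + i), _mm_loadu_ps (src + i)));
#endif

    // Scalar path: handles the leftover tail, or everything when SSE is switched off.
    for (; i < num; ++i)
        dest[i] *= src[i];
}
```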




I can't just treat the symptoms like that without actually knowing what's causing it.

Could it be because you're turning on some of the more extreme floating-point optimisation modes in the compiler? I know that if you start enabling the non-IEEE floating point compiler flags in MSVC it can lead to some pretty strange and unexpected bugs.


Doubtful.  You can replicate the issue by adding an addFrom() to the juce plugin example (and compiling for win32, Release mode).


I'll go through and try it with some changes to the settings though.  Since it happens in release and not debug, it certainly could be an optimization setting.

Sorry, I've not looked at this yet..

But which host are you using? Have you tried in something like the juce demo host, where we can be sure that it's not mucking-around with the CPU mode flags?


Yup.  Got it.

Looks like turning off whole program optimization (VS2010 -> Project Properties -> C/C++ -> Optimization ) does the trick.


It has something to do with SSE intrinsics alignment specifications.


Take a look at this:



You may want to turn off the optimization on the plugin demo code too.



Hmm.. I'm a little dubious about alignment being the issue - the way the SSE stuff is written, it'll be faster with aligned memory, but will work just fine with non-aligned data too. The only other thing that could be misaligned are the local __m128 variables, but that data-type is hard-wired to be 16-byte aligned, so even if there was a compiler bug that was messing it up, then there's nothing much we could do to the code to fix it.
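To illustrate the point about aligned versus non-aligned data, here is a rough sketch of the general pattern: check the pointers, use the aligned load/store intrinsics when both are 16-byte aligned, and the unaligned ones otherwise. This is not the actual JUCE macro expansion; `addVectors` and `isAligned16` are hypothetical names, and the SSE path is only compiled on x86 targets.

```cpp
#include <cstdint>

#if defined (__SSE__) || defined (_M_X64) || (defined (_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

// True when the pointer sits on a 16-byte boundary.
static bool isAligned16 (const void* p)
{
    return (reinterpret_cast<std::uintptr_t> (p) & 15u) == 0;
}

// dest[i] += src[i] for i in [0, num)
void addVectors (float* dest, const float* src, int num)
{
    int i = 0;

#if SKETCH_HAS_SSE
    if (isAligned16 (dest) && isAligned16 (src))
    {
        // Fast path: aligned loads and stores (these fault on misaligned addresses).
        for (; i + 4 <= num; i += 4)
            _mm_store_ps (dest + i, _mm_add_ps (_mm_load_ps (dest + i), _mm_load_ps (src + i)));
    }
    else
    {
        // Slower but safe path: unaligned loads and stores work for any address.
        for (; i + 4 <= num; i += 4)
            _mm_storeu_ps (dest + i, _mm_add_ps (_mm_loadu_ps (dest + i), _mm_loadu_ps (src + i)));
    }
#endif

    // Scalar tail for the last few samples (or everything without SSE).
    for (; i < num; ++i)
        dest[i] += src[i];
}
```

With this shape the buffer pointers themselves cannot cause an alignment fault, which leaves stack-allocated `__m128` temporaries as the only place where misalignment could bite.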

Which version of VS are you using? It could actually be that this is a compiler bug which they've fixed.



VS 2010 

I'll check for some updates.

And yeah, if it were alignment, you would also expect it to break on standalones (and not just the plugins).

It works fine with 64 bit plugins too.

Only an issue for 32 bit Release plugins with whole program optimization enabled .... very strange.


I wonder if the problem is because when a host makes a call into the plugin's process function, the *host* could sometimes do so with the stack in an unaligned state. Normally, there'll be code inside functions which checks the stack alignment and corrects it, but with WPO, the optimiser may have decided that these functions can only ever be called from places where the stack pointer is already aligned, so there's no need to bother checking it.. but obviously that wouldn't take into account the fact that it can also be called from an external app..

If that's the case, then it might be possible to fix by adding some kind of compiler-specific hack in the audio processing base method to force it to always fix the stack alignment before doing the plugin processing.


Woah ... deep shit man.  Sounds possible though.

Happy to test it out if you have some tweaks you want to try.

TBH I have no idea how to force the compiler to do that kind of thing. Like you say, this is seriously deep compiler stuff, and I've never dug that deeply into how stack alignment works.
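For what it's worth, GCC and Clang on x86 have a function attribute, `force_align_arg_pointer`, that does this kind of thing: it makes the function prologue realign the stack regardless of how the caller left it. The sketch below shows the idea only. It does not assume MSVC has a comparable per-function switch (the macro expands to nothing there), and `SKETCH_FORCE_STACK_ALIGN` and `processEntryPoint` are hypothetical names standing in for the plugin's real entry point.

```cpp
#include <cstdint>

#if (defined (__GNUC__) || defined (__clang__)) && (defined (__i386__) || defined (__x86_64__))
 // Ask the compiler to realign the stack on entry to the function.
 #define SKETCH_FORCE_STACK_ALIGN __attribute__ ((force_align_arg_pointer))
#else
 // Assumption: no equivalent known here, so the macro does nothing.
 #define SKETCH_FORCE_STACK_ALIGN
#endif

// Hypothetical stand-in for the function a host calls into.
// Sums the samples through a 16-byte aligned stack array and reports
// whether that array really did end up on a 16-byte boundary.
SKETCH_FORCE_STACK_ALIGN
bool processEntryPoint (const float* samples, int numSamples, float& sumOut)
{
    alignas (16) float scratch[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

    // Accumulate into four lanes, the way a vectorised loop would.
    for (int i = 0; i < numSamples; ++i)
        scratch[i & 3] += samples[i];

    sumOut = scratch[0] + scratch[1] + scratch[2] + scratch[3];

    return (reinterpret_cast<std::uintptr_t> (scratch) & 15u) == 0;
}
```

The attribute would only need to go on the outermost function the host calls, since everything called from there inherits an aligned stack. The 64-bit calling conventions already require a 16-byte aligned stack at every call, which would fit with the 64-bit builds behaving.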