FloatVectorOperations

Just added a new class: FloatVectorOperations

I needed some accelerated primitives for Tracktion, so I wrote a small class to wrap up some simple SSE2 ops, and used them to speed up various bits of audio buffer code. Testing + suggestions welcome!

Thanks Jules! I just noticed those had been checked in. I wonder if you might add a routine to fill a vector with a constant. Lots of times we’ve got to fight denormalization errors, and we’ll typically clear buffers by setting them to a small constant (inaudible) value. That often keeps plugins from pigging out on CPU time.

But it looks like a helpful class. I’m sure you have many suggestions for additions.

Too excited to try this out!

Just a note, it doesn’t build in 64-bit:

(Plus an x64 warning)

Good idea - I could use that myself, actually. What value do you use? I was thinking of trying alternating small +ve/-ve values, so it remains centred at 0.

Much of the time I use 0.000000045f, but there’s nothing unique about it. I’d suggest simply making the value an argument to the function. I’d avoid alternating values because they do introduce a frequency component. It’s very rare that these constants would actually add up to enough DC to be problematic, but that depends on the algorithm. Any sort of IIR operation can easily do a sign flip if it matters.

EDIT: I’d also request a function to add a constant to a vector. Lots of times you don’t know if the vector has any values or not, so you’d do this as an occasional safety measure.
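As a rough sketch of the two helpers being requested here: these are plain scalar loops, not the SSE versions in FloatVectorOperations, and the names `fillWithConstant` and `addConstant` are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: overwrite every sample with a small constant,
// e.g. 0.000000045f, instead of clearing the buffer to exact zero.
void fillWithConstant (float* dest, float value, std::size_t numSamples)
{
    for (std::size_t i = 0; i < numSamples; ++i)
        dest[i] = value;
}

// Hypothetical helper: add a small constant to whatever is already in
// the buffer, as an occasional safety measure when its contents are unknown.
void addConstant (float* dest, float value, std::size_t numSamples)
{
    for (std::size_t i = 0; i < numSamples; ++i)
        dest[i] += value;
}
```

Taking the value as an argument, as suggested above, leaves the choice of constant (and its sign) to the caller.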

Thanks - fixed the 64-bit stuff, and added a fill function now…

Nice addition, but what about ARM/NEON?

I have been looking for good, simple-to-include SIMD wrapper libraries, and this one is superb:

IMHO on OS X and iOS, you should implement those using vDSP when there is an equivalent:

copy --> vDSP_mmov
copyWithMultiply --> vDSP_vmul
add --> vDSP_vsadd and vDSP_vadd
multiply --> vDSP_vsmul and vDSP_vmul
clear --> vDSP_vclr
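A minimal sketch of what such a dispatch could look like, assuming the Accelerate framework is linked on Apple platforms. The wrapper names `copyWithGain` and `clearBuffer` are hypothetical and are not the JUCE API; on other platforms the sketch falls back to plain loops.

```cpp
#include <cstddef>

#if defined (__APPLE__)
 #include <Accelerate/Accelerate.h>
#endif

// Hypothetical wrapper: dest[i] = src[i] * gain.
void copyWithGain (float* dest, const float* src, float gain, std::size_t num)
{
   #if defined (__APPLE__)
    // vDSP_vsmul multiplies a vector by a scalar (stride 1 on both sides).
    vDSP_vsmul (src, 1, &gain, dest, 1, (vDSP_Length) num);
   #else
    for (std::size_t i = 0; i < num; ++i)
        dest[i] = src[i] * gain;
   #endif
}

// Hypothetical wrapper: set every sample to zero.
void clearBuffer (float* dest, std::size_t num)
{
   #if defined (__APPLE__)
    vDSP_vclr (dest, 1, (vDSP_Length) num);
   #else
    for (std::size_t i = 0; i < num; ++i)
        dest[i] = 0.0f;
   #endif
}
```

Note that vDSP_vsmul takes a scalar multiplier, while vDSP_vmul multiplies two vectors element by element, so which one applies depends on whether the multiplier is a single gain or a second buffer.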

Yes, that was on my to-do list. Will be there soon…

Very good idea !

Anyone else having problems building an RTAS plug-in on Windows since these were added? When building the JuceDemoPlugin with Visual Studio 2010 and the Pro Tools 8 SDK, I get the following linker errors:

juce_RTAS_Wrapper.obj : error LNK2001: unresolved external symbol "public: static void __stdcall juce::FloatVectorOperations::copy(float *,float const *,int)" (?copy@FloatVectorOperations@juce@@SGXPAMPBMH@Z)
juce_RTAS_Wrapper.obj : error LNK2001: unresolved external symbol "public: static void __stdcall juce::FloatVectorOperations::clear(float *,int)" (?clear@FloatVectorOperations@juce@@SGXPAMH@Z)

Try resaving your project with the introjucer - it may need to include the new source files.

I did do that; no luck, I’m afraid. The juce_FloatVectorOperations.cpp is being included in juce_audio_basics.cpp, but the linker can’t seem to find it.

Regarding denormalization, you can get that for free when using SSE. If the FTZ (flush to zero) and DAZ (denormals are zero) bits are set in the SSE control register (MXCSR), the CPU won’t bother with denormals. One of those bits came with SSE2, not sure how this behaves with SSE.

I’m using the following stuff to avoid denormalisation in PitchedDelay:

#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr

class ScopedSSECSR
{
public:
        ScopedSSECSR()
                : csr(_mm_getcsr())
        {
                // sets FTZ (bit 15, 0x8000) & DAZ (bit 6, 0x0040)
                _mm_setcsr(csr | 0x8040);
        }

        ~ScopedSSECSR()
        {
                // restores the control register to its previous value
                _mm_setcsr(csr);
        }
private:
        const unsigned int csr;
        JUCE_DECLARE_NON_COPYABLE_WITH_LEAK_DETECTOR(ScopedSSECSR);
};

...

void processBlock(AudioSampleBuffer& buffer, ...)
{
        ScopedSSECSR csr;
        // processing stuff
}
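To see the effect of those two bits without JUCE, here is a small stand-alone sketch. It assumes an x86 target where float maths is done in SSE registers (the default on x86-64) and no fast-math compiler flags; the names `ScopedFlushDenormals` and `halveSmallestNormal` are made up for this example.

```cpp
#include <xmmintrin.h>   // _mm_getcsr, _mm_setcsr
#include <cfloat>        // FLT_MIN

// Hypothetical RAII guard, same idea as ScopedSSECSR but without JUCE macros.
class ScopedFlushDenormals
{
public:
    ScopedFlushDenormals() : savedCsr (_mm_getcsr())
    {
        _mm_setcsr (savedCsr | 0x8040u);   // FTZ = bit 15, DAZ = bit 6
    }

    ~ScopedFlushDenormals() { _mm_setcsr (savedCsr); }

    ScopedFlushDenormals (const ScopedFlushDenormals&) = delete;
    ScopedFlushDenormals& operator= (const ScopedFlushDenormals&) = delete;

private:
    const unsigned int savedCsr;
};

// Halves the smallest normal float. The result is a denormal,
// unless FTZ is set, in which case the CPU returns zero instead.
float halveSmallestNormal()
{
    volatile float smallest = FLT_MIN;   // volatile stops compile-time folding
    volatile float half = 0.5f;
    return smallest * half;
}
```

With the guard in scope the product should come back as exactly zero; once the guard is destroyed, the previous register value (and normal denormal handling, if that was the previous state) is restored.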

That’s a nice trick. So any denormalised numbers in the input data always get flushed to zero?

I was just scrounging around the Mac developer site and found this procedure recommended. It sounds like a dumb question, but do you know if that register is saved in the thread context? If I decided to turn it off for an extended series of operations, can I be certain it doesn’t get whacked if I’m swapped out of the CPU for a bit?

Good question. And if it does stay constant for a thread, then maybe a smart thing to do would be to just leave it permanently enabled for your audio thread?

[quote] One of those bits came with SSE2, not sure how this behaves with SSE.
[/quote]
you get a crash when setting this bit.

Those bits are saved in the thread context, just like the x87 FPU state. As a consequence, they are also shared by all the plugins that happen to run on the same audio callback, which can have (not so) funny consequences when one of the plugins does nasty things with them.

So it sounds like you’re OK if you set it up when you enter your process and restore it when you leave.

I’ve just tried to build a Windows RTAS plugin (an operation I’ve done many times). I had a link failure with missing symbols FloatVectorOperations::copy and FloatVectorOperations::clear. Is there something that should have been included in one of your modules?