AudioSampleBuffer ideas


#1

Hi there,

It would be cool to have a templatized AudioSampleBuffer<> so that doubles could also take advantage of the class's copy/ramping functionality. I also think there should be a way to specify the byte alignment of the HeapBlock in AudioSampleBuffer, so one can choose 16- or 32-byte alignment and let the compiler automatically handle SSE/AVX optimizations on operations...

Jules, what do you think?
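A minimal sketch of what this post asks for, written in plain C++17 rather than against JUCE: both the sample type and the alignment are template parameters, and each channel is padded so that every channel (not only the first) starts on an `Alignment`-byte boundary. The class name `AlignedSampleBuffer` is hypothetical; it only borrows the `getReadPointer`/`getWritePointer` naming from AudioSampleBuffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <new>

// Hypothetical sketch: a templated multi-channel buffer whose storage alignment
// is chosen at compile time (e.g. 16 bytes for SSE, 32 bytes for AVX).
template <typename SampleType, std::size_t Alignment = 32>
class AlignedSampleBuffer
{
    static_assert ((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");
    static_assert (Alignment % sizeof (SampleType) == 0, "Alignment must be a multiple of the sample size");

public:
    AlignedSampleBuffer (int numChannels, int numSamples)
        : channels (numChannels),
          samples (numSamples),
          stride (paddedLength (numSamples)),
          data (static_cast<SampleType*> (::operator new (sizeof (SampleType) * stride * static_cast<std::size_t> (numChannels),
                                                          std::align_val_t (Alignment))))
    {
        std::fill (data, data + stride * static_cast<std::size_t> (channels), SampleType());
    }

    ~AlignedSampleBuffer() { ::operator delete (data, std::align_val_t (Alignment)); }

    AlignedSampleBuffer (const AlignedSampleBuffer&) = delete;
    AlignedSampleBuffer& operator= (const AlignedSampleBuffer&) = delete;

    int getNumChannels() const noexcept { return channels; }
    int getNumSamples() const noexcept  { return samples; }

    SampleType* getWritePointer (int channel) noexcept            { return data + stride * static_cast<std::size_t> (channel); }
    const SampleType* getReadPointer (int channel) const noexcept { return data + stride * static_cast<std::size_t> (channel); }

private:
    // Round the per-channel length up to a whole number of Alignment-sized blocks,
    // so that every channel's first sample lands on an aligned address.
    static std::size_t paddedLength (int numSamples)
    {
        const std::size_t perBlock = Alignment / sizeof (SampleType);
        return (static_cast<std::size_t> (numSamples) + perBlock - 1) / perBlock * perBlock;
    }

    int channels, samples;
    std::size_t stride;
    SampleType* data;
};
```

With `AlignedSampleBuffer<float, 32>` every channel pointer is suitable for aligned 256-bit AVX loads, and `AlignedSampleBuffer<double, 16>` gives SSE-aligned double channels from the same code.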


#2

I also have some ideas:

 

- a multiply method that multiplies all values by those of another AudioSampleBuffer (with SSE)

- a virtual destructor (to be able to use dynamic C++ features)

- an accumulate function that sums all values (this is also very helpful for detecting NaNs in the buffer)
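For the multiply idea, a minimal sketch of an element-wise multiply of one channel by another: SSE intrinsics process four floats at a time with a scalar tail, and the code falls back to plain scalar code when SSE isn't available. The function name `multiplyChannels` and the `SKETCH_HAS_SSE` macro are hypothetical; inside JUCE, the same per-channel job can be done with `FloatVectorOperations::multiply (dest, src, numSamples)`.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
 #include <xmmintrin.h>
 #define SKETCH_HAS_SSE 1
#else
 #define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: dest[i] *= src[i] for every i in [0, numSamples).
void multiplyChannels (float* dest, const float* src, int numSamples)
{
    int i = 0;

#if SKETCH_HAS_SSE
    // four samples per iteration; unaligned loads/stores, so any pointers work
    for (; i + 4 <= numSamples; i += 4)
        _mm_storeu_ps (dest + i, _mm_mul_ps (_mm_loadu_ps (dest + i), _mm_loadu_ps (src + i)));
#endif

    // scalar tail (or the whole range when SSE is unavailable)
    for (; i < numSamples; ++i)
        dest[i] *= src[i];
}
```

A buffer-level method would just call this once per channel, after checking that both buffers have the same channel and sample counts.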

Here is some very fast SSE3 code, inspired by http://fastcpp.blogspot.de/2011/04/how-to-process-stl-vector-using-sse.html (with a non-SSE fallback):

#include <pmmintrin.h>   // SSE3 intrinsics (_mm_hadd_ps); GCC/Clang need -msse3 for this file

float accumulate (const float* p, int N)
{
    if (SystemStats::hasSSE3())   // JUCE's runtime CPU feature check
    {
        __m128 mmSum = _mm_setzero_ps();
        int i = 0;

        // sum 16 floats per iteration using four unaligned 4-wide loads
        const int rounddown = N - (N % 16);

        for (; i < rounddown; i += 16)
        {
            __m128 v0 = _mm_loadu_ps (p + i + 0);
            __m128 v1 = _mm_loadu_ps (p + i + 4);
            __m128 s01 = _mm_add_ps (v0, v1);

            __m128 v2 = _mm_loadu_ps (p + i + 8);
            __m128 v3 = _mm_loadu_ps (p + i + 12);
            __m128 s23 = _mm_add_ps (v2, v3);

            mmSum = _mm_add_ps (mmSum, s01);
            mmSum = _mm_add_ps (mmSum, s23);
        }

        // add up single values until all elements are covered
        for (; i < N; ++i)
            mmSum = _mm_add_ss (mmSum, _mm_load_ss (p + i));

        // add up the four float values from mmSum into a single value and return
        mmSum = _mm_hadd_ps (mmSum, mmSum);
        mmSum = _mm_hadd_ps (mmSum, mmSum);
        return _mm_cvtss_f32 (mmSum);
    }

    // non-SSE fallback
    float s = 0;

    for (const float* e = p + N; p != e; ++p)
        s += *p;

    return s;
}


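bool AudioSampleBufferTools::containsNANs (AudioSampleBuffer& asb)
{
    for (int c = 0; c < asb.getNumChannels(); ++c)
    {
        // a single NaN anywhere in the channel propagates into the sum
        const float v = accumulate (asb.getReadPointer (c), asb.getNumSamples());

        // NaN is the only value that compares unequal to itself
        if (! (v == v))
            return true;
    }

    return false;
}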
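One caveat with the sum-based check: a channel that contains both +inf and -inf also sums to NaN, so it gets reported even though it holds no NaN. If that matters, a per-sample scan is an alternative; a minimal sketch in plain C++ (the name `findFirstNaN` is hypothetical) that also stops at the first hit. Note that with GCC/Clang's `-ffast-math` the compiler may assume NaNs never occur and optimize this check, like the `v == v` test, away.

```cpp
#include <cmath>

// Hypothetical helper: index of the first NaN in data[0..numSamples), or -1 if none.
// Unlike summing, +inf and -inf in the same channel are not mistaken for a NaN.
int findFirstNaN (const float* data, int numSamples)
{
    for (int i = 0; i < numSamples; ++i)
        if (std::isnan (data[i]))
            return i;

    return -1;
}
```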

Request: FloatVectorOperations::sum()?
#3

- methods to perform other mathematical operations on AudioSampleBuffer (log, exp), applying windows (Blackman), FFT, etc...


#4

I also thought about having a pluggable LowLevelAudioSampleBufferContext class that implements the low-level copy, assign, multiply and other operations on the buffer. For example, in commercial programs you would want an IPP backend to speed things up (this will require 32-byte alignment of the buffers to take advantage of AVX and AVX2 on modern architectures).

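#include <ipps.h>   // Intel IPP signal processing: ippsCopy_32f, IppStatus, ippStsNoErr

// Evaluates an IPP call and asserts that it succeeded; the do/while makes it behave as one statement
#define JUCE_CHECK_IPP_STATUS(Func) \
    do { const IppStatus ippStatus_ = (Func); jassert (ippStatus_ == ippStsNoErr); ignoreUnused (ippStatus_); } while (false)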

class IPPLowLevelAudioSampleBufferContext : public LowLevelAudioSampleBufferContext
{
public:

    void copyBuffer(const float* source, float* destination, int numberOfSamples) override
    {
        JUCE_CHECK_IPP_STATUS(ippsCopy_32f(source, destination, numberOfSamples));
    }

    ...
};


// example usage of the proposed API (a templated AudioSampleBuffer that takes a context pointer)
IPPLowLevelAudioSampleBufferContext ippContext;
AudioSampleBuffer<float> sourceData(2, 1024, &ippContext);
AudioSampleBuffer<float> destData(2, 1024, &ippContext);

destData.copyFrom(sourceData);
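The same idea can be sketched without IPP in plain C++: an abstract context holding the low-level kernels, a portable default implementation, and a buffer that forwards its bulk operations to whichever context it was constructed with. All names here (`LowLevelBufferContext`, `ScalarBufferContext`, `ContextBuffer`) are hypothetical stand-ins for the proposed classes, not JUCE API; an IPP-backed context would simply be one more subclass.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical interface for the low-level kernels a buffer relies on.
class LowLevelBufferContext
{
public:
    virtual ~LowLevelBufferContext() = default;
    virtual void copy (const float* source, float* destination, int numSamples) = 0;
    virtual void multiply (float* destination, const float* source, int numSamples) = 0;
};

// Portable default backend; an IPP (or other vendor) backend would be another subclass.
class ScalarBufferContext : public LowLevelBufferContext
{
public:
    void copy (const float* source, float* destination, int numSamples) override
    {
        std::copy (source, source + numSamples, destination);
    }

    void multiply (float* destination, const float* source, int numSamples) override
    {
        for (int i = 0; i < numSamples; ++i)
            destination[i] *= source[i];
    }
};

// Buffer that routes its bulk operations through the context it was given.
// Assumes both buffers in an operation have the same channel and sample counts.
class ContextBuffer
{
public:
    ContextBuffer (int numChannels, int numSamples, LowLevelBufferContext& ctx)
        : samples (numSamples), context (ctx),
          channels (static_cast<std::size_t> (numChannels),
                    std::vector<float> (static_cast<std::size_t> (numSamples), 0.0f))
    {}

    float* getWritePointer (int ch)            { return channels[static_cast<std::size_t> (ch)].data(); }
    const float* getReadPointer (int ch) const { return channels[static_cast<std::size_t> (ch)].data(); }

    void copyFrom (const ContextBuffer& other)
    {
        for (std::size_t ch = 0; ch < channels.size(); ++ch)
            context.copy (other.channels[ch].data(), channels[ch].data(), samples);
    }

    void multiplyBy (const ContextBuffer& other)
    {
        for (std::size_t ch = 0; ch < channels.size(); ++ch)
            context.multiply (channels[ch].data(), other.channels[ch].data(), samples);
    }

private:
    int samples;
    LowLevelBufferContext& context;
    std::vector<std::vector<float>> channels;
};
```

The cost of this design is one virtual call per channel per operation, which is negligible next to the per-sample work the backend does.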

 


#5

As mentioned elsewhere, I'm planning on extending it so that the buffer can use floats, doubles or ints internally, and can swap between these internal representations on demand. Doing it that way will mean that existing code won't need to be changed, and will often be able to take advantage of using doubles without needing to be updated.

(Using a template would break all existing code, and it'd make it much harder to write functions that can handle either floats or doubles, as all your functions would also need to be templated.)
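A rough sketch of the non-template alternative described above, assuming the buffer converts its whole contents whenever a caller asks for the other representation. `DualPrecisionBuffer` and its methods are hypothetical, not JUCE API, and a real implementation would want to avoid the allocations this sketch performs when switching inside an audio callback.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: one non-template buffer type that keeps its samples either as
// floats or as doubles, converting the contents when the other form is requested.
class DualPrecisionBuffer
{
public:
    explicit DualPrecisionBuffer (int numSamples)
        : floats (static_cast<std::size_t> (numSamples), 0.0f) {}

    bool isUsingDoubles() const noexcept { return usingDoubles; }

    int getNumSamples() const noexcept
    {
        return static_cast<int> (usingDoubles ? doubles.size() : floats.size());
    }

    // Existing float-based code keeps calling this and never sees the doubles.
    float* getFloatPointer()
    {
        if (usingDoubles)
        {
            floats.assign (doubles.begin(), doubles.end());   // narrows double -> float
            doubles.clear();
            usingDoubles = false;
        }

        return floats.data();
    }

    // Code that wants the extra precision asks for doubles instead.
    double* getDoublePointer()
    {
        if (! usingDoubles)
        {
            doubles.assign (floats.begin(), floats.end());    // widening, lossless
            floats.clear();
            usingDoubles = true;
        }

        return doubles.data();
    }

private:
    std::vector<float> floats;
    std::vector<double> doubles;
    bool usingDoubles = false;
};
```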

The buffers are already 8-byte (or maybe 16-byte?) aligned. AFAIK all vector operations work fine like this, so I'm not sure why anyone would ever want it to be 32-byte aligned?


#6

From the Intel AVX manual:

Intel® AVX has relaxed some memory alignment requirements, so now Intel AVX by default allows unaligned access; however, this access may come at a performance slowdown, so the old rule of designing your data to be memory aligned is still good practice (16-byte aligned for 128-bit access and 32-byte aligned for 256-bit access). The main exceptions are the VEX-extended versions of the SSE instructions that explicitly required memory-aligned data: These instructions still require aligned data.

Anyway, to be sure that buffers are aligned, they should be allocated with _aligned_malloc, aligned_alloc or posix_memalign.
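A minimal portable wrapper over two of those calls, assuming Windows uses `_aligned_malloc`/`_aligned_free` and everything else uses `posix_memalign`/`free`. `aligned_alloc` is left out because MSVC's C runtime doesn't provide it. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <stdlib.h>      // posix_memalign, free
#if defined(_WIN32)
 #include <malloc.h>     // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: allocate numBytes bytes on an `alignment`-byte boundary.
// `alignment` must be a power of two and a multiple of sizeof (void*).
// Returns nullptr on failure.
void* allocateAligned (std::size_t numBytes, std::size_t alignment)
{
#if defined(_WIN32)
    return _aligned_malloc (numBytes, alignment);
#else
    void* block = nullptr;
    return posix_memalign (&block, alignment, numBytes) == 0 ? block : nullptr;
#endif
}

// Releases memory from allocateAligned(); on Windows it must not go to plain free().
void freeAligned (void* block)
{
#if defined(_WIN32)
    _aligned_free (block);
#else
    free (block);
#endif
}
```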


#7

- an internal memory cache (shared across the process, thread-safe) that reuses memory once it has been allocated (or preallocates memory for this purpose), for fast use when working with stacked AudioSampleBuffers inside audio callbacks, without going through the OS allocator
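A minimal sketch of such a cache, assuming fixed-size blocks that are all allocated up front (outside the audio callback). Threads borrow and return blocks by flipping per-block `std::atomic<bool>` flags, so acquiring never locks or calls the allocator. `ScratchBlockPool` and its methods are hypothetical; `acquire()` returns nullptr when the pool is exhausted, and `release()` assumes the pointer came from the same pool.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch: fixed-size float blocks allocated once, then borrowed and
// returned from any thread without locks or further heap allocation.
class ScratchBlockPool
{
public:
    // floatsPerBlock must be > 0; all memory is allocated here, outside any audio callback.
    ScratchBlockPool (std::size_t numBlocks, std::size_t floatsPerBlock)
        : blockSize (floatsPerBlock),
          count (numBlocks),
          storage (numBlocks * floatsPerBlock, 0.0f),
          inUse (new std::atomic<bool>[numBlocks])
    {
        for (std::size_t i = 0; i < count; ++i)
            inUse[i].store (false);
    }

    // Claims the first free block, or returns nullptr if all blocks are in use.
    float* acquire() noexcept
    {
        for (std::size_t i = 0; i < count; ++i)
        {
            bool expected = false;

            if (inUse[i].compare_exchange_strong (expected, true, std::memory_order_acquire))
                return storage.data() + i * blockSize;
        }

        return nullptr;
    }

    // Hands a block back; `block` must have come from acquire() on this pool.
    void release (float* block) noexcept
    {
        const auto index = static_cast<std::size_t> (block - storage.data()) / blockSize;
        inUse[index].store (false, std::memory_order_release);
    }

    std::size_t getBlockSize() const noexcept { return blockSize; }

private:
    std::size_t blockSize, count;
    std::vector<float> storage;
    std::unique_ptr<std::atomic<bool>[]> inUse;
};
```

The linear scan keeps acquisition bounded (at most one compare-and-swap per block), which suits the small pools typical of stacked temporary buffers.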


#8

You can use this code:

 

class AudioFifo
{
public:
    AudioFifo (int numChannels, int size)
        : abstractFifo (size), asb (numChannels, size)
    {
    }

    bool canWriteBufferSize (int size) const    { return abstractFifo.getFreeSpace() >= size; }
    bool canReadBufferSize (int size) const     { return abstractFifo.getNumReady() >= size; }

    // Writes the whole of `buffer` into the FIFO (check canWriteBufferSize() first).
    void addToFifo (const AudioSampleBuffer& buffer)
    {
        if (buffer.getNumChannels() == asb.getNumChannels())
        {
            int start1, size1, start2, size2;
            abstractFifo.prepareToWrite (buffer.getNumSamples(), start1, size1, start2, size2);

            // fires if there wasn't enough free space for the whole buffer
            jassert (size1 + size2 == buffer.getNumSamples());

            // the free region can wrap around the end of the storage, so copy up to two blocks
            for (int ch = 0; ch < buffer.getNumChannels(); ++ch)
            {
                if (size1 > 0)  asb.copyFrom (ch, start1, buffer, ch, 0, size1);
                if (size2 > 0)  asb.copyFrom (ch, start2, buffer, ch, size1, size2);
            }

            abstractFifo.finishedWrite (size1 + size2);
        }
        else
        {
            jassertfalse;   // channel count mismatch
        }
    }

    // Fills the whole of `buffer` from the FIFO (check canReadBufferSize() first).
    void readFromFifo (AudioSampleBuffer& buffer)
    {
        if (buffer.getNumChannels() == asb.getNumChannels())
        {
            int start1, size1, start2, size2;
            abstractFifo.prepareToRead (buffer.getNumSamples(), start1, size1, start2, size2);

            // fires if fewer samples were ready than the buffer needs
            jassert (size1 + size2 == buffer.getNumSamples());

            for (int ch = 0; ch < buffer.getNumChannels(); ++ch)
            {
                if (size1 > 0)  buffer.copyFrom (ch, 0, asb, ch, start1, size1);
                if (size2 > 0)  buffer.copyFrom (ch, size1, asb, ch, start2, size2);
            }

            abstractFifo.finishedRead (size1 + size2);
        }
        else
        {
            jassertfalse;   // channel count mismatch
        }
    }

private:
    AbstractFifo abstractFifo;
    AudioSampleBuffer asb;

    JUCE_LEAK_DETECTOR (AudioFifo)
};
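The two copies in addToFifo/readFromFifo exist because the free (or ready) region of a circular buffer can wrap around the end of the storage. A minimal sketch of that index bookkeeping, in the spirit of JUCE's AbstractFifo but not its actual source: the class name `RingIndices` is hypothetical, one slot is kept empty to tell "full" from "empty", and it assumes a single writer thread and a single reader thread.

```cpp
#include <algorithm>
#include <atomic>

// Hypothetical sketch: index bookkeeping for a single-producer/single-consumer ring
// buffer that reports each request as up to two contiguous blocks.
class RingIndices
{
public:
    explicit RingIndices (int capacity) : bufferSize (capacity) {}

    int getNumReady() const noexcept
    {
        const int start = validStart.load(), end = validEnd.load();
        return end >= start ? end - start : bufferSize - (start - end);
    }

    // One slot always stays empty so that a full buffer isn't mistaken for an empty one.
    int getFreeSpace() const noexcept { return bufferSize - getNumReady() - 1; }

    void prepareToWrite (int numWanted, int& start1, int& size1, int& start2, int& size2) const noexcept
    {
        split (validEnd.load(), std::min (numWanted, getFreeSpace()), start1, size1, start2, size2);
    }

    void prepareToRead (int numWanted, int& start1, int& size1, int& start2, int& size2) const noexcept
    {
        split (validStart.load(), std::min (numWanted, getNumReady()), start1, size1, start2, size2);
    }

    void finishedWrite (int numWritten) noexcept { validEnd.store ((validEnd.load() + numWritten) % bufferSize); }
    void finishedRead (int numRead) noexcept     { validStart.store ((validStart.load() + numRead) % bufferSize); }

private:
    // The first block runs from `pos` towards the end of the storage; the rest wraps to index 0.
    void split (int pos, int num, int& start1, int& size1, int& start2, int& size2) const noexcept
    {
        start1 = pos;
        size1  = std::min (num, bufferSize - pos);
        start2 = 0;
        size2  = num - size1;
    }

    const int bufferSize;
    std::atomic<int> validStart { 0 }, validEnd { 0 };
};
```

For example, with a capacity of 8, after writing 5 and reading 3 samples, a request to write 5 more is split into 3 samples at index 5 and 2 samples at index 0.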

#9

You can use this code if you want:

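#include <cmath>

// Assumed helper (its body isn't part of the post): Blackman window value at a given
// phase, using the classic coefficients 0.42, 0.5 and 0.08.
static double blackman (double phase)
{
    return 0.42 - 0.5 * std::cos (phase) + 0.08 * std::cos (2.0 * phase);
}

void AudioSampleBufferCk::apply_blackman (AudioSampleBuffer& asb)
{
    // phase step of 2*pi/N, i.e. the periodic (FFT-friendly) form of the window
    const double fac2 = 3.141592653589793238462643383279 * 2.0 / (double) asb.getNumSamples();

    for (int ch = 0; ch < asb.getNumChannels(); ++ch)
    {
        float* f = asb.getWritePointer (ch, 0);

        for (int i = 0; i < asb.getNumSamples(); ++i)
            f[i] = (float) ((double) f[i] * blackman (i * fac2));
    }
}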


#10

// In place: channel1 (left) becomes mid, channel2 (right) becomes side.
void AudioSampleBufferTools::convertFromStereoToMS (AudioSampleBuffer& buffer, int channel1, int channel2)
{
    jassert (channel1 != channel2);

    float* buf1 = buffer.getWritePointer (channel1, 0);
    float* buf2 = buffer.getWritePointer (channel2, 0);
    float* const end = buf1 + buffer.getNumSamples();

    while (buf1 != end)
    {
        const float mid  = (*buf1 + *buf2) * 0.5f;
        const float side = (*buf1 - *buf2) * 0.5f;

        *buf1++ = mid;
        *buf2++ = side;
    }
}


// Inverse of the above: channel1 (mid) becomes left, channel2 (side) becomes right.
void AudioSampleBufferTools::convertFromMSToStereo (AudioSampleBuffer& buffer, int channel1, int channel2)
{
    jassert (channel1 != channel2);

    float* buf1 = buffer.getWritePointer (channel1, 0);
    float* buf2 = buffer.getWritePointer (channel2, 0);
    float* const end = buf1 + buffer.getNumSamples();

    while (buf1 != end)
    {
        const float left  = *buf1 + *buf2;
        const float right = *buf1 - *buf2;

        *buf1++ = left;
        *buf2++ = right;
    }
}
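Why the two functions round-trip: the 0.5 factor is applied only in the stereo-to-M/S direction, so the inverse needs no scaling (the results match up to float rounding):

```latex
\begin{aligned}
M &= \tfrac{1}{2}(L + R), & S &= \tfrac{1}{2}(L - R),\\
M + S &= \tfrac{1}{2}(L + R) + \tfrac{1}{2}(L - R) = L, & M - S &= \tfrac{1}{2}(L + R) - \tfrac{1}{2}(L - R) = R.
\end{aligned}
```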

#11

I would prefer to pre-calculate window vectors and apply them with simple vector multiplications, and re-initialize them whenever the target buffer size changes (e.g. the FFT vector).
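That approach can be sketched in plain C++: the window is computed once into a table, applied with a per-sample multiply, and rebuilt only when the buffer size changes. The class name `CachedBlackmanWindow` is hypothetical, and the Blackman coefficients 0.42/0.5/0.08 with a 2π/N phase step (the classic periodic form, the same phase step as post #9) are an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a Blackman window table that is computed once per buffer size
// and then applied with a plain element-wise multiply.
class CachedBlackmanWindow
{
public:
    // Multiplies numSamples samples of `data` by the window, rebuilding the table
    // only when the size differs from the current table.
    void apply (float* data, int numSamples)
    {
        if (static_cast<std::size_t> (numSamples) != table.size())
            rebuild (numSamples);

        for (int i = 0; i < numSamples; ++i)
            data[i] *= table[static_cast<std::size_t> (i)];
    }

    std::size_t size() const noexcept { return table.size(); }

private:
    void rebuild (int numSamples)
    {
        const double twoPi = 6.283185307179586476925286766559;
        table.resize (static_cast<std::size_t> (numSamples));

        // periodic form: phase step of 2*pi/N
        for (int i = 0; i < numSamples; ++i)
        {
            const double phase = twoPi * i / numSamples;
            table[static_cast<std::size_t> (i)] =
                static_cast<float> (0.42 - 0.5 * std::cos (phase) + 0.08 * std::cos (2.0 * phase));
        }
    }

    std::vector<float> table;
};
```

For a multi-channel buffer, the same table would be applied to each channel in turn, so the cosines are evaluated once per size rather than once per channel per block.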


#12

I think this is not an either/or question.

Well, these are just basic helper functions that can be applied to a single buffer, as an addition to AudioSampleBuffer.

You could use such methods to get that behavior (fill a buffer with 1, apply the window, then reuse it and apply a vector multiplication). I think the functionality you mention needs to be implemented in a derived class or in another class.


#13

I think it's possible to use templated code without breaking existing code.

template <typename T> class AudioSampleBufferT { ... };

class AudioSampleBuffer : public AudioSampleBufferT<float> { ... };

A simple typedef alone would however not do the trick.

For classes consuming AudioSampleBuffers, you can just add templated overloads that can coexist with the "default" float implementation.
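A self-contained sketch of that pattern: the template holds the storage logic, the existing name stays a real class deriving from the float instantiation, and a float-only function gains a templated overload for other sample types. Everything here (the members of `AudioSampleBufferT`, `getPeak`) is illustrative, not JUCE's actual API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// The storage logic lives in a template...
template <typename T>
class AudioSampleBufferT
{
public:
    AudioSampleBufferT (int numChannels, int numSamples)
        : channels (numChannels), samples (numSamples),
          data (static_cast<std::size_t> (numChannels) * static_cast<std::size_t> (numSamples), T())
    {}

    int getNumChannels() const noexcept { return channels; }
    int getNumSamples() const noexcept  { return samples; }

    T* getWritePointer (int ch) noexcept            { return data.data() + offset (ch); }
    const T* getReadPointer (int ch) const noexcept { return data.data() + offset (ch); }

private:
    std::size_t offset (int ch) const noexcept { return static_cast<std::size_t> (ch) * static_cast<std::size_t> (samples); }

    int channels, samples;
    std::vector<T> data;
};

// ...while the old name stays a real class deriving from the float version. Unlike a
// typedef, it can still be forward-declared as `class AudioSampleBuffer;`.
class AudioSampleBuffer : public AudioSampleBufferT<float>
{
public:
    using AudioSampleBufferT<float>::AudioSampleBufferT;   // inherit the constructors
};

// Existing float-only function: unchanged, and preferred for AudioSampleBuffer arguments.
float getPeak (const AudioSampleBuffer& buffer)
{
    float peak = 0.0f;
    for (int ch = 0; ch < buffer.getNumChannels(); ++ch)
        for (int i = 0; i < buffer.getNumSamples(); ++i)
            peak = std::fmax (peak, std::fabs (buffer.getReadPointer (ch)[i]));
    return peak;
}

// Templated overload that coexists with it and handles any other sample type.
template <typename T>
T getPeak (const AudioSampleBufferT<T>& buffer)
{
    T peak = T();
    for (int ch = 0; ch < buffer.getNumChannels(); ++ch)
        for (int i = 0; i < buffer.getNumSamples(); ++i)
            peak = std::fmax (peak, std::fabs (buffer.getReadPointer (ch)[i]));
    return peak;
}
```

Overload resolution picks the non-template `getPeak` for an `AudioSampleBuffer` (exact match beats the derived-to-base conversion the template would need), and the template for `AudioSampleBufferT<double>`.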

 


#14

+1 for templatized AudioSampleBuffer<> :-)