How to design a performant sample buffer class


#1

I know that JUCE has the AudioBuffer class, however I need a class with slightly more options for my application. Core requirements are

  • It needs to be able to deal with complex valued samples as well as real valued samples
  • For some special embedded FPGA-based targets, some shared memory between CPU and FPGA should be allocated through the buffer class instead of normal memory allocation, furthermore memory access through raw pointers needs some additional calls in this case

So in the end I would end up with at least four different class variations (real valued, standard memory; complex valued, standard memory; real valued, shared memory; complex valued, shared memory).

My first approach would be to create an abstract interface class with a subclass for each of the four variations. This way all DSP functions I write would only need to access the buffer through the base class member functions. However, I’m not sure if this approach would prevent compiler optimizations such as inlining if all functions, such as the typical getReadPointer, were virtual functions of the base class (as the actual class is only resolved at runtime, isn’t it?).

Any experts out here that could share some insight on how the performance impact of such a design could be or maybe if I shouldn’t bother at all because the impact is negligible? Or any good idea for a better design?


#2

I feel like templates are your solution here since you’re going to target embedded and ideally all the function call nuance should be resolved at compile time to allow for better inlining/optimization. You can use STL containers for the underlying storage, like std::vector and just provide the audio buffer methods in the template.

To deal with memory allocation, you can provide your own allocator as a template parameter, e.g.

#include <memory>
#include <vector>

// default to the standard allocator; note that std::allocator needs the element type
template <class DataType, class Allocator = std::allocator<DataType>>
struct AudioBuffer
{
    // provide your constructors, getters/setters, clearing function, etc

    std::vector<DataType, Allocator> data; // the underlying data can be stored in a vector
};
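
A custom allocator for the shared-memory case might look roughly like the following. This is only a sketch under assumptions: sharedMemAlloc and sharedMemFree are made-up placeholders for whatever the FPGA platform actually provides (here they simply forward to malloc/free so the example compiles anywhere), and SharedMemoryAllocator, SampleBuffer and SharedComplexBuffer are illustrative names, not part of JUCE.

```cpp
#include <complex>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stand-ins for the platform's shared-memory API.
// They forward to malloc/free so that the sketch compiles on any system.
inline void* sharedMemAlloc (std::size_t numBytes) { return std::malloc (numBytes); }
inline void  sharedMemFree  (void* ptr)            { std::free (ptr); }

// Minimal allocator: value_type, allocate, deallocate and equality are enough,
// std::allocator_traits fills in the rest.
template <class T>
struct SharedMemoryAllocator
{
    using value_type = T;

    SharedMemoryAllocator() = default;

    // Lets containers rebind the allocator to another element type
    template <class U>
    SharedMemoryAllocator (const SharedMemoryAllocator<U>&) noexcept {}

    T* allocate (std::size_t n)
    {
        if (auto* p = static_cast<T*> (sharedMemAlloc (n * sizeof (T))))
            return p;

        throw std::bad_alloc();
    }

    void deallocate (T* p, std::size_t) noexcept { sharedMemFree (p); }
};

// The allocator is stateless, so all instances compare equal
template <class T, class U>
bool operator== (const SharedMemoryAllocator<T>&, const SharedMemoryAllocator<U>&) noexcept { return true; }

template <class T, class U>
bool operator!= (const SharedMemoryAllocator<T>&, const SharedMemoryAllocator<U>&) noexcept { return false; }

// Same shape as the buffer template above, with a sizing constructor added
template <class DataType, class Allocator = std::allocator<DataType>>
struct SampleBuffer
{
    explicit SampleBuffer (std::size_t numSamples) : data (numSamples) {}

    std::vector<DataType, Allocator> data;
};

// Complex valued samples living in (pretend) shared memory
using SharedComplexBuffer = SampleBuffer<std::complex<float>, SharedMemoryAllocator<std::complex<float>>>;
```

The sample type and the memory source are then two independent template parameters, so the four variations fall out of one class template. The extra calls needed around raw pointer access on the FPGA targets are not covered by this sketch.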

#3

Yeah, I thought about templates too. Especially the usage of a custom allocator in this case is interesting. However, I think this could conflict with another design goal I had in mind.

Short explanation: In fact, I want to process RF data, not audio, which will not make a big difference in this context as DSP code used for that purpose has a lot of similarities to audio DSP code. However, a big difference is that some hardware frontends will produce complex valued samples (through an IQ mixer) while others will produce only real valued samples. All kinds of frontends should be abstracted by some RFHardwareFrontendEngine class. Furthermore, some of the DSP algorithms could be implemented for both complex valued and real valued samples. In the end, the code I dream of would look something like this:

class RFSampleBuffer;

class RFHardwareFrontendEngine
{
public:
    class ProcessingCallback
    {
    public:
        virtual ~ProcessingCallback() = default;

        virtual void processSampleBlock (RFSampleBuffer& sampleBuffer) = 0;

        // and all the stuff like prepareToPlay etc. as you know it with audio interfaces
    };

    // member functions to select and configure a hardware frontend

    void addProcessingCallback (ProcessingCallback* callbackToAdd);
};


class MyProcessingCallback : public RFHardwareFrontendEngine::ProcessingCallback
{
public:
    void processSampleBlock (RFSampleBuffer& sampleBuffer) override
    {
        // would be great if my::FFT could somehow use something like template specialization
        // internally to deal differently with real valued samples and complex valued samples
        myFFT.processSamples (sampleBuffer);
        
        // do some more fancy dsp stuff...
    }

private:
    // for an example some FFT class suited to my needs
    my::FFT myFFT;
};

I’m aware that the code above won’t work that way if my sample buffer class is templated. My DSP classes could take the sample buffer type as a template argument and, if needed, be optimized internally for each case through specialization while still presenting a clean interface to the outside. But I can’t see how to achieve the desired interface for the processing callback, since a virtual function cannot be a template. The only thing that comes to mind is supplying four different callback functions, one for each combination of template parameters, which would run contrary to the goal of a clean interface.
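
To illustrate the part that does work with templates, here is a minimal sketch of a DSP function that is written once and picks a per-sample-type implementation through overloading. The names powerOf and meanPower are made up for this example, and std::vector merely stands in for the sample buffer class.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Overloads select the cheapest way to compute |x|^2 for each sample type
inline float powerOf (float x)                      { return x * x; }
inline float powerOf (const std::complex<float>& x) { return std::norm (x); }

// Written once, instantiated for real valued and complex valued samples alike.
// Everything is resolved at compile time, so the compiler is free to inline powerOf.
template <class SampleType>
float meanPower (const std::vector<SampleType>& samples)
{
    if (samples.empty())
        return 0.0f;

    float sum = 0.0f;

    for (const auto& sample : samples)
        sum += powerOf (sample);

    return sum / static_cast<float> (samples.size());
}
```

This covers the DSP classes themselves. It does not yet solve the callback problem described above.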


#4

Can’t you do all of this with juce::dsp::AudioBlock? Its whole purpose is to be a wrapper for data that is allocated by something else, and it allows the sample type to be anything you need, including complex types…


#5

Not sure how dsp::AudioBlock would solve my problem of being able to push real valued or complex valued sample blocks to the same callback and let the functions working with the blocks decide how to handle it. As far as my C++ knowledge goes I cannot implement something like that:

class ProcessingCallback
    {
        virtual void processSampleBlock (juce::dsp::AudioBlock& sampleBuffer) = 0;
    }

as there needs to be a template argument specified with the AudioBlock. The only thing I can think of would be to implement something like

class ProcessingCallback
{
public:
    // will be called if the engine outputs real valued samples
    virtual void processRealValuedSampleBlock (juce::dsp::AudioBlock<float>& sampleBuffer) = 0;

    // will be called if the engine outputs complex valued samples
    virtual void processComplexValuedSampleBlock (juce::dsp::AudioBlock<std::complex<float>>& sampleBuffer) = 0;
};

However, that would force me to write all my DSP code twice, although both versions would most likely look exactly the same. That feels contrary to the DRY principle. I might be overlooking something obvious here that could solve this in an elegant way, though?
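
One possible way to keep two virtual entry points without duplicating the DSP code is to let both overrides forward to a single private function template. The following is only a sketch under assumptions: BlockView is a made-up, minimal stand-in for juce::dsp::AudioBlock so that the example is self-contained, and GainCallback is a hypothetical processor.

```cpp
#include <complex>
#include <cstddef>

// Hypothetical non-owning view, standing in for juce::dsp::AudioBlock<SampleType>
template <class SampleType>
struct BlockView
{
    SampleType* data;
    std::size_t numSamples;
};

class ProcessingCallback
{
public:
    virtual ~ProcessingCallback() = default;

    // will be called if the engine outputs real valued samples
    virtual void processRealValuedSampleBlock (BlockView<float>& block) = 0;

    // will be called if the engine outputs complex valued samples
    virtual void processComplexValuedSampleBlock (BlockView<std::complex<float>>& block) = 0;
};

class GainCallback : public ProcessingCallback
{
public:
    explicit GainCallback (float gainToApply) : gain (gainToApply) {}

    // Both virtual entry points forward to the same template
    void processRealValuedSampleBlock (BlockView<float>& block) override                  { process (block); }
    void processComplexValuedSampleBlock (BlockView<std::complex<float>>& block) override { process (block); }

private:
    // The actual DSP code exists only once and is instantiated per sample type
    template <class SampleType>
    void process (BlockView<SampleType>& block)
    {
        for (std::size_t i = 0; i < block.numSamples; ++i)
            block.data[i] *= gain;
    }

    float gain;
};
```

With this layout the virtual dispatch happens once per block, while the per-sample loop inside process is ordinary template code that the compiler can inline and vectorize.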


#6

Furthermore, I’d like my buffer class to have some extended capabilities regarding complex math. It would be cool if the interface looked something like this (a stripped down example to point out the key aspects; the vector operations are not optimized for speed, there is no size check for the target buffers, etc.):

#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

class RealValuedRFSampleBuffer;
class ComplexValuedRFSampleBuffer;

class RFSampleBuffer
{
public:
    virtual ~RFSampleBuffer() = default;
    // ...
    virtual void getRealPart      (RealValuedRFSampleBuffer& bufferToFillWithRealPart) = 0;
    virtual void getImaginaryPart (RealValuedRFSampleBuffer& bufferToFillWithImaginaryPart) = 0;
    virtual void getAbs           (RealValuedRFSampleBuffer& bufferToFillWithAbsoluteValues) = 0;
    // ...
};

class RealValuedRFSampleBuffer : public RFSampleBuffer
{
public:
    // ...
    const float* getReadPointer() const { return buffer.data(); }

    void getRealPart (RealValuedRFSampleBuffer& bufferToFillWithRealPart) override
    {
        bufferToFillWithRealPart.buffer = buffer;
    }
    void getImaginaryPart (RealValuedRFSampleBuffer& bufferToFillWithImaginaryPart) override
    {
        auto& bufToFill = bufferToFillWithImaginaryPart.buffer;
        std::fill (bufToFill.begin(), bufToFill.end(), 0.0f);
    }
    void getAbs (RealValuedRFSampleBuffer& bufferToFillWithAbsoluteValues) override
    {
        for (std::size_t i = 0; i < buffer.size(); ++i)
            bufferToFillWithAbsoluteValues.buffer[i] = std::abs (buffer[i]);
    }
private:
    // the complex valued buffer writes directly into this class's storage
    friend class ComplexValuedRFSampleBuffer;

    std::vector<float> buffer;
};

class ComplexValuedRFSampleBuffer : public RFSampleBuffer
{
public:
    // ...
    const std::complex<float>* getReadPointer() const { return buffer.data(); }

    void getRealPart (RealValuedRFSampleBuffer& bufferToFillWithRealPart) override
    {
        for (std::size_t i = 0; i < buffer.size(); ++i)
            bufferToFillWithRealPart.buffer[i] = buffer[i].real();
    }
    void getImaginaryPart (RealValuedRFSampleBuffer& bufferToFillWithImaginaryPart) override
    {
        for (std::size_t i = 0; i < buffer.size(); ++i)
            bufferToFillWithImaginaryPart.buffer[i] = buffer[i].imag();
    }
    void getAbs (RealValuedRFSampleBuffer& bufferToFillWithAbsoluteValues) override
    {
        for (std::size_t i = 0; i < buffer.size(); ++i)
            bufferToFillWithAbsoluteValues.buffer[i] = std::abs (buffer[i]);
    }
private:
    std::vector<std::complex<float>> buffer;
};

This way, all functions interacting with this buffer could expect mathematically correct results under all circumstances (a bit like how a MATLAB vector behaves, but with the best possible optimization for real-time critical applications). Classes that are especially performance critical and should not copy the data before working on it could still be optimized internally to treat complex and real valued buffers differently, operating directly on the pointers supplied by the subclasses.