How to do FloatVectorOperations on all channels in a buffer?

Probably something all you smart people know.

I can do this;

FloatVectorOperations::clip (tgBuffer.getWritePointer (0), tgBuffer.getWritePointer (0),
                             -1.0f, 1.0f, numSamples);

However, my TGBuffer has 16 channels (unison voices), so I have to do this;

for (int u = 0; u < unisonVoices; u++)
{
   FloatVectorOperations::clip (tgBuffer.getWritePointer (u), tgBuffer.getWritePointer (u),
                                -1.0f, 1.0f, numSamples);
}

It works, but is there a way to do it in one swift stroke, so that it handles all the channels? I did look into getArrayOfWritePointers, but my knowledge of C++ probably prevents me from figuring that out.

You can simply do this to apply a gain to the whole buffer:

tgBuffer.applyGain (multiplier);

applyGain will call FloatVectorOperations::multiply .
(btw, note that it’s good practice to have variable names beginning with a lower-case letter)

Ok that was a very bad example I used. I know about using gain and lower case letters, out of every single one of my variables, that was the only one where I did not use lower case letters.

Anyway, I actually need FloatVectorOperations for something other than multiplying, like;

FloatVectorOperations::clip (tgBuffer.getWritePointer (0), tgBuffer.getWritePointer (0),
                             -1.0f, 1.0f, numSamples);

You can write a utility function to do something with each channel like this:

    template <typename Element, typename Callback>
    void forEachChannel (juce::AudioBuffer<Element>& buffer, Callback&& callback)
    {
        std::for_each (buffer.getArrayOfWritePointers(),
                       buffer.getArrayOfWritePointers() + buffer.getNumChannels(),
                       std::forward<Callback> (callback));
    }

Then, you can implement clipping very easily:

    template <typename Element>
    void clip (juce::AudioBuffer<Element>& buffer, Element lo, Element hi)
    {
        forEachChannel (buffer, [lo, hi, num = buffer.getNumSamples()] (Element* channel)
        {
            juce::FloatVectorOperations::clip (channel, channel, lo, hi, num);
        });
    }

Thanks for chiming in, however I don’t see how that is any different (advantage) from my for loop in original post, as in;

for (int u = 0; u < unisonVoices; u++)
{
   FloatVectorOperations::clip (tgBuffer.getWritePointer (u), tgBuffer.getWritePointer (u),
                                -1.0f, 1.0f, numSamples);
}

I think the second template Callback should be removed from the clip function :wink:

Well, this is a bit more generic and would allow you to pass in different lambdas for all kinds of stuff you want to do on all channels. If this is something you do a few times spread across your code, it could tidy things up.

The question is: what is your goal in doing it differently from the loop? Better coding style / better code readability, or better performance?
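To give a rough idea of that reuse, here is a minimal sketch that does not depend on JUCE. The channels are a plain array of pointers (a stand-in for what getArrayOfWritePointers() gives you), std::min/std::max stand in for FloatVectorOperations::clip, and the names forEachChannel, clipAll and gainAll are made up for this example.

```cpp
#include <algorithm>

// Hypothetical helper: calls callback (channelPointer) once per channel.
template <typename Callback>
void forEachChannel (float* const* channels, int numChannels, Callback&& callback)
{
    for (int c = 0; c < numChannels; ++c)
        callback (channels[c]);
}

// Clips every sample of every channel to [lo, hi].
inline void clipAll (float* const* channels, int numChannels, int numSamples, float lo, float hi)
{
    forEachChannel (channels, numChannels, [=] (float* channel)
    {
        for (int i = 0; i < numSamples; ++i)
            channel[i] = std::max (lo, std::min (hi, channel[i]));
    });
}

// Multiplies every sample of every channel by gain, reusing the same helper.
inline void gainAll (float* const* channels, int numChannels, int numSamples, float gain)
{
    forEachChannel (channels, numChannels, [=] (float* channel)
    {
        for (int i = 0; i < numSamples; ++i)
            channel[i] *= gain;
    });
}
```

The per-channel loop is still there, it just lives in one place and each operation only supplies the lambda.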

Performance wise, a good compiler should probably generate the same code for all solutions (though I’m not sure if the lambda introduced here could even degrade performance…?) When it comes to coding style, this could enhance re-usability but might also degrade readability compared to the simple loop. As I cannot come up with some super neat and super readable one liner template based solution I’d go for

for (int u = 0; u < unisonVoices; ++u)
   FloatVectorOperations::clip (tgBuffer.getWritePointer (u), tgBuffer.getWritePointer (u),
                                -1.0f, 1.0f, numSamples);

as it is okay to remove the curly brackets for one-liner loop statements.

I know about the curly brackets, they were just added here for readability.

I don’t mind bloating the code a bit. Performance is my primary concern.

As the multiple channels of a buffer are not guaranteed to be one piece of contiguous memory, a loop like that has to be performed in any case, so if your primary concern is performance, solutions that hide away the loop won’t change anything here. I think you have already found the most performant solution this way.

Just to add it here, I had some fun trying to build something generic & fully templated that applies an operation to all channels in a one-liner, without needing a lambda:

template <typename ...Args>
struct ForAllChans
{
    template <void(*op)(float*, const float*, Args..., int), typename SampleType>
    static void call (AudioBuffer<SampleType>& buffer, Args ...args)
    {
        auto numChans = buffer.getNumChannels();
        auto numSamps = buffer.getNumSamples();
        auto** bufPtr = buffer.getArrayOfWritePointers();

        for (int c = 0; c < numChans; ++c)
            op (bufPtr[c], const_cast<const SampleType*>(bufPtr[c]), args..., numSamps);
    }
};

Usage:

ForAllChans<float, float>::call<FloatVectorOperations::clip> (someBuffer, -1.0f, 1.0f);

The variadic Args... template parameter denotes the argument types between the pointers and the number of samples that the specific float vector op needs. This makes the line above a bit ugly to read :grimacing: Does any C++ expert here know a way this could be deduced from the static function call?


Edit: I :hearts: Stackoverflow (at least sometimes)
Some clever person over there helped me to get it done better. Solution now:

template <typename T>
struct TypeIdentityHolder { using type = T; };

template <typename T>
using TypeID = typename TypeIdentityHolder<T>::type;

template <typename SampleType, typename... Args>
void forAllChans (void (*op)(SampleType*, const SampleType*, TypeID<Args>..., int), AudioBuffer<SampleType>& buffer, Args... args)
{
    auto numChans = buffer.getNumChannels();
    auto numSamps = buffer.getNumSamples();
    auto** bufPtr = buffer.getArrayOfWritePointers();

    for (int c = 0; c < numChans; ++c)
        op (bufPtr[c], const_cast<const SampleType*>(bufPtr[c]), args..., numSamps);
}

Usage:

forAllChans (FloatVectorOperations::clip, someBuffer, -1.0f, 1.0f);

Thank you very much for your input. But now that I have your attention and you’re mentioning lambdas, of which I have almost nil experience, is there a way to assign different code to the same lambda (name)? Let me explain what I am thinking.

So I have my voice buffer loop gathering samples from a wavetable, where I then do interpolation (5 options), then duty cycle processing (similar to pulse width adjustment, but not just for square waves, and I have 20 interesting ways to do it), and finally advance the angle (soft (wrap) or hard sync). For each of those three I use switches, but that is still 5 x 20 x 2 = 200 options, which have to be dealt with 1024 (buffer size) times, for up to 8 tone generators x up to 16 unison voices.

Now what I am thinking is these options should optimally not have to be considered for each of the 1024 samples per buffer, but instead only once right before entering the loop, as there it is known what options the user has set.

Sure right before that buffer loop, I could use those three settings to make only one switch variable, and bloat the code with 200 variations - yikes!

But could I instead set an interpolate lambda to one of the 5 options, another lambda to one of the 20 duty cycle options, and another to one of the two advance angle options, so that inside the buffer loop all I do is invoke the three lambdas?

I tried fiddling with function pointers and function arrays, but it all came out slower, perhaps because the three processing steps mentioned above (interpolate, duty cycle, and advance angle) are all inline and use parameter pointers.

Regarding your follow-up question:

Yes, creating three std::function objects for the three processing steps and assigning lambdas according to the current user choice could be a solution that would at least tidy up your code significantly compared to a big switch-case block. However, a std::function encapsulates various types of callable objects, including function pointers. Now if you say that function pointers and function arrays already came out slower, then I wouldn’t expect any speed-up from using the more heavyweight std::function objects instead of lightweight function pointers. I would even advise against them, because assigning a lambda with a capture list to a std::function may allocate, which is not really suited to the audio thread. On the other hand, I’m surprised to hear that you encountered a massive slowdown with that approach. Sure, the compiler won’t be able to heavily inline all parts of your processing anymore, but I wouldn’t expect the impact to be extremely high. So, would you mind describing the structure of your function pointer approach in a bit more detail? Maybe it can be optimised :grinning:
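To make the distinction concrete, here is a minimal sketch of choosing a step once per block through a plain function pointer, next to the std::function case. Everything in it (softStep, hardStep, chooseStep, processBlock, lambdaAsPointer, makeScaler, and the maths inside them) is made up for illustration and is not taken from your code.

```cpp
#include <functional>

// Hypothetical per-sample steps; the names and maths are placeholders.
inline float softStep (float x) { return x * 0.5f; }
inline float hardStep (float x) { return x < 0.0f ? -1.0f : 1.0f; }

using StepFn = float (*) (float);

// Pick the step once per block, outside the sample loop.
inline StepFn chooseStep (int mode)
{
    return mode == 0 ? softStep : hardStep;
}

inline void processBlock (float* data, int numSamples, StepFn step)
{
    for (int i = 0; i < numSamples; ++i)
        data[i] = step (data[i]);   // indirect call, usually not inlined
}

// A lambda without captures converts to a plain function pointer...
inline StepFn lambdaAsPointer()
{
    return [] (float x) { return x * x; };
}

// ...whereas a capturing lambda needs a wrapper such as std::function, which
// may heap-allocate if the captured state does not fit its internal storage.
inline std::function<float (float)> makeScaler (float gain)
{
    return [gain] (float x) { return x * gain; };
}
```

In both cases the call inside the loop goes through an indirection, which is the likely reason the compiler cannot inline the step the way it can with a switch or a template.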

A template would probably be more efficient

template <typename Mode>
struct Processor
{
    // static, so it can be called as Processor<Mode>::process (ptr, numSamples)
    static void process (float* pInOut, int numSamples)
    {
        Mode mode;

        for (int i = 0; i < numSamples; i++)
            mode (pInOut, i);
    }
};

where Mode is a struct with an () operator
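For example, a Mode could look like the structs in this minimal sketch. HalveMode and NegateMode are hypothetical names with placeholder maths, and the Processor template is repeated only so that the snippet compiles on its own.

```cpp
// Hypothetical modes; any struct with a matching operator() works.
struct HalveMode
{
    void operator() (float* pInOut, int i) const { pInOut[i] *= 0.5f; }
};

struct NegateMode
{
    void operator() (float* pInOut, int i) const { pInOut[i] = -pInOut[i]; }
};

template <typename Mode>
struct Processor
{
    static void process (float* pInOut, int numSamples)
    {
        Mode mode;

        for (int i = 0; i < numSamples; ++i)
            mode (pInOut, i);   // resolved at compile time, so it can be inlined
    }
};
```

Usage would then be something like Processor<HalveMode>::process (ptr, numSamples), with the mode fixed at compile time.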

That’s a great way to achieve better code structure for the different modes. However, as the mode of choice needs to be determined at runtime, the overall processing function would still need something like

renderNextBlock (AudioBuffer<float>& outputBuffer, int startSample, int numSamples)
{
    // Assuming these parameters are fixed for each sample block, query them once at the start 
    // of the rendering process for this block. This could be e.g. int plugin parameters referring
    // to some enum constants...
    const auto processingStep1Mode = getProcessingStep1Mode();
    const auto processingStep2Mode = getProcessingStep2Mode();
    const auto processingStep3Mode = getProcessingStep3Mode();

    auto* ptr = // compute the pointer to the memory you have to write to

    switch (processingStep1Mode)
    {
        case Step1Modes::mode0: Processor<Step1Mode0Implementation>::process (ptr, numSamples); break;
        case Step1Modes::mode1: Processor<Step1Mode1Implementation>::process (ptr, numSamples); break;
        case Step1Modes::mode2: Processor<Step1Mode2Implementation>::process (ptr, numSamples); break;

        // ... and so on
    }

    switch (processingStep2Mode)
    {
        case Step2Modes::mode0: Processor<Step2Mode0Implementation>::process (ptr, numSamples); break;
        case Step2Modes::mode1: Processor<Step2Mode1Implementation>::process (ptr, numSamples); break;
        case Step2Modes::mode2: Processor<Step2Mode2Implementation>::process (ptr, numSamples); break;

        // ... and so on
    }

    // ... the same for the third step
}

You can probably get away with less code by using a templated factory for each step, plus a cascaded factory that calls those in a nested way to create the processor.
You would need a base class with a virtual function for this processor, which your template class inherits from and which the factory returns. But only the process call over all the samples would be virtual.

template<class Step1, class Step2, class Step3>
void Dispatch(float *pInOut, int count)
 {
 	Step1 step1;
 	Step2 step2;
 	Step3 step3;
 	for (int i = 0; i < count; i++)
 	{
 		pInOut[i] = step1(step2(step3(pInOut[i])));
 	}
 }


template<class Step1, class Step2>
void Dispatch3(float *pInOut, int count, int step3)
    {
      switch(step3)
      {          
        case 1:
          Dispatch<Step1, Step2, Step3_1>(pInOut, count);
          break;
        case 2:
          Dispatch<Step1, Step2, Step3_2>(pInOut, count);
          break;
      }
    }

template<class Step1>
void Dispatch2(float *pInOut, int count, int step2, int step3)
    {
      switch(step2)
      {          
        case 1:
          Dispatch3<Step1, Step2_1>(pInOut, count, step3);
          break;
        case 2:
          Dispatch3<Step1, Step2_2>(pInOut, count, step3);
          break;
      }
    }

void Dispatch1(float *pInOut, int count, int step1, int step2, int step3)
    {
      switch(step1)
      {          
        case 1:
          Dispatch2<Step1_1>(pInOut, count, step2, step3);
          break;
        case 2:
          Dispatch2<Step1_2>(pInOut, count, step2, step3);
          break;
      }
    }

Probably don’t need a base class actually.


Pretty neat! If I get you right, the render callback will then look like

renderNextBlock (AudioBuffer<float>& outputBuffer, int startSample, int numSamples)
{
    // Assuming these parameters are fixed for each sample block, query them once at the start 
    // of the rendering process for this block. This could be e.g. int plugin parameters referring
    // to some enum constants...
    const auto processingStep1Mode = getProcessingStep1Mode();
    const auto processingStep2Mode = getProcessingStep2Mode();
    const auto processingStep3Mode = getProcessingStep3Mode();

    auto* ptr = // compute the pointer to the memory you have to write to
    Dispatch1 (ptr, numSamples, processingStep1Mode, processingStep2Mode, processingStep3Mode);
}

I was part of a discussion on this page; https://stackoverflow.com/questions/60692915/c-incomplete-type-is-not-allowed-trying-to-create-array-of-functions-inside

I tried two of the solutions there, but have since discarded the exact implementations. Both came out 14% slower.

I’ll look into otristan’s and your suggestions later today.

Yes. And for example

struct Step2_1
{
    float operator() (float x) const { return x * x; }
};
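Putting the pieces together, here is a cut-down, self-contained sketch of the nested dispatch with only two steps and two options each. The step structs (Square, Halve, Negate, AddOne) and the function names are hypothetical, and the default cases that leave the data untouched are an assumption of this sketch.

```cpp
// Hypothetical step implementations, each a struct with operator().
struct Square { float operator() (float x) const { return x * x; } };
struct Halve  { float operator() (float x) const { return x * 0.5f; } };
struct Negate { float operator() (float x) const { return -x; } };
struct AddOne { float operator() (float x) const { return x + 1.0f; } };

// Innermost level: both steps are known at compile time here.
template <class Step1, class Step2>
void dispatch (float* pInOut, int count)
{
    Step1 step1;
    Step2 step2;

    for (int i = 0; i < count; ++i)
        pInOut[i] = step1 (step2 (pInOut[i]));
}

// Turns the runtime value of step2 into a template argument.
template <class Step1>
void dispatchStep2 (float* pInOut, int count, int step2)
{
    switch (step2)
    {
        case 1:  dispatch<Step1, Negate> (pInOut, count); break;
        case 2:  dispatch<Step1, AddOne> (pInOut, count); break;
        default: break;   // unknown mode: leave the data untouched
    }
}

// Outermost level: the switches run once per block, not once per sample.
inline void dispatchStep1 (float* pInOut, int count, int step1, int step2)
{
    switch (step1)
    {
        case 1:  dispatchStep2<Square> (pInOut, count, step2); break;
        case 2:  dispatchStep2<Halve> (pInOut, count, step2); break;
        default: break;   // unknown mode: leave the data untouched
    }
}
```

The compiler instantiates one loop per combination (four here), so the source stays small even though the generated code covers every variation.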