Attempting to regulate blocksize for an algorithm

Hi friends,

I’ve figured out how to guard against hosts sending block sizes that are too big – just break the I/O buffer up into smaller chunks & call your wrapped processing function for each chunk in sequence.

But the next thing I want to learn to program defensively against is hosts sending block sizes that are too small.

Here’s what I’ve got so far. This function is nested inside an outer one that checks for buffers that are too big, so I know that buffers fed into this function will never exceed my declared internalBlocksize. My goal with this function is to ensure that my renderBlock() method is only ever called with block sizes exactly equal to my internalBlocksize.

Does this seem like a good approach? Or am I doing something wrong here?

Where numStoredInputSamples & numStoredOutputSamples are integer members initialized to zero; inputCollectionBuffer, outputCollectionBuffer, inputTransferBuffer & outputInterimBuffer are AudioBuffer members with a size of internalBlocksize * 2 samples:

template<typename SampleType>
void ImogenEngine<SampleType>::processWrapped (AudioBuffer<SampleType>& input, AudioBuffer<SampleType>& output)
{
    const int numNewSamples = input.getNumSamples();
    
    jassert (numNewSamples <= internalBlocksize);
    
    const int totalNumInputSamplesAvailable = numStoredInputSamples + numNewSamples;
    
    if (totalNumInputSamplesAvailable < internalBlocksize)
    {
        inputCollectionBuffer.copyFrom (0, numStoredInputSamples, input, 0, 0, numNewSamples);
        numStoredInputSamples = totalNumInputSamplesAvailable;
        
        if (numStoredOutputSamples < numNewSamples)
        {
            output.clear();
            return;
        }
        
        for (int chan = 0; chan < 2; ++chan)
            output.copyFrom (chan, 0, outputCollectionBuffer, chan, 0, numNewSamples);
        
        usedOutputSamples (numNewSamples);
        
        return;
    }
    
    AudioBuffer<SampleType> thisChunksOutput (outputCollectionBuffer.getArrayOfWritePointers(), 2, numStoredOutputSamples, internalBlocksize);
    
    if (numStoredInputSamples == 0)
    {
        renderBlock (input, thisChunksOutput);
        
        for (int chan = 0; chan < 2; ++chan)
            output.copyFrom (chan, 0, outputCollectionBuffer, chan, 0, numNewSamples);
        
        usedOutputSamples (numNewSamples);
        
        return;
    }
    
    inputCollectionBuffer.copyFrom (0, numStoredInputSamples, input, 0, 0, numNewSamples);
    
    AudioBuffer<SampleType> inputCollectionBufferProxy (inputCollectionBuffer.getArrayOfWritePointers(), 1, 0, internalBlocksize);
    
    renderBlock (inputCollectionBufferProxy, thisChunksOutput);
    
    for (int chan = 0; chan < 2; ++chan)
        output.copyFrom (chan, 0, outputCollectionBuffer, chan, 0, numNewSamples);
    
    usedOutputSamples (numNewSamples);
    
    numStoredInputSamples = totalNumInputSamplesAvailable - internalBlocksize; 
    
    if (numStoredInputSamples == 0)
        return;

    inputTransferBuffer  .copyFrom (0, 0, inputCollectionBuffer, 0, internalBlocksize, numStoredInputSamples);
    inputCollectionBuffer.copyFrom (0, 0, inputTransferBuffer, 0, 0, numStoredInputSamples);
}

and here is the usedOutputSamples() function:

template<typename SampleType>
void ImogenEngine<SampleType>::usedOutputSamples (const int numSamples)
{
    numStoredOutputSamples -= numSamples;
    
    if (numStoredOutputSamples > 0)
    {
        for (int chan = 0; chan < 2; ++chan)
        {
            outputInterimBuffer   .copyFrom (chan, 0, outputCollectionBuffer, chan, numSamples, numStoredOutputSamples);
            outputCollectionBuffer.copyFrom (chan, 0, outputInterimBuffer, chan, 0, numStoredOutputSamples);
        }
    }
    else
        numStoredOutputSamples = 0;
}

and I should mention that the last line of my renderBlock() function is this:

numStoredOutputSamples += internalBlocksize;

There’s probably a more elegant way to do this that hasn’t occurred to me yet… ¯\_(ツ)_/¯

and if I’m understanding correctly, then I should report the internalBlocksize as the latency in samples of this algorithm…?

I think it’s totally the wrong approach.

You should process however many samples the host calls you with. If the host calls you to process 1 sample, doing anything else would cause an audible glitch.

There are also countless other bugs that could happen as a result - for example, if the host called you with a smaller buffer size it’s usually because it wanted you to get more accurate automation values and song position/tempo changes, which you will miss by trying to connect the buffers.

Also, why are you trying to avoid large buffer sizes? While it’s technically possible to split them, unless a massive buffer size would require you to allocate gigabytes of memory, there’s absolutely no reason why you shouldn’t be processing the largest buffer the host can throw at you.

Instead, just make sure you’re allocating the maximum amount of memory needed in prepareToPlay, and write your algorithm correctly so it works with any block size up to that maximum size.

Latency has nothing to do with block size.

If you need to introduce latency, it’s usually because your algorithm needs X amounts of data before it can start processing. If you need to delay the signal by, say, 100 samples so you can process correctly you should be able to implement that with both 1 sample buffers and 10,000 sample buffers.
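To illustrate that point, here’s a minimal plain-C++ sketch (no JUCE; the class name and the 100-sample figure are just for illustration). Because all the state lives in the ring buffer, the output is identical whether the host calls with blocks of 1 sample or 10,000:

```cpp
#include <cstddef>
#include <vector>

// Illustrative fixed delay: holds on to each sample and spits it out
// a fixed number of samples later, regardless of host block size.
class FixedDelay
{
public:
    explicit FixedDelay (std::size_t delaySamples)
        : buffer (delaySamples, 0.0f) {}

    // Push one input sample, get back the sample from delaySamples ago.
    float processSample (float in)
    {
        const float out = buffer[writeIndex]; // oldest stored sample
        buffer[writeIndex] = in;
        writeIndex = (writeIndex + 1) % buffer.size();
        return out;
    }

    // Block processing is just the per-sample call in a loop, so any
    // host block size produces identical output.
    void processBlock (float* data, int numSamples)
    {
        for (int i = 0; i < numSamples; ++i)
            data[i] = processSample (data[i]);
    }

private:
    std::vector<float> buffer;
    std::size_t writeIndex = 0;
};
```

The latency reported to the host would be the delay length, independent of whatever buffer sizes the host happens to use.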

I slightly disagree here – there can be a lot of benefit to breaking a large host block into smaller blocks internally if you have a graph with modulation – unless you want to code all of your DSP to take “blocks” of modulated parameter input instead of single parameter values.

That said – I didn’t take a deep look in the code, but there’s no way to prevent a host from calling a block size too small, you must always give them their requested samples.

You’re absolutely right on that one – there are indeed advantages to splitting the buffer if your algorithm needs more accuracy. In fact I do that quite a lot.

But - since OP has already posted a few messages on Discord on the topic, I’m suspecting this isn’t the issue in question.

So it’s not really possible to have an algorithm that always internally processes consistent block sizes?

This is basically my issue. Checking for “too large” block sizes from the host is certainly more arguable as to whether or not it’s necessary, but for doing things like pitch detection & pitch shifting, it’s not really feasible to analyze or do processing on a chunk of samples shorter than a minimum required length…

So it’s not really possible to have an algorithm that always internally processes consistent block sizes?

Well, if you write a standalone application you can definitely assume a constant buffer size, but not as a plugin in a host.

TBH I think dynamic buffer sizes are a classic domain problem and you should start thinking of your algorithms that way, as it will make your life much easier.

Yes, that’s what I suspected.
To do that, you need to introduce delay (latency).

If you want to think about this problem correctly, imagine you will always get called with buffer sizes of 1 sample.

In that case you will have to manage your processor state so that you will hold on to information until it’s time to ‘spit out’ the correct calculation.

Now if the buffer size is a larger one, you do exactly the same, and just add a counter to make sure you send out the processed information at a constant interval.


That’s what this code is attempting to do…

I made two member buffers, one called inputCollection and one called outputCollection.

Any time new samples come in, they are appended to the end of the inputCollection buffer.

If the inputCollection buffer contains enough samples to do processing, then a wrapped processing function is called with those samples, then those samples are removed from the inputCollection buffer and the rest of that buffer’s contents shifted up to the beginning. The rendered output samples are appended to the end of the outputCollection buffer.

So, any time the host sends me a block, the internal processing may or may not be triggered, but even if not, theoretically I should be able to send the host the most recent samples from my outputCollection buffer…

Is this conceptually the right way to go about this?

That sounds like the general concept.

But - you should do that at the sample level and not the block level.

Imagine if your algorithm needs a minimum of 100 samples to do the pitch detection process. The first 100 samples in any block size would send out 0’s because the algorithm is still pushing things into the input collection.

But - after that moment each sample should send out the processed sound for the previous 100 samples. So sample #100 will now send the processed version of sample #0, sample #101 will send the processed version of sample #1, etc…

All a ‘block’ means is that you need to do the process above X times according to the block size.


Yeah, that sounds like what I’m trying to do in my code.

Maybe I didn’t implement it quite right, but that’s what I’m going for conceptually.

This is what I mean by “internal blocksize”. If the host sends me blocks of 1, my plugin as it is now should output silence until a total of 100 samples have been received, then will output the processed samples for those first 100 input samples with the host’s next 100 calls for output samples.

Am I misunderstanding this? I’m trying to count the total # of input samples available to process (a combined total of any samples stored in the inputCollection buffer previously & not used yet, + the new # of samples sent in the host’s new block), and if that total # of samples is greater than or equal to my minimum required # of samples, then processing/analysis is triggered. Is that wrong…?

Sounds to me like you might be kinda right, but only when it comes to the very first few samples of processing.
After your first ‘latency block’ has ended and you’ve finished pushing out the initial 0’s, you will have to push out every single sample, which I think is why I’m so confused by your phrasing.

After that first early latency block is finished, you will now have enough information to process every sample that comes in until the end of time. It’s just that your logic needs to know to read from the right part of the delay buffer you pre-recorded, but no additional ‘waiting’ needs to happen.

You could still, for example, decide to only trigger your analysis (FFT?) process every X samples, but logically you need to stream out the samples constantly.
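That “analyze every X samples but stream constantly” idea can be sketched like this (plain C++, all names hypothetical; the counter-triggered step is a stand-in for a real FFT or pitch-detection pass):

```cpp
#include <vector>

// Illustrative sketch: audio streams through sample-by-sample, but the
// expensive analysis only re-runs once every hopSize samples.
class HoppedAnalyzer
{
public:
    explicit HoppedAnalyzer (int hopSizeIn) : hopSize (hopSizeIn) {}

    float processSample (float in)
    {
        history.push_back (in); // accumulate context for the next analysis

        if (++samplesSinceAnalysis >= hopSize)
        {
            samplesSinceAnalysis = 0;
            ++analysisCount;    // a real plugin would run its FFT / pitch
            history.clear();    // detector over `history` here
        }

        return in; // audio streams out every sample, regardless of the hop counter
    }

    int getAnalysisCount() const { return analysisCount; }

private:
    const int hopSize;
    int samplesSinceAnalysis = 0;
    int analysisCount = 0;
    std::vector<float> history;
};
```

The key design point is that the output path never waits on the analysis counter; only the analysis itself is gated.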


I think this is true for the way my code currently works…

Lemme test it with a jassert real quick

this is basically what I’m trying to implement. Only do analysis (which is pitch detection in this case) every X number of samples.

Thank you for trying to help me!

I often do this to enforce a strict structure that makes it easy to switch between block-based and sample-based processing in my DSP algorithms.

Unless I need to do vector optimizations all my DSP classes inherit from a base which has two methods:

void processBlock (float* inLeft, float* inRight, int inNumSamples);
&
void processSingleSample (float& inLeft, float& inRight);

in most cases unless there is good reason, process block is simply:

void processBlock (float* inLeft, float* inRight, int inNumSamples)
{
    for (int i = 0; i < inNumSamples; i++)
        processSingleSample (inLeft[i], inRight[i]);
}

essentially – like Eyal is saying, you’ll be in a much easier place to switch between block processing, single-sample processing, or whatever is best for your specific use case if you manage things at the sample level, and writing your DSP this way to get started may help you simplify things in your mind as well.

Typically I find it a lot easier to get things working on a single-sample basis. It makes it really clear where you’re going to need to utilize memory and which algorithms will need ring buffers, etc. You also can remove all the mess of loops managing buffers from your actual DSP guts.
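As a hypothetical instance of that pattern (these names are invented for illustration, not taken from the post above), a processor only has to implement the per-sample method and inherits block processing for free:

```cpp
// Illustrative base class: derived processors implement one sample at a
// time; block processing is just a loop over the per-sample call.
struct SampleProcessor
{
    virtual ~SampleProcessor() = default;

    virtual void processSingleSample (float& inLeft, float& inRight) = 0;

    void processBlock (float* inLeft, float* inRight, int inNumSamples)
    {
        for (int i = 0; i < inNumSamples; ++i)
            processSingleSample (inLeft[i], inRight[i]);
    }
};

// Trivial example processor: a fixed gain.
struct Gain : SampleProcessor
{
    float gain = 0.5f;

    void processSingleSample (float& inLeft, float& inRight) override
    {
        inLeft  *= gain;
        inRight *= gain;
    }
};
```

With this shape, state (delay lines, ring buffers) naturally becomes members of the derived class, and the block/sample distinction disappears from the DSP logic itself.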

Good luck with it!


here’s what my function looks like now:

template<typename SampleType>
void ImogenEngine<SampleType>::processWrapped (AudioBuffer<SampleType>& input, AudioBuffer<SampleType>& output, MidiBuffer& midiMessages)
{
    // at this level, the buffer block sizes sent to us are guaranteed to NEVER exceed the declared internalBlocksize, but they may still be SMALLER than this blocksize
    
    const int numNewSamples = input.getNumSamples();
    
    jassert (numNewSamples <= internalBlocksize);
    
    // copy new input samples received this frame into the back of the inputCollectionBuffer queue
    inputCollectionBuffer.copyFrom (0, numStoredInputSamples, input, 0, 0, numNewSamples);
    numStoredInputSamples += numNewSamples;
    
    if (numStoredInputSamples < internalBlocksize) // not calling renderBlock() this time, not enough samples to process!
    {
        if (numStoredOutputSamples == 0) // this should only trigger during the first latency period of all time!
        {
            jassert (firstLatencyPeriod); 
            output.clear();
            return;
        }
    }
    else // render the new chunk of internalBlocksize samples
    {
        firstLatencyPeriod = false;

        // alias buffer referring to just the chunk of the inputCollectionBuffer that we'll be reading from
        AudioBuffer<SampleType> thisChunksInput  (inputCollectionBuffer.getArrayOfWritePointers(), 1, 0, internalBlocksize);
        
        // alias buffer referring to just the chunk of the outputCollectionBuffer that we'll be writing to
        AudioBuffer<SampleType> thisChunksOutput (outputCollectionBuffer.getArrayOfWritePointers(), 2, numStoredOutputSamples, internalBlocksize);
        
        // appends the next rendered block of samples to the end of the outputCollectionBuffer
        renderBlock (thisChunksInput, thisChunksOutput, midiMessages);
        
        numStoredOutputSamples += internalBlocksize;
        numStoredInputSamples  -= internalBlocksize;
        
        if (numStoredInputSamples > 0)
        {
            // move left-over input samples to the front of the inputCollectionBuffer
            for (int chan = 0; chan < 2; ++chan)
            {
                inputInterimBuffer   .copyFrom (chan, 0, inputCollectionBuffer, chan, internalBlocksize, numStoredInputSamples);
                inputCollectionBuffer.copyFrom (chan, 0, inputInterimBuffer,    chan, 0,                 numStoredInputSamples);
            }
        }
    }
    
    jassert (numNewSamples <= numStoredOutputSamples);
    
    for (int chan = 0; chan < 2; ++chan)
        output.copyFrom (chan, 0, outputCollectionBuffer, chan, 0, numNewSamples);
    
    numStoredOutputSamples -= numNewSamples;
    
    if (numStoredOutputSamples > 0)
    {
        // move left-over output samples to the front of the outputCollectionBuffer
        for (int chan = 0; chan < 2; ++chan)
        {
            outputInterimBuffer   .copyFrom (chan, 0, outputCollectionBuffer, chan, numNewSamples, numStoredOutputSamples);
            outputCollectionBuffer.copyFrom (chan, 0, outputInterimBuffer,    chan, 0,             numStoredOutputSamples);
        }
    }
}

I don’t get any assertion failures with the jasserts where I’ve put them, but I’ve yet to test this extensively with real audio…

just to see if I’m understanding you correctly: are you basically saying to take a similar approach to the code I just posted above, except to break it down further so that no matter what the blocksize from the host, in my internal code it’s always broken down to operations that say “receive 1 sample, output 1 sample”

so like with my setup, it would always be, take the 1 sample and add it to the end of the input buffer

if the input buffer contains enough samples to analyze/process, then trigger that function & write those output samples to a buffer somewhere

output the next sample in line from the output buffer’s queue

??

unless I’m misunderstanding anything, it seems like this is basically the same approach as what I’ve implemented above, but only doing copy operations on single samples at a time, instead of chunks of multiple samples in between blocks large enough to trigger processing.

Or do you mean a different approach that I am not understanding?

Sorry if I seem obtuse, I really would like to understand this fundamental DSP concept, and thank you both for trying to help me :crazy_face:

No that’s basically the gist,

If you stick it in a class and you break it down to single samples, I think you’ll find everything gets easier to reason about. For example, firstLatencyPeriod is a strange thing to track.

I’m sure if you separate the concerns of managing block sizes and your algorithm, you’ll find you end up writing your code such that if the data isn’t ready it just returns zeros naturally, because your output buffer is full of zeros and not yet initialized or triggered to play, etc.


yeah, I get what you mean. Once I’m just looking at my actual algorithm, which is now a simple straight shot from input to output with a consistent blocksize thanks to this wrapping code, everything seems very simple actually :slight_smile:

Sometimes there is no way around a fixed block size, for example if your algorithm depends on an FFT.

In these cases, I have two fifos, an input and an output fifo. I prime the output fifo with the internal block size. Every call to process block, I push to the input fifo. If the input fifo has enough data, then I run the algorithm and push to the output fifo.
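That two-FIFO scheme can be sketched in plain C++ like this (std::deque standing in for a real-time-safe FIFO, and a simple sign flip standing in for the actual fixed-block algorithm; all names here are made up):

```cpp
#include <deque>

// Illustrative wrapper: accepts any host block size, but the internal
// "algorithm" (here just a negation) always sees exactly blockSize samples.
class FixedBlockWrapper
{
public:
    explicit FixedBlockWrapper (int blockSizeIn) : blockSize (blockSizeIn)
    {
        // Prime the output FIFO with one block of silence: this is the latency.
        outputFifo.assign ((std::size_t) blockSize, 0.0f);
    }

    void processBlock (float* data, int numSamples)
    {
        // 1. push the incoming samples onto the input FIFO
        for (int i = 0; i < numSamples; ++i)
            inputFifo.push_back (data[i]);

        // 2. run the fixed-size algorithm for every full block available
        while ((int) inputFifo.size() >= blockSize)
        {
            for (int i = 0; i < blockSize; ++i)
            {
                outputFifo.push_back (-inputFifo.front()); // fixed-block "algorithm"
                inputFifo.pop_front();
            }
        }

        // 3. the priming guarantees the output FIFO always has numSamples ready
        for (int i = 0; i < numSamples; ++i)
        {
            data[i] = outputFifo.front();
            outputFifo.pop_front();
        }
    }

private:
    const int blockSize;
    std::deque<float> inputFifo, outputFifo;
};
```

The priming step is what makes the bookkeeping fall out: the reported latency is exactly blockSize samples, and the output FIFO can never underrun.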

I have an audio/midi fifo here: tracktion_engine/tracktion_AudioUtilities.h at master · Tracktion/tracktion_engine · GitHub

Then my process block looks kinda like this:

        inputFifo.writeAudioAndMidi (buffer, midi);
        midi.clear();

        while (inputFifo.getNumSamplesAvailable() >= parameters.blockSize)
        {
            MidiBuffer scratchMidi;
            AudioScratchBuffer scratch (buffer.getNumChannels(), parameters.blockSize);

            inputFifo.readAudioAndMidi (scratch.buffer, scratchMidi);

            deviceType->processBlock (scratch.buffer);

            outputFifo.writeAudioAndMidi (scratch.buffer, scratchMidi);
        }

        outputFifo.readAudioAndMidi (buffer, midi);

Interesting!

In this setup, is your plugin’s internal processing able to preserve MIDI timestamps if the original buffer is processed in smaller segments internally?

I love to use AudioBuffer’s special alias constructor to easily refer to specific segments of other buffers – i.e. for dealing with sample offsets, only a subset of the samples present, etc.

I wish there was a similar functionality available for MidiBuffer… Specifically for the sample offset feature.

The goal of this is that the buffers processWrapped receives always start at sample 0, even if they’re later chunks of the original top-level buffer. This is easy enough with audio, but dealing with sample offsets in MidiBuffers seems to be a bit more of a task…

Here’s what I’ve got:

while (samplesLeft > 0)
{
    const int chunkNumSamples = std::min (internalBlocksize, samplesLeft);

    AudioBuffer<SampleType> inBusProxy  (inBus.getArrayOfWritePointers(),  inBus.getNumChannels(), startSample, chunkNumSamples);
    AudioBuffer<SampleType> outputProxy (output.getArrayOfWritePointers(), 2,                      startSample, chunkNumSamples);

    // put just the midi messages for this time segment into midiChoppingBuffer.
    // the harmonizer's midi output will be returned by being copied to this same region of the midiChoppingBuffer.
    midiChoppingBuffer.clear(); // midiChoppingBuffer is a member MidiBuffer in this class
    copyRangeOfMidiBuffer (midiMessages, midiChoppingBuffer, startSample, 0, chunkNumSamples);

    processWrapped (inBusProxy, outputProxy, midiChoppingBuffer); // my internal processing function should receive both audio & midi buffers that start at sample index 0

    // copy the harmonizer's midi output (the beginning chunk of midiChoppingBuffer) back to midiMessages
    copyRangeOfMidiBuffer (midiChoppingBuffer, midiMessages, 0, startSample, chunkNumSamples);

    startSample += chunkNumSamples;
    samplesLeft -= chunkNumSamples;
}

and here’s the little midiBuffer chopping function I wrote:

void copyRangeOfMidiBuffer (const MidiBuffer& inputBuffer, MidiBuffer& outputBuffer,
                            const int startSampleOfInput,
                            const int startSampleOfOutput,
                            const int numSamples)
{
    outputBuffer.clear (startSampleOfOutput, numSamples);

    auto midiIterator = inputBuffer.findNextSamplePosition (startSampleOfInput);

    if (midiIterator == inputBuffer.cend())
        return;

    // findNextSamplePosition() returns the first event at or after the given sample,
    // so the end of our range is simply the first event at or after the sample just
    // past the range (no need to increment iterators, which risks walking past cend())
    const auto midiEnd = inputBuffer.findNextSamplePosition (startSampleOfInput + numSamples);

    if (midiIterator == midiEnd)
        return;

    const int sampleOffset = startSampleOfOutput - startSampleOfInput;

    std::for_each (midiIterator, midiEnd,
                   [&] (const MidiMessageMetadata& meta)
                       { outputBuffer.addEvent (meta.getMessage(),
                                                meta.samplePosition + sampleOffset); });
}