Ring buffer with multi-writer and latency offset in write

Hi to all,

I’m trying to create/find a ring/circular buffer to use in an application I’m developing.
In my application I have a part that behaves like a mixer in a DAW: audio tracks are processed and summed together. The sum of those tracks is stored in a bus that is then read by the audio interface callback, or by another track as its input.

My initial idea was to have a ring buffer inside the bus and have the various processors write into that buffer, summing their data at a particular position.
To be able to delay some tracks (and compensate for the latency introduced by the processors), I was thinking of calculating the write position in the buffer from the current write index plus the track’s delay in samples.

But I’m really struggling to find a library that has a ring buffer able to write at arbitrary positions in the buffer and that is thread-safe for multiple producers and a single consumer.
All I find are lock-free queues (like JUCE’s AbstractFifo), which can’t be used the way I have in mind: I can only append values to the queue, not sum values into an arbitrary position of it.
Does anyone have something like that?
I’ve tried searching the forum, but haven’t found anything similar.

Or does anyone have suggestions for designing the system differently to achieve the same result? (Multiple tracks with different delays summed together into a buffer that is constantly read by the audio interface.)
Is there any way to write and read from different threads?

I know that’s really hard… but I can see it implemented in one way or another in many DAWs, and I would like to understand how.

Hope someone can help me
Thanks to all
Cheers
Daniele

This is really, really complicated.

I would probably try to solve this with a dependency graph. It would allow you to determine which processors depend on each other, and then execute them in groups, serially (with the option of multi-threading later). This would require multiple buffers that need filling/summing at multiple points; I would implement these as ‘pseudo-processors’ that are part of the graph.
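To make the "execute them in groups" idea concrete, here is a minimal sketch (not from the original post) that groups the nodes of a dependency graph into stages using Kahn's topological sort. Every node in a stage depends only on nodes in earlier stages, so stages run in order, and the nodes inside one stage could later be run in parallel. The adjacency-list representation and the `buildStages` name are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// edges[i] lists the nodes that consume node i's output (i -> j means j depends on i).
// Returns the nodes grouped into stages; nodes in the same stage do not depend
// on each other. Returns an empty vector if the graph contains a cycle.
std::vector<std::vector<int>> buildStages(const std::vector<std::vector<int>>& edges)
{
    const std::size_t n = edges.size();
    std::vector<int> inDegree(n, 0);
    for (const auto& targets : edges)
        for (int t : targets)
        {
            assert(t >= 0 && static_cast<std::size_t>(t) < n);
            ++inDegree[t];
        }

    // First stage: nodes with no inputs (e.g. tracks reading from disk).
    std::vector<int> current;
    for (std::size_t i = 0; i < n; ++i)
        if (inDegree[i] == 0)
            current.push_back(static_cast<int>(i));

    std::vector<std::vector<int>> stages;
    std::size_t visited = 0;
    while (!current.empty())
    {
        std::vector<int> next;
        for (int node : current)
        {
            ++visited;
            // A consumer becomes ready once all of its inputs are in earlier stages.
            for (int t : edges[node])
                if (--inDegree[t] == 0)
                    next.push_back(t);
        }
        stages.push_back(std::move(current));
        current = std::move(next);
    }

    if (visited != n)
        return {}; // cycle: no valid processing order exists
    return stages;
}
```

For example, two tracks (0 and 1) feeding a bus (2) that feeds the master (3) gives the stages `[0, 1]`, `[2]`, `[3]`; the bus node is where the summing "pseudo-processor" would live.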

My naive approach to latency compensation would be a ‘worst-case’ buffer based on the maximum latency compensation required, but this is probably a bad idea for many reasons!
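A common way to think about the "worst-case" idea is to delay every path so it lines up with the slowest one. A minimal sketch, assuming each track can report the total latency (in samples) of its processing chain; the `compensationDelays` function and its input are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Given the latency (in samples) that each track's processing chain introduces,
// return how many samples each track must be delayed so that all tracks
// line up with the slowest one.
std::vector<int> compensationDelays(const std::vector<int>& latencies)
{
    std::vector<int> delays;
    if (latencies.empty())
        return delays;

    const int maxLatency = *std::max_element(latencies.begin(), latencies.end());
    delays.reserve(latencies.size());
    for (int latency : latencies)
    {
        assert(latency >= 0);
        delays.push_back(maxLatency - latency); // slowest track gets 0 extra delay
    }
    return delays;
}
```

With latencies `{0, 64, 512}` this gives delays `{512, 448, 0}`: the whole mix ends up 512 samples late, but the tracks stay aligned with each other.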


What would you think about something like this?

```cpp
#include <JuceHeader.h>
#include <vector>

using namespace juce;

/* Should be larger than the largest delay (in samples) of any track.
 * Here set to an arbitrary size of 1 s @ 44100 Hz sample rate.
 */
constexpr int bufferSize = 44100;
int bufferWritePos = 0;
HeapBlock<float> ringBuffer(bufferSize, true); // true = zero-initialised

class Track
{
public:
	/*
	.
	.
	.
	*/
	int getWritePos(int currentReadPos) const
	{
		// Offset by this track's delay and wrap around the end of the ring buffer.
		return (currentReadPos + delaySamples) % bufferSize;
	}

	/*
	reads the next sample from trackSamples
	*/
	float getNextSample();

private:
	int delaySamples = 0; // must be smaller than bufferSize

	// possibly maintained and filled from a BufferingAudioSource or similar
	AudioBuffer<float> trackSamples;
};

std::vector<Track*> tracks;

/*
your audio interface calls this to fill its buffer with samples.
All tracks are summed here, on this one thread, so there is no data race.
*/
float getNextSummedSampleFromTracks()
{
	for (auto* track : tracks)
	{
		auto sample = track->getNextSample();
		auto writePos = track->getWritePos(bufferWritePos);
		ringBuffer[writePos] += sample;
	}

	auto summedSample = ringBuffer[bufferWritePos];
	ringBuffer[bufferWritePos] = 0.0f;                  // clear the slot so it can be reused on the next lap
	bufferWritePos = (bufferWritePos + 1) % bufferSize; // wrap around instead of running past the end

	return summedSample;
}
```

Reading sample by sample isn’t the most efficient approach; it would be better to read/write a whole buffer of samples on each call. Hopefully it still gives you some useful input for the design of your system.
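A minimal block-based sketch of the same single-ring-buffer idea, using only the standard library instead of JUCE types. The `SummingRingBuffer` class and its method names are hypothetical. It assumes every `addBlock` and `readBlock` call happens on the same thread (for example the audio callback); that is what keeps it free of data races, and it does not make multiple writer threads safe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-channel ring buffer where each track adds its block at an offset
// (its compensation delay) ahead of the read position.
// NOT thread-safe: all calls must come from the same thread.
class SummingRingBuffer
{
public:
    explicit SummingRingBuffer(std::size_t capacity) : data(capacity, 0.0f) {}

    // Sum numSamples from src into the buffer, starting delaySamples after
    // the current read position. Delay plus block length must fit in capacity.
    void addBlock(const float* src, std::size_t numSamples, std::size_t delaySamples)
    {
        assert(delaySamples + numSamples <= data.size());
        std::size_t pos = (readPos + delaySamples) % data.size();
        for (std::size_t i = 0; i < numSamples; ++i)
        {
            data[pos] += src[i];
            pos = (pos + 1) % data.size();
        }
    }

    // Copy the next numSamples of the mix to dest, clear those slots so they
    // can be reused on the next lap, and advance the read position.
    void readBlock(float* dest, std::size_t numSamples)
    {
        assert(numSamples <= data.size());
        for (std::size_t i = 0; i < numSamples; ++i)
        {
            dest[i] = data[readPos];
            data[readPos] = 0.0f;
            readPos = (readPos + 1) % data.size();
        }
    }

private:
    std::vector<float> data;
    std::size_t readPos = 0;
};
```

Per callback, every track's block is added first and then one block of the mix is read; reading before all tracks have added their block would output an incomplete sum.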

Is it not possible to use independent buffers?

Thanks for the answer @oli1 :smiley:

As you suggested, I definitely need an audio graph. In fact, I’m implementing a graph of nodes, each representing a single processing unit, and every node has its own AudioBuffer where I store its output. The graph keeps the connections between the nodes, and I can traverse it to build the dependency list.

What I was trying to achieve was to group the nodes so that every chain of nodes representing a track ends with a node that writes into the ring buffer of the output bus.
This way I can separate and parallelize the processing of each track’s chain and keep the chains independent of each other.

If I follow what is usually done in an audio graph, I need a summing node at the end of the graph that has to wait for all the chains to finish before it can sum them. I also need some sort of delay line (built on a ring buffer) for every track/chain, which uses more memory and copies the data one more time.
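For comparison, a minimal sketch of that conventional approach: each track owns a fixed delay line (a small ring buffer), and a summing node adds the delayed outputs once every chain has finished its block. `DelayLine` and `sumInto` are hypothetical names, and plain `std::vector<float>` blocks stand in for JUCE `AudioBuffer`s.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed delay of delaySamples samples, implemented as a circular buffer.
class DelayLine
{
public:
    explicit DelayLine(std::size_t delaySamples) : buffer(delaySamples, 0.0f) {}

    // Delays the block in place: each output sample is the input from
    // delaySamples samples ago.
    void process(std::vector<float>& block)
    {
        if (buffer.empty())
            return; // zero delay: nothing to do
        for (float& sample : block)
        {
            const float delayed = buffer[pos];
            buffer[pos] = sample;
            sample = delayed;
            pos = (pos + 1) % buffer.size();
        }
    }

private:
    std::vector<float> buffer;
    std::size_t pos = 0;
};

// Summing node: adds each track's (already delayed) block into the bus.
// Runs only after all tracks have finished rendering this block.
void sumInto(std::vector<float>& bus, const std::vector<std::vector<float>>& tracks)
{
    for (const auto& track : tracks)
    {
        assert(track.size() == bus.size());
        for (std::size_t i = 0; i < bus.size(); ++i)
            bus[i] += track[i];
    }
}
```

The extra memory is one delay buffer per track sized to its compensation delay, and the extra copy is the pass through `process`.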

My idea was to use a single ring buffer for every output bus and offset the write index based on each track’s delay… but I’m starting to think it’s not possible to do this lock-free with multiple writers without causing a race condition :frowning:

Can you confirm that? Or are there some unusual techniques I could use to achieve something similar?

@oxxyyd Thanks :slight_smile:
Yes, I was thinking of something like that, but instead of one place that sums all the tracks into the ring buffer, I wanted every track to write its own data into the ring buffer from its own thread, without causing race conditions.

@Marcusonic Uhm, I need to sum the signals at the end. I can use separate buffers during the other operations, but at the end I need to sum whatever comes out of the individual tracks.

Thanks again to all :smiley:

You’re probably going to require extra summing stages other than just the final mix stage.

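You can’t safely perform read-modify-write operations such as summing on the same memory from multiple threads without synchronization. You can split a buffer between the threads (each thread writing only its own region), but you can’t have them all sum into the same samples.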
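A minimal sketch of the safe pattern: one contiguous scratch buffer is split into a non-overlapping slice per track, each worker thread writes only its own slice, and the summing happens on one thread after all workers are joined. `renderTrack` is a hypothetical stand-in for a track's processing chain. Spawning threads per block is only for illustration; a real audio callback would use a persistent worker pool.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a track's processing chain: fills its output
// with a constant value so the result is easy to check.
void renderTrack(int trackIndex, float* out, std::size_t numSamples)
{
    for (std::size_t i = 0; i < numSamples; ++i)
        out[i] = static_cast<float>(trackIndex + 1);
}

// Renders numTracks tracks in parallel and returns their sum.
// Worker threads never write to the same memory; the sum is computed
// on the calling thread only after every worker has been joined.
std::vector<float> renderAndMix(int numTracks, std::size_t numSamples)
{
    assert(numTracks >= 0);
    std::vector<float> scratch(static_cast<std::size_t>(numTracks) * numSamples, 0.0f);

    std::vector<std::thread> workers;
    for (int t = 0; t < numTracks; ++t)
    {
        float* slice = scratch.data() + static_cast<std::size_t>(t) * numSamples;
        workers.emplace_back(renderTrack, t, slice, numSamples);
    }

    for (auto& w : workers)
        w.join(); // after this, all slices are safely visible to this thread

    std::vector<float> mix(numSamples, 0.0f);
    for (int t = 0; t < numTracks; ++t)
        for (std::size_t i = 0; i < numSamples; ++i)
            mix[i] += scratch[static_cast<std::size_t>(t) * numSamples + i];
    return mix;
}
```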

I think what you’re trying to do is probably going to be orders of magnitude slower than just mixing the tracks in a final stage. Even if each stage/pass has its own buffer, we’re still talking about less than 1 MB in total for 100 processors, 2 channels, and 1024 samples per block.
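Spelling out that estimate, assuming 32-bit `float` samples (4 bytes each):

$$
100 \times 2 \times 1024 \times 4\ \text{bytes} = 819\,200\ \text{bytes} \approx 0.78\ \text{MiB}
$$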

You will eventually be able to split your graph into stages and execute the nodes within each stage in parallel.
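A minimal sketch of executing such stages, assuming the stages have already been computed (for example with a topological sort) and using `std::async` as a stand-in for a proper real-time thread pool; `runStages` and the `processNode` callback are hypothetical. Nodes within a stage run concurrently, and waiting on all of their futures acts as the barrier before the next stage starts.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Runs the graph stage by stage. Nodes inside one stage have no dependencies
// on each other, so they are launched concurrently; all of them must finish
// before the next stage starts.
void runStages(const std::vector<std::vector<int>>& stages,
               const std::function<void(int)>& processNode)
{
    assert(processNode);
    for (const auto& stage : stages)
    {
        std::vector<std::future<void>> pending;
        pending.reserve(stage.size());
        for (int node : stage)
            pending.push_back(std::async(std::launch::async, processNode, node));

        for (auto& f : pending)
            f.get(); // barrier: wait for the whole stage (and rethrow any exception)
    }
}
```

`std::async` may create a new thread per call, which is not acceptable inside an audio callback; the point here is only the stage/barrier structure.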

As for delay compensation, you would render the track as normal (after its inputs have been rendered) and then mix its output at an offset into the main buffer (or into the buffer of the next dependent node).

Sorry, I haven’t thanked you for the suggestions, @oli1 :slight_smile: