Timur Doumler Talks on C++ Audio (Sharing data across threads)

This thread is fascinating :slight_smile: Has any of this changed in the last few years? @timur @dave96

I was planning on using a strategy like the one proposed by Timur in his 2015 talk (and the one that started this thread). I’m working on a sequencer (no audio rendering) and I have the whole state in a ValueTree (I’m following Dave’s techniques on VT-based apps). Then, every time a note is added to the VT (or a track or anything else), I was planning on building a new “parallel” data structure which includes information about all tracks/clips/notes and pass a pointer to this data structure to the audio thread so it can read from it and send MIDI messages without interacting with the VT. Then I’d use a release pool to delete old unused objects. If I understood correctly, that is the idea suggested by Timur. So is it OK nowadays to use that strategy or should I do something else?

I guess the alternative is to use a lock-free FIFO to send messages from the message thread to the audio thread and “construct” that parallel data structure directly in the audio thread, but then I need to be sure to not allocate in the audio thread and therefore I should use pre-defined numbers of maximum tracks/clips/notes so I do all the pre-allocation once in the main thread. The messages sent to the audio thread would include information about what things have changed (updated notes, etc). It could use the VT listener for changes in the main thread and then turn these somehow into messages to be sent to the audio thread so the audio thread can apply changes to its data structure (which can’t be another VT because I understood VTs can’t be written in the audio thread).

The project I’m working on is open source so if I manage to do a proper implementation of this it can be used as an example for future devs with similar questions :slight_smile:

construct reference-counted objects on the GUI thread (or background thread) and add them to a release pool, then use a FIFO to pass them to the audio thread where they get used. the audio thread needs a member variable of the type being passed around, which will get updated.
This pattern prevents allocations/deallocations on the audio thread, while still allowing the exchange of objects between the gui and audio thread.

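struct PluginProcessor : juce::AudioProcessor
{
    //note: this initial object should also be added to a release pool,
    //otherwise the first swap below frees it on the audio thread
    DataObject::Ptr dataObject = new DataObject();
    Fifo<DataObject::Ptr, 50> dataObjectFifo;

    //other AudioProcessor overrides omitted
    void processBlock (juce::AudioBuffer<float>& buffer, juce::MidiBuffer&) override
    {
         DataObject::Ptr t; //nullptr by default
         while( dataObjectFifo.pull(t) ) { ; } //drain the FIFO, keeping only the newest object

         if( t != nullptr )
              dataObject = t;

         dataObject->process(buffer); //or whatever processing you need done
    }
};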
struct GUI : Editor 
{
    GUI(PluginProcessor& proc) : processor(proc) { ... }
    ReleasePool<DataObject::Ptr> releasePool;
    PluginProcessor& processor;

    void mouseUp(...) //perhaps mouseUp events is where you create new DataObjects
    {
        DataObject::Ptr obj = new DataObject( ... );
        releasePool.add(obj);  //keeps it alive
        processor.dataObjectFifo.push(obj);
    }
};

the DataObject might look like this:

struct DataObject : juce::ReferenceCountedObject
{
    using Ptr = juce::ReferenceCountedObjectPtr<DataObject>;

    std::array<float, 100> arbitraryData;
    float value1 = 0.f;
    int value2 = 0;
    double value3 = 0.0;

    juce::AudioBuffer<float> buffer; 
};

No allocation/deallocation happens on the audio thread.
Usage is the only thing that happens.

I discuss this idea further here:

Could you further clarify what ReleasePool looks like? Otherwise, awesome small example!
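
For anyone with the same question, here is a minimal sketch of what a release pool could look like. This is not the ReleasePool from the snippet above: it is a hypothetical version built on std::shared_ptr instead of juce::ReferenceCountedObjectPtr, templated on the pointee rather than the pointer type, and the names add(), releaseUnused() and size() are my own. The idea is only that the pool holds one extra reference to every object, and a periodic sweep on a non-realtime thread drops the objects nobody else is using any more.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical sketch: keeps objects alive until the pool is the only owner left.
// add() and releaseUnused() take a lock, so call them from non-realtime threads only.
template <typename T>
class ReleasePool
{
public:
    void add (const std::shared_ptr<T>& object)
    {
        if (object == nullptr)
            return;

        const std::lock_guard<std::mutex> lock (mutex);
        pool.push_back (object); // the pool now holds one reference of its own
    }

    // Intended to be called periodically, e.g. from a timer on the message thread.
    void releaseUnused()
    {
        const std::lock_guard<std::mutex> lock (mutex);

        // use_count() <= 1 means the pool holds the last reference,
        // so destroying the object here cannot affect the audio thread.
        pool.erase (std::remove_if (pool.begin(), pool.end(),
                                    [] (const std::shared_ptr<T>& p) { return p.use_count() <= 1; }),
                    pool.end());
    }

    std::size_t size() const
    {
        const std::lock_guard<std::mutex> lock (mutex);
        return pool.size();
    }

private:
    mutable std::mutex mutex;
    std::vector<std::shared_ptr<T>> pool;
};
```

The sweep is only safe if every owner holds a real reference (no raw pointers handed out), because an object with a single remaining reference is assumed to be unreachable from the audio thread.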

There’s a lot of different subjects being raised in this thread, but if I read it correctly, the original question seemed to be about how to get the audio thread to send a message to tell the GUI to update things that have changed.

If that’s what you’re trying to do, then this choc class might come in handy:

it’s a lock-free “dirty list” which manages a (potentially very large) set of objects, and can efficiently mark them as needing attention by a background thread.

thanks @matkatmusic and @jules for the quick answers. indeed there have been several topics raised in this thread. I think the first question was rather generic, but then discussion got wild.

my use case in particular is that of having some state in a ValueTree containing tracks/clips/notes and having to share it with the audio thread to interpret it and put midi messages into a midi out buffer. I think @matkatmusic’s answer gives a minimal example of a technique I could use for doing that, so I’m planning to give it a go (thanks!). Could the DirtyList proposed by @jules also be used for that purpose? (it looks like it might be intended for exchanging messages in the opposite direction?)

In a more recent talk of mine, A lock-free atomic shared_ptr, I mentioned that the original strategy I proposed in 2015 is unworkable because atomic operations on std::shared_ptr are not lock-free.

For publishing an object from the GUI thread to the audio thread in a lock-free fashion, there is actually a whole other programming pattern which I was unaware of at the time which is a much better solution than the refcounted release pool stuff. The pattern is called RCU (Read Copy Update). I am planning to do a talk on RCU at some point soon, in the meantime I encourage you to look it up yourself.

RCU was originally invented for the Linux kernel, and it’s still the context in which it is most often discussed. But RCU can also be adapted to user-space scenarios like the stuff we audio people are doing, and then imho it’s a much better solution to the problem than anything I’ve used before.

If you’re trying to do the reverse direction (audio thread notifying GUI thread of changes), then you should think in the direction of @jules 's dirty list pattern or a lock-free FIFO.

And you can even combine two approaches to get both directions, I think this is basically what farbot RealtimeObject is doing.

Do you know/recommend a valid (fully tested) C++ RCU implementation in the user space?
IIRC RCU and hazard pointers are on track for the next C++ standard (C++26?).
But what about today?

:+1:

what I do is serialize parameter values onto the FIFO, then deserialize them on the audio thread. i.e. no mutable object is shared.
This has the advantage of not needing reference counting, locks, shared pointers, or atomics (except safely hidden within the FIFO implementation).
It also has the advantage of being seamlessly extended to work across process boundaries, between the DAW and your ‘GPU Audio’, over a network, or as an interop layer between C++ and some other language (e.g. if your GUI is written in javascript or something).
The main advantage, which is harder to quantify - is the lack of headaches. The GUI runs exclusively in one thread, the audio runs in another. All the race-conditions evaporate, all the hard-to-debug concurrency weirdness goes away.
Granted, some coders will resist this because you don’t get so many opportunities to write ‘clever’ code.
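
To make the message-passing idea a bit more concrete, here is a rough sketch of the audio-thread side for the sequencer use case discussed earlier. Everything in it is an assumption for illustration: the NoteMessage layout, the AudioThreadState class and the maxNotes limit are made up, and the messages are assumed to arrive through whatever lock-free FIFO you use. The point is only that the audio thread owns a fixed-size copy of the state and applies small, trivially copyable messages to it without allocating.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical message describing one change made on the GUI/message thread.
struct NoteMessage
{
    enum class Type : std::uint8_t { addNote, removeNote };

    Type type;
    std::uint8_t track;
    std::uint8_t pitch;      // MIDI note number
    std::uint32_t startTick;
};

// Audio-thread copy of the state. Storage is fixed-size, so apply() never allocates.
class AudioThreadState
{
public:
    static constexpr std::size_t maxNotes = 1024; // arbitrary placeholder limit

    // Returns false if the change could not be applied (storage full or note not found).
    bool apply (const NoteMessage& m)
    {
        switch (m.type)
        {
            case NoteMessage::Type::addNote:
                if (numNotes == maxNotes)
                    return false;

                notes[numNotes++] = { m.track, m.pitch, m.startTick };
                return true;

            case NoteMessage::Type::removeNote:
                for (std::size_t i = 0; i < numNotes; ++i)
                {
                    if (notes[i].track == m.track && notes[i].pitch == m.pitch
                        && notes[i].startTick == m.startTick)
                    {
                        notes[i] = notes[--numNotes]; // swap-remove, order is not preserved
                        return true;
                    }
                }
                return false;
        }

        return false;
    }

    std::size_t getNumNotes() const { return numNotes; }

private:
    struct Note
    {
        std::uint8_t track;
        std::uint8_t pitch;
        std::uint32_t startTick;
    };

    std::array<Note, maxNotes> notes {};
    std::size_t numNotes = 0;
};
```

In processBlock you would drain the FIFO and call apply() for each message before generating MIDI. A linear search is fine for a sketch, but a real sequencer would probably want stable note IDs in the messages instead of matching on track/pitch/tick.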

Thanks @timur and @JeffMcClintock for the latest answers!
RCU looks interesting but I’ll wait for @timur’s talk before diving into it as I’m not adventurous enough to go into that alone.

@JeffMcClintock what you suggest I think is also the other option I was proposing, that of keeping a parallel data structure with the app state in the audio thread and synchronize it using messages passed over a fifo. That’s actually what I do for the GUI because it runs entirely in another process (in javascript) so I could consider that as well for the audio thread. I use ValueTree change listeners to trigger sending messages. One question with that strategy however is where to find a suitable FIFO implementation (lock free etc). Can you point me to one? A simple example would be awesome :slight_smile:

Nevertheless, if I still want to try using the “passing pointers approach”, there’s a fundamental difference between original @timur’s idea (first post in this thread), and the one described 7 posts above by @matkatmusic because there’s a fifo used to pass the pointers (if I understand correctly). Do you think this is still a suitable strategy @timur? (also for @matkatmusic, what FIFO implementation are you using for that? any pointers?)

Thanks everyone a lot!!

I’m just using the standard juce::AbstractFifo-based one, which is very easy to write.
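
Since a simple example was requested above, here is a minimal single-producer/single-consumer FIFO sketch. To keep it self-contained it does not use juce::AbstractFifo at all; it is a plain standard C++ stand-in with the same push/pull shape as the snippets in this thread, and the SpscFifo name is made up. It assumes exactly one thread calls push() and exactly one other thread calls pull().

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical single-producer / single-consumer FIFO (not juce::AbstractFifo).
// One slot is always left empty so that "full" and "empty" can be told apart.
template <typename T, std::size_t Capacity>
class SpscFifo
{
public:
    // Producer thread only. Returns false (and changes nothing) when the FIFO is full.
    bool push (const T& item)
    {
        const auto w = writeIndex.load (std::memory_order_relaxed);
        const auto next = (w + 1) % slots.size();

        if (next == readIndex.load (std::memory_order_acquire))
            return false; // the consumer has not freed a slot yet

        slots[w] = item;
        writeIndex.store (next, std::memory_order_release); // publish the written slot
        return true;
    }

    // Consumer thread only. Returns false when the FIFO is empty.
    bool pull (T& item)
    {
        const auto r = readIndex.load (std::memory_order_relaxed);

        if (r == writeIndex.load (std::memory_order_acquire))
            return false; // nothing new

        item = slots[r];
        readIndex.store ((r + 1) % slots.size(), std::memory_order_release); // hand the slot back
        return true;
    }

private:
    std::array<T, Capacity + 1> slots {};
    std::atomic<std::size_t> readIndex { 0 };
    std::atomic<std::size_t> writeIndex { 0 };
};
```

Note that pull() copies out of the slot rather than moving, so with a reference-counted T the slot keeps its reference until the producer overwrites it. That means the overwritten pointer is released on the producer thread, not the audio thread. The caller still has to decide what to do when push() returns false.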

Hey @matkatmusic I am trying to understand your very interesting reference counted objects + FIFO + release pool strategy and I have a question.

Why does your dataObjectFifo have a capacity of 50? It seems like in the audio thread you’re only ever interested in the most current object. Can’t you have a FIFO of size 1 then? And instead of using a FIFO where push fails when the FIFO is full, you can use one that just keeps overwriting old data? At which point I don’t really understand why you need a FIFO at all?

Actually, what happens in your code if dataObjectFifo is full when the GUI thread calls processor.dataObjectFifo.push(obj)? Does it fail? What do you do then? How did you choose the size 50 in the first place?

Also, is it possible to see the implementation of the Fifo class template somewhere?

With a FIFO, ‘full’ means all memory pointed to in this FIFO is potentially being read by the other thread right now. The rule is, you therefore can’t write anything new into the FIFO (nor to memory pointed-to by entries in the FIFO) until the other thread has advanced its ‘read pointer’ to indicate that some free slots have opened up.
You must NEVER EVER write to (non atomic) data on one thread that is being read from by another because you will experience ‘tearing’ whereby you end up with half the old data and half the new data.

I agree, but I think there are two terms interleaved here.

In my experience, with GUI->Audio communication what you want is essentially 3 chunks of memory:

  1. The current data read by the processor
  2. The pending data that’s about to be read by the processor if a change happened
  3. The data currently written to by the GUI.

You can certainly have #1 and #2 in a FIFO-style container for storage, but the logic isn’t to push and pull in order. Instead, after each write you atomically update which slot the processor reads from, making sure the data the processor is currently reading or about to read is never touched.

All of this is assuming that what you want in the processor is just the ‘latest’ data from the GUI, which I would say is the most common model, and not a full stream of data being pushed in order from another thread which is a more specialized use case.

Hey @timur

Fifo size was arbitrarily decided out of habit, as I am always using the FIFO going in the other direction (Audio → GUI)

I showed the Fifo implementation earlier in this thread:

I never considered what happens when the push fails.
My code snippet was a hypothetical approach.

My usual use-case for the Ref-Counted Objects + Fifo + Release Pool involves this strategy:

  • background thread creates the RefCountedObjectPtrs (RCOPs)
  • background thread adds them to the releasePool, incrementing the reference count.
  • RCOPs are added to the Fifo, for consumption on the audio thread.
  • an RCOP’s refCount is always 2+ when the audio thread pulls an instance from the fifo.

a short code example:

struct Processor : juce::AudioProcessor
{
    DataObject::Ptr data;
    ReleasePool<DataObject::Ptr> releasePool;

    BackgroundObjectCreator backgroundObjectCreator { releasePool };
    Processor() 
    {
        data = new DataObject();
        releasePool.add(data); //bumps up reference count to 2. 
    }

    void setStateInformation(...)
    {
        //restore the APVTS..  then 
        backgroundObjectCreator.request( tree.getChildWithName("dataObjectProperties") );
    }

    void processBlock(...) 
    {
         DataObject::Ptr t; //nullptr by default
         while( backgroundObjectCreator.pull(t) ) { ; }
         
         if( t != nullptr ) 
              data = t; //decrements reference count of 'data' to 1.

         data->process(buffer);
    }

};

My guess is the read indexes of the 3 chunks would naturally advance like {0, 1, 2, 0 , 1, 2} which is very much like a 3-slot FIFO.

Yes, exactly, and that’s pretty much how I implemented it for my own code.

I just wanted to put some focus on the fact that it’s not a “FIFO” in the traditional sense where one side keeps pushing in order, and the other side keeps pulling in order. There’s some use case specific logic here on top of the FIFO to ensure the (single) processor’s data isn’t touched.

I call this triple buffering, by analogy with the graphics use case.

I would like some feedback on this design, which I got from the CPPLang slack workspace:

template<class T>
struct ValueSender
{
    void prepare(int numSamples, int numChannels)
    {
        if constexpr (std::is_same<T, juce::AudioBuffer<float>>::value )
        {
            for( auto& buf : buffer )
                buf.setSize(numChannels, numSamples);
        }
    }
                          
    void push(const T& f)
    {
        buffer[static_cast<size_t>(writer.index)] = f; 
        writer.fresh = true;

        writer = reserve.exchange(writer); //switches 'writer' with whatever was in 'reserve'
    }
    bool pull(T& t)
    {
        reader.fresh = false; //{0,false}
        reader = reserve.exchange(reader);  //switches 'reader' with whatever was in 'reserve'

        if (!reader.fresh)
        {
            return false;
        }

        t = buffer[static_cast<size_t>(reader.index)];
        return true;
    }
private:
    std::array<T, 3> buffer;
    struct DataIndex
    {
        int index;
        bool fresh = false;
    };

    DataIndex reader = {0};
    DataIndex writer = {1};
    /*
     the reserve always atomically holds the last reader or writer.
     */
    std::atomic<DataIndex> reserve { DataIndex{2} };
};

The storage for ‘DataIndex’ is more than 4 bytes (on most platforms), so on a lot of platforms, it’s NOT atomic on its own. So the compiler is going to insert a bunch of mutexes to protect it. I am assuming this is meant to be lock-free. It isn’t.

There is a ton of errors in this code.
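
Whether std::atomic<DataIndex> is lock-free depends on the target, so rather than guessing it can be checked at compile time. The sketch below copies the DataIndex struct from the code above and uses C++17's is_always_lock_free; the dataIndexAtomicIsAlwaysLockFree name is made up for illustration. My expectation is that an 8-byte struct like this is lock-free on common 64-bit desktop targets, but that is an assumption and this check is how to confirm it for your platform.

```cpp
#include <atomic>
#include <type_traits>

// Same layout as the DataIndex used in ValueSender above.
struct DataIndex
{
    int index;
    bool fresh = false;
};

// std::atomic<T> requires T to be trivially copyable.
static_assert (std::is_trivially_copyable<DataIndex>::value,
               "DataIndex must be trivially copyable to be used in std::atomic");

// True only if std::atomic<DataIndex> never falls back to a lock on this target (C++17).
constexpr bool dataIndexAtomicIsAlwaysLockFree = std::atomic<DataIndex>::is_always_lock_free;
```

Putting static_assert (dataIndexAtomicIsAlwaysLockFree, "...") next to the class would turn a silent fallback to locks into a build error.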

“ton of errors” is a fairly aggressive phrase. Thread Sanitizer has not had any breakpoints triggered when I use this class on my Intel Mac.

Which system are you referring to with regard to the size of DataIndex?

Also (responding via phone), I’m wondering if you could use 3 DataIndex members, and store a pointer to one in the atomic instead, and then cache locally the pointed-to DataIndex in the push & pull functions, and use the cached pointer…

DataIndex read {0}, write {1}, extra {2};
std::atomic<DataIndex*> reserve { &extra };
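
A minimal sketch of how that pointer-based variant might look, assuming a single writer thread and a single reader thread. The PointerValueSender name and the slots array are made up for illustration, and the JUCE-specific prepare() step is left out. Each DataIndex is only touched by the thread that currently owns its pointer, and ownership changes hands through the atomic exchange.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical pointer-based variant of ValueSender: the atomic holds a
// DataIndex* instead of a whole DataIndex.
template <class T>
class PointerValueSender
{
public:
    PointerValueSender() = default;

    // Copying would leave reader/writer/reserve pointing into the other object.
    PointerValueSender (const PointerValueSender&) = delete;
    PointerValueSender& operator= (const PointerValueSender&) = delete;

    // Writer thread only.
    void push (const T& value)
    {
        buffer[writer->index] = value;
        writer->fresh = true;
        writer = reserve.exchange (writer); // publish our slot, take over the spare one
    }

    // Reader thread only. Returns false if nothing new was pushed since the last pull.
    bool pull (T& out)
    {
        reader->fresh = false;
        reader = reserve.exchange (reader); // hand back our slot, take the spare one

        if (! reader->fresh)
            return false;

        out = buffer[reader->index];
        return true;
    }

private:
    struct DataIndex
    {
        std::size_t index;
        bool fresh = false;
    };

    static_assert (std::atomic<DataIndex*>::is_always_lock_free,
                   "this sketch assumes lock-free atomic pointers");

    std::array<T, 3> buffer {};
    DataIndex slots[3] { { 0 }, { 1 }, { 2 } };
    DataIndex* reader = &slots[0];
    DataIndex* writer = &slots[1];
    std::atomic<DataIndex*> reserve { &slots[2] };
};
```

The exchanges use the default sequentially consistent ordering to keep the sketch simple. Like the original, pull() copies a T, so for an AudioBuffer payload the destination would need to be pre-sized to avoid allocating on the audio thread.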