Timur Doumler Talks on C++ Audio (Sharing data across threads)

So, I’ve been watching his talks about how to exchange objects between threads safely, without blocking and without locks. His end solution is something like this:

template <typename T>
class Synth
{
public:
    //this is the user of the widget (audio thread)
    void audioCallback(float* buffer, int bufferSize)
    {
        //grab a local reference so the widget stays alive for the whole callback
        std::shared_ptr<T> widgetToUse = std::atomic_load(&currentWidget);
        //use widget with your buffer
    }

    //this is the creator of the widget (message thread)
    void updateWidget(/* args */)
    {
        std::shared_ptr<T> newWidget = std::make_shared<T>(/* args */);
        releasePool.add(newWidget);                     //keep it alive until it's safe to delete
        std::atomic_store( &currentWidget, newWidget ); //publish it to the audio thread
    }

    std::shared_ptr<T> currentWidget;
    ReleasePool releasePool;
};
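
The ReleasePool type isn’t shown above. As a rough sketch of the idea from the talk (not necessarily his exact implementation; add() matches the snippet above, releaseUnused() and the rest are assumptions): it holds extra shared_ptr references on the message thread and periodically drops the ones nothing else still uses, so the final release never happens on the audio thread.

#include <algorithm>
#include <memory>
#include <mutex>
#include <vector>

class ReleasePool
{
public:
    template <typename T>
    void add(const std::shared_ptr<T>& object)
    {
        std::lock_guard<std::mutex> lock(mutex);   //never called from the audio thread
        pool.emplace_back(object);
    }

    //call this periodically, e.g. from a timer on the message thread
    void releaseUnused()
    {
        std::lock_guard<std::mutex> lock(mutex);
        pool.erase(std::remove_if(pool.begin(), pool.end(),
                                  [](const auto& object) { return object.use_count() <= 1; }),
                   pool.end());
    }

private:
    std::vector<std::shared_ptr<void>> pool;   //type-erased; the stored deleter still destroys the right T
    std::mutex mutex;
};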

Now, he claims that it’s lock-free, but if we look at how std::atomic_load(shared_ptr) is actually implemented (this is libc++), we see there are locks:

template <class _Tp>
_LIBCPP_AVAILABILITY_ATOMIC_SHARED_PTR
shared_ptr<_Tp>
atomic_load(const shared_ptr<_Tp>* __p)
{
    __sp_mut& __m = __get_sp_mut(__p);
    __m.lock();
    shared_ptr<_Tp> __q = *__p;
    __m.unlock();
    return __q;
}

So, is there a better way to accomplish what he’s talking about in the videos? Do those locks in the atomic_load not really matter because of how shared_ptr<T>::operator=(const shared_ptr<T>& other) works?


Yeah, I got bitten by this and saw hotspots in that mutex when profiling. Haven’t got a solution yet.

It uses locks when it can’t load atomically, i.e. when what you’re trying to load is too big.

…which std::shared_ptr is. I have a shared-pointer implementation somewhere that uses a DCAS (double-width compare and swap), but I need to tidy it up. I’ll probably need to do this anyway for this use case at some point.

Essentially, it stores the pointer in one half of the double word and the strong and weak counts in the other half (half each again of that word, so 32 bits for each count on 64-bit). Then it can swap the whole thing atomically and lock-free. Obviously this naive implementation assumes that the CPU supports DCAS operations. I recall that 64-bit iOS was a problem since it didn’t have a DCAS. But there are quite a few unused bits in the pointer on 64-bit iOS, so these can be used for some of the reference-counting bits, although you then get a more limited range for the counts (as far as I remember this is roughly what Apple does for ARC on iOS too). But for this use case you can be careful to keep your reference counts low in any case.
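
To illustrate just the packing idea (this is not the implementation mentioned above; the names are hypothetical, and it assumes the target supports a 128-bit compare-and-swap, e.g. cmpxchg16b on x86-64, and that std::atomic reports the type as lock-free):

#include <atomic>
#include <cstdint>

//128-bit value: pointer in one half, both counts in the other half.
//This only shows the "swap the whole thing at once" trick, not a full shared_ptr.
template <typename T>
struct PackedPtr
{
    T*       ptr    = nullptr;
    uint32_t strong = 0;   //half of the second word
    uint32_t weak   = 0;   //other half of the second word
};

template <typename T>
class DcasPtr
{
public:
    PackedPtr<T> load() const
    {
        return value.load(std::memory_order_acquire);
    }

    bool compareExchange(PackedPtr<T>& expected, PackedPtr<T> desired)
    {
        //swaps pointer and both counts in a single atomic operation
        return value.compare_exchange_weak(expected, desired,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire);
    }

private:
    std::atomic<PackedPtr<T>> value {};   //check value.is_lock_free() at runtime!
};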


Yes, this is a pain until we get std::atomic<std::shared_ptr> in C++20…
(See 6 & 7 here http://en.cppreference.com/w/cpp/atomic/atomic)


IMHO the root of this misunderstanding is the fact that atomic and lock-free are not the same thing. std::atomic may be lock-free, but it isn’t always. That’s what std::atomic::is_lock_free() is for. So if you’re trying to be clever and stick std::function objects into a lock-free queue of atomic shared pointers, do yourself a favour and use a static assert to check that the atomic is actually lock-free, or even better, have a look at the assembly.
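
For example (a sketch; is_always_lock_free requires C++17, otherwise you can only query is_lock_free() at runtime):

#include <atomic>
#include <memory>

struct Widget { float gain; };

//compile-time guarantee: a raw pointer is lock-free on this platform
static_assert (std::atomic<Widget*>::is_always_lock_free,
               "atomic<Widget*> is not lock-free on this platform");

//no such guarantee for shared_ptr: you have to check at runtime
void check()
{
    std::shared_ptr<Widget> p = std::make_shared<Widget>();
    bool lockFree = std::atomic_is_lock_free(&p);   //typically false before C++20's atomic<shared_ptr>
    (void) lockFree;
}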


It would be great to get an official solution and explanation. The talks that advocate ring buffers say “make your buffer huge” to solve the problem, but is that actually solving it, or just hoping that reads and writes never collide? Some of the videos mention not being able to mathematically prove that a race condition will never happen, and that making the ring buffer large is still a gamble. Is there a true one-size-fits-all solution for sharing data across threads? What is the technique?


Has an official solution and explanation ever been developed?

Or is this the one thing where the JUCE team decided “we’ve helped them enough via the framework, they can figure this one out on their own”? :stuck_out_tongue:


I’m also very interested in a general solution. Especially since I didn’t know about this thread and the lock inside atomic_load().

per this thread:

is this a valid solution:

template<typename T>
struct SmartAtomicPtr
{
    SmartAtomicPtr( T* newT )
    {
        update( newT );
    }
    ~SmartAtomicPtr()
    {
        update( nullptr );
    }
    void update( T* newT, std::memory_order ord = std::memory_order_seq_cst )
    {
        //hold on to the previous object so it isn't leaked
        keepAlive.reset( atomicTptr.exchange( newT, ord ) );
    }
    std::shared_ptr<T> getShared( std::memory_order ord = std::memory_order_seq_cst )
    {
        return std::make_shared<T>( atomicTptr.load( ord ) );
    }
    T* getRaw( std::memory_order ord = std::memory_order_seq_cst )
    {
        return atomicTptr.load( ord );
    }
private:
    std::atomic<T*> atomicTptr{ nullptr };
    std::shared_ptr<T> keepAlive;
};

is this a valid solution:

It is “valid” in the sense that there is no data race when reading/writing the pointer.
BUT if you call get() and update() from different threads (e.g. audio thread and message/main thread), the pointer returned from get() may point to an already-destroyed instance. With this interleaving:

/* audio thread: */ auto* p = ptr->get();
/* message thread: */ update(new …);
/* audio thread: */ p->doSomething(); // p is dangling; undefined behavior

Even if the instance is alive at the time p is assigned, there’s nothing that keeps it alive on the audio thread. So you can’t use p safely without introducing more synchronization.

[EDIT] Also (as pointed out by another SO user) the constructor calls delete on an uninitialised pointer.


Thanks @basteln, I updated the Stack Overflow link and updated the code above. So I guess some kind of SafePointer needs to be used? Thanks for explaining in such a simple way what the problem is and why it’s such a difficult problem to solve!

[EDIT]
Edited the solution above. Thoughts?

No problem :slight_smile:

Edited the solution above. Thoughts?

make_shared<T> is used for creating a new instance of T. The intent of getShared seems to be to create a shared_ptr that keeps alive what atomicTptr points to. To do that, you would write return std::shared_ptr<T>( atomicTptr.load(ord) );. But that introduces new problems:

/* audio thread: */ atomicTptr.load(ord)
/* message thread: */ keepAlive.reset( atomicTptr.exchange( newT, ord ) );
/* audio thread: */ std::shared_ptr<T>(/* what was returned above is already dangling */)

That problem aside, multiple calls to getShared create multiple shared_ptrs pointing to the same memory but unaware of each other. Each would have its own reference count, and the same memory would be deleted multiple times.

From my experience with lock-free code, whenever you try to solve a problem A by introducing a new piece of data, you usually introduce new problems B (C, …) because now more data has to be synchronized. Of course there are exceptions, but this is a good example of this effect. The more general problem is that the raw atomic<T*> only holds the pointer, but we need to synchronize more data (the reference count) along with it.

If I had a go-to best-practice pattern to solve this, I would post it; or there would probably be something in JUCE. I’ve spent quite a lot of time with atomic shared_ptr, and more and more I found that:

  • You can avoid calling atomic_load/store too often by keeping a shared_ptr per thread, along with a status flag. The idea is to only call atomic_load when the data has actually changed (see the sketch after this list). So the realtime thread still locks, but the lock is “much less likely” to be contended by the main thread. If “much less likely” doesn’t sound satisfying, I agree :grimacing:
  • Reconsider whether you really need the shared_ptr, or whether you can get away with a lock-free queue of “plain” data (i.e. trivial POD structs, not object graphs). For example, you might be able to circumvent dynamic polymorphism by using std::variant instead. But it depends. And unless you’re using a custom allocator, using shared_ptr forces the instance to be on the heap, whereas the lock-free queue approach doesn’t, which may affect performance.
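
A minimal sketch of that first idea (the names are made up; the flag tells the realtime thread when it’s worth paying for the potentially-locking atomic_load again):

#include <atomic>
#include <memory>

template <typename T>
class LazyAtomicShared
{
public:
    //message thread: publish a new object and raise the flag
    void set (std::shared_ptr<T> newObject)
    {
        std::atomic_store (&shared, std::move (newObject));
        changed.store (true, std::memory_order_release);
    }

    //audio thread: only touch the (potentially locking) atomic_load when something changed
    const std::shared_ptr<T>& get()
    {
        if (changed.exchange (false, std::memory_order_acquire))
            local = std::atomic_load (&shared);   //note: reassigning local may release the previous
                                                  //object here, unless e.g. a release pool also holds it
        return local;   //the local copy keeps the object alive between updates
    }

private:
    std::shared_ptr<T> shared;              //only ever accessed via atomic_load / atomic_store
    std::atomic<bool> changed { false };
    std::shared_ptr<T> local;               //owned by the audio thread
};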

This is a bad example (as pointed out) but the common solution is just to increment the reference count whilst the audio thread is using it. In this case it’s as simple as:

/* audio thread: */ auto p = ptr;
/* message thread: */ update(new …);
/* audio thread: */ p->doSomething(); // p is now local and still valid

However, you do have the problem now that p can be deleted on the audio thread, which can block for an indeterminate amount of time (as it requires a global lock on the heap).
One solution to this is to dispatch the delete to the message thread via some kind of garbage collection, but you have to be very careful about lifetimes in these situations.

As @basteln points out, the best method is to try and avoid having to synchronise in the first place.

That was @timur’s solution in his talk, actually.
The problem I found was that atomic_load(shared_ptr) takes locks, so his solution wasn’t actually lock-free.

grrrrr why isn’t there a generic solution yet?!?!

The problem is that std::shared_ptr has a strong count, weak count and pointer. That makes 96 bits. At the moment, >64 bit CAS operations aren’t widely supported (you’d need to swap the whole thing atomically), hence the locks.

There could be a way to do it if you implemented std::shared_ptr with short strong and weak counts but that’s not really portable.

If you used juce::ReferenceCountedObject you could do it but then you don’t get the weak count (and it’s intrusive).


However, I’m interested to find out more about scheduling, to know whether a thread is ever likely to be suspended whilst inside atomic_load, atomic_exchange etc., as there are only a few instructions.
My fear is that, although unlikely, it’s probably not completely deterministic.

Anyone seen this?

That gets away with it because it doesn’t have the weak count so the CAS operation is smaller.

What’s the point of the weak count, anyway? I’m not fully understanding why shared_ptr has 2 use counters internally…

It keeps a weak reference count, which is managed through std::weak_ptr, basically an observing but non-owning pointer type, similar to juce::WeakReference.

They’re useful for avoiding lifetime issues, in caches, etc. In the std world, in order to dereference a std::weak_ptr you need to call lock(), which creates a std::shared_ptr for you so the object doesn’t get deleted whilst you’re calling into it.
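
For example (a small sketch of the usual pattern):

#include <iostream>
#include <memory>

struct Widget { void doSomething() { std::cout << "used\n"; } };

int main()
{
    std::shared_ptr<Widget> owner = std::make_shared<Widget>();
    std::weak_ptr<Widget> observer = owner;    //non-owning: doesn't bump the strong count

    if (std::shared_ptr<Widget> locked = observer.lock())   //temporarily owns while in scope
        locked->doSomething();

    owner.reset();                             //last strong reference gone; Widget destroyed

    if (observer.expired())
        std::cout << "widget is gone\n";       //lock() would now return an empty shared_ptr
}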
