Timur Doumler Talks on C++ Audio (Sharing data across threads)


#1

So, I’ve been watching his talks about how to exchange objects between threads safely without blocking, i.e. lock-free. The solution he ends up with is something like this:

template <typename T>
class Synth
{
public:
    //this is the user of the widget (real-time audio thread)
    void audioCallback(float* buffer, int bufferSize)
    {
        std::shared_ptr<T> widgetToUse = std::atomic_load(&currentWidget);
        //use widget with your buffer
    }

    //this is the creator of the widget (message thread)
    void updateWidget(/* args */)
    {
        std::shared_ptr<T> newWidget = std::make_shared<T>(/* args */);
        releasePool.add(newWidget);
        std::atomic_store(&currentWidget, newWidget);
    }

    std::shared_ptr<T> currentWidget;
    ReleasePool releasePool; //holds extra references so the audio thread never frees
};
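The ReleasePool isn’t shown above; in the talks it is essentially a timer-driven garbage collector that holds an extra reference to each object and drops the ones whose use_count has fallen to 1, so the final release (and the deallocation) never happens on the audio thread. A minimal sketch of that idea (names and locking strategy are mine, not Timur’s exact code):

```cpp
#include <algorithm>
#include <memory>
#include <mutex>
#include <vector>

// Holds an extra reference to each object so the audio thread never performs
// the final release. A timer on the message thread calls releaseUnused()
// periodically; the mutex is fine here because only non-real-time threads
// ever touch the pool.
class ReleasePool
{
public:
    template <typename T>
    void add(const std::shared_ptr<T>& object)
    {
        std::lock_guard<std::mutex> lock(mutex);
        pool.emplace_back(object);
    }

    void releaseUnused()
    {
        std::lock_guard<std::mutex> lock(mutex);
        // use_count() == 1 means only the pool still references the object,
        // so it is safe to let it die here, off the audio thread
        pool.erase(std::remove_if(pool.begin(), pool.end(),
                                  [](const std::shared_ptr<void>& p)
                                  { return p.use_count() == 1; }),
                   pool.end());
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return pool.size();
    }

private:
    mutable std::mutex mutex;
    std::vector<std::shared_ptr<void>> pool;
};
```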

Now, he claims that it’s lock-free, but looking at how std::atomic_load for shared_ptr is implemented (here in libc++), we see there is a lock:

template <class _Tp>
_LIBCPP_AVAILABILITY_ATOMIC_SHARED_PTR
shared_ptr<_Tp>
atomic_load(const shared_ptr<_Tp>* __p)
{
    __sp_mut& __m = __get_sp_mut(__p);
    __m.lock();
    shared_ptr<_Tp> __q = *__p;
    __m.unlock();
    return __q;
}

So, is there a better way to accomplish what he’s talking about in the videos? Or do those locks in atomic_load not really matter, because of how shared_ptr<T>::operator=(const shared_ptr<T>& other) works?


#2

Yeah, I got bitten by this too and saw hotspots in that mutex when profiling. I haven’t got a solution yet.


#3

It uses locks when it can’t load atomically, i.e. when what you’re trying to load is too big for a native atomic operation.


#4

…which std::shared_ptr is. I have a shared-pointer implementation somewhere that uses a DCAS (double-width compare-and-swap), but I need to tidy it up. I’ll probably need to do that for this use case at some point anyway.

Essentially, it stores the pointer in one half of the double word and the strong and weak counts in the other half (half each again of that word, so 32 bits for each count on a 64-bit machine). Then it can swap the whole thing atomically and lock-free. Obviously this naive implementation assumes that the CPU supports DCAS operations. I recall that 64-bit iOS was a problem, since it didn’t have a DCAS. But there are quite a few unused bits in a pointer on 64-bit iOS, so these can be used for some of the reference-counting bits; of course you then get a more limited range for the counts (as far as I remember, this is what Apple does for ARC on iOS too). For this use case you can be careful to keep your reference counts low in any case.
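To illustrate the “unused pointer bits” variant of this idea: on typical 64-bit targets user-space pointers fit in the low 48 bits, so a count can be packed into the spare high bits and the pair updated with one ordinary lock-free 64-bit CAS. This is my own toy sketch under that assumption (canonical 48-bit pointers), not a full shared_ptr:

```cpp
#include <atomic>
#include <cstdint>

// Packs a 48-bit pointer together with a 16-bit strong count into one
// 64-bit word, so pointer and count can be swapped with a single
// lock-free CAS. Assumes canonical user-space pointers whose upper
// 16 bits are zero (true on typical Linux/macOS 64-bit targets).
constexpr std::uint64_t kPtrMask    = (std::uint64_t(1) << 48) - 1;
constexpr unsigned      kCountShift = 48;

inline std::uint64_t pack(void* p, std::uint16_t count)
{
    return (reinterpret_cast<std::uint64_t>(p) & kPtrMask)
         | (std::uint64_t(count) << kCountShift);
}

inline void*         unpackPtr(std::uint64_t w)   { return reinterpret_cast<void*>(w & kPtrMask); }
inline std::uint16_t unpackCount(std::uint64_t w) { return std::uint16_t(w >> kCountShift); }

std::atomic<std::uint64_t> control{0};

// Atomically bump the count, but only if the pointer hasn't been swapped
// out underneath us -- the pointer and count change together or not at all.
inline bool tryRetain(void* expectedPtr)
{
    std::uint64_t expected = control.load();
    for (;;)
    {
        if (unpackPtr(expected) != expectedPtr)
            return false;               // someone published a different object
        std::uint64_t desired = pack(expectedPtr,
                                     std::uint16_t(unpackCount(expected) + 1));
        if (control.compare_exchange_weak(expected, desired))
            return true;                // count bumped atomically with the pointer check
    }
}
```

The limited 16-bit count range is exactly the trade-off described above: fine as long as you keep reference counts low.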


#5

Yes this is a pain until we get std::atomic<std::shared_ptr> in C++20…
(See 6 & 7 here http://en.cppreference.com/w/cpp/atomic/atomic)


#6

IMHO the root of this misunderstanding is that atomic and lock-free are not the same thing. std::atomic may be lock-free, but it isn’t always; that’s what std::atomic::is_lock_free() is for. So if you’re trying to be clever and stick std::functions into a lock-free queue of atomic shared pointers, do yourself a favour and use a static assert to check that the atomic is actually lock-free, or even better, have a look at the assembly.


#7

It would be great to get an official solution and explanation. The talks that advocate ring buffers say “make your buffer huge” to solve the problem, but is that actually solving it, or just hoping that reads and writes never collide? Some of the videos mention not being able to mathematically prove that a race condition will never happen, and that making the ring buffer large is still a gamble. Is there a true one-size-fits-all solution for sharing data across threads? What is the technique?
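On the ring-buffer point: a single-producer/single-consumer FIFO with acquire/release atomic indices is race-free regardless of capacity; the size only determines how often a push fails because the buffer is full, and the producer must handle that case explicitly instead of hoping. A minimal sketch of that structure (my own, not from the talks):

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer/single-consumer FIFO. Correctness does not depend on the
// capacity: the acquire/release pairs on the indices guarantee the consumer
// never sees a slot before it is fully written. Capacity only decides how
// often push() reports "full", which the producer handles explicitly.
template <typename T, std::size_t Capacity>
class SpscFifo
{
public:
    bool push(const T& value)                // producer thread only
    {
        const auto w    = writeIndex.load(std::memory_order_relaxed);
        const auto next = (w + 1) % Capacity;
        if (next == readIndex.load(std::memory_order_acquire))
            return false;                    // full: caller decides what to do
        slots[w] = value;
        writeIndex.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> pop()                   // consumer thread only
    {
        const auto r = readIndex.load(std::memory_order_relaxed);
        if (r == writeIndex.load(std::memory_order_acquire))
            return std::nullopt;             // empty
        T value = slots[r];
        readIndex.store((r + 1) % Capacity, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots{};
    std::atomic<std::size_t> readIndex{0}, writeIndex{0};
};
```

One slot is always left empty to distinguish full from empty, so a `Capacity` of N holds N-1 elements.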