Lock-free shared_ptrs?

Say you’re building your new awesome audio app and you keep your realtime rendering thread lock-free like you’re supposed to. Atomics generally work really well for the simple stuff such as the volume, since “simple” data types such as floats, ints, bools and the like have lock-free atomicity on normal hardware. Things start getting complicated once larger data structures come into play, such as biquad coefficients (the example used in https://www.youtube.com/watch?v=PoZAo2Vikbo) or audio samples. A possible solution for this is using pointers, which can be swapped atomically. This way you can read your drum sample from disk and construct the data structure on a low-priority thread and, once you’re done, swap in the pointer that your real-time thread reads.
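As a rough illustration of the pointer-swap idea, here is a minimal sketch. The names SampleData, SamplePlayer, publish and acquire are made up for this example and are not from JUCE or the talk. It only shows the atomic swap itself; it does not solve the lifetime question raised in the rest of this thread.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a drum sample loaded from disk.
struct SampleData {
    std::vector<float> samples;
};

// Hypothetical holder for the sample currently used by the audio thread.
class SamplePlayer {
public:
    // Non-realtime thread: publish a fully constructed sample and get the
    // previous one back, so that it is freed here and not on the audio thread.
    // Caveat: the audio thread may still be reading the returned object, so a
    // real implementation has to defer its destruction until that is over.
    std::unique_ptr<SampleData> publish(std::unique_ptr<SampleData> next) {
        SampleData* old = current.exchange(next.release(), std::memory_order_acq_rel);
        return std::unique_ptr<SampleData>(old);
    }

    // Realtime thread: a single atomic load, no locks and no allocation.
    const SampleData* acquire() const {
        return current.load(std::memory_order_acquire);
    }

    ~SamplePlayer() { delete current.load(); }

private:
    std::atomic<SampleData*> current { nullptr };
};
```

The raw pointer handed to the audio thread is exactly the part that goes against RAII: nothing in this sketch stops the old sample from being deleted while the audio thread is still reading it.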

However, these raw pointers kind of go against the RAII principle we all subscribe to, as they require some juggling with memory to make sure we don’t leak any. Shared_ptrs seem like a solution in these kinds of situations, but they generally don’t have lock-free atomicity as they basically consist of a pointer and a counter. As far as I’ve understood, this will still be the case with C++20’s new shared_ptr atomics. As such, I’m thinking of a solution along the following lines:

Realtime shared_ptr
This would be more or less a normal shared_ptr, allowing you to share ownership among multiple threads. In addition to this, there would be a separate realtime thread ownership. Non-realtime threads would act normally (i.e. the last one holding a reference runs the destructor), but they would enter a spinlock while a realtime thread still holds a reference. This way all construction and destruction can be done on low-priority threads, while you’ll never destruct data the realtime thread is using. At worst, the destructor has to wait until the realtime thread is finished, which is supposed to be fast anyway.

This seems like quite a simple/clean solution. Is there an implementation like this available, or would it be possible to implement something like this? Or is there a problem with this idea I’m overlooking?

It’s an interesting idea but I don’t think it fully addresses the problem.
In general, destroying objects isn’t really an issue as it’s deterministic so you can just make sure you have a reference that will always be the last one and destroyed on the message thread, like a garbage pool.

The real problem with std::shared_ptr is that it locks when assigning to a std::shared_ptr, as there’s no other way to synchronise the ref counts. There isn’t really a way around this with a non-intrusive shared pointer, as you can’t atomically change both the ref counts and the pointer that the shared_ptr holds (i.e. the pointer to the control block).


Does JUCE offer any helper classes for this garbage pool behaviour (or these kinds of constructions in general)? Or do you have any links where such implementations are explained? My current application relies on passing around resources using shared_ptrs, and architecturally this makes a lot of sense in my case. The shared_ptrs have not caused any problems so far, so the only reason to change them would be their non-lock-free atomicity.

But I feel as if you could get around that by splitting the ref-counts into realtime and non-realtime (“managing”) accesses. The locking behaviour would only occur in the “managing” threads (where it wouldn’t matter as much), while the realtime threads wouldn’t be affected. The main issue would be if an RT thread tries to access a resource just as the last “managing” reference goes out of scope. You would only need to make sure that either: 1) the RT thread gets the old resource (and you’ll halt the managing thread from destroying it for a few milliseconds) or 2) it gets the new resource. This would definitely be hard (as all atomics seem to be), but not completely impossible, even without double-width CAS?

I might be missing the point, but I can’t see how adding another ref-count in the control block can help. It doesn’t matter which threads do what; the problem is that if you assign to a std::shared_ptr (or use atomic_store/load etc.) you need to atomically change the pointer in the std::shared_ptr and then decrement the ref count in the control block.

There is no way to modify two non-contiguous memory addresses (the pointer to the control block and the ref-count in the control block) with an atomic instruction, hence the locks used in the STL.

It kind of sounds like what you’re proposing is an extra layer of indirection to the shared_ptr where the real-time thread can atomically “reserve” access to it. This is one of the patterns Fabian and I discussed in the talk you linked to. But I think I’d need to see some code to understand what you’re proposing fully, as it really comes down to the memory layouts of your “ref-counts”.

There’s not really anything in JUCE to build a collection pool, except maybe a Timer for periodically checking the pool. All you really need is a std::vector<std::shared_ptr<void>> though: just periodically check if any of the ref-counts is 1 (so the pool is the only owner) and then remove that entry so the object gets deleted.

You need to make sure you don’t use the std::weak_ptr class with this kind of pool though in case a real-time thread resurrects a strong count after you’ve removed it from your pool.
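The pool described above could be sketched roughly like this. GarbagePool, add and collect are hypothetical names for this example, not JUCE classes, and the sketch assumes the pool itself is only ever touched from one non-realtime thread (for instance from a timer callback on the message thread).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical collection pool; use it from a single non-realtime thread only.
class GarbagePool {
public:
    // Keep an extra reference so the pool is always the last owner.
    void add(std::shared_ptr<void> object) {
        pool.push_back(std::move(object));
    }

    // Call periodically. Entries with use_count() == 1 are owned by the pool
    // alone, so erasing them runs their destructors on the calling thread.
    void collect() {
        pool.erase(std::remove_if(pool.begin(), pool.end(),
                                  [](const std::shared_ptr<void>& p) {
                                      return p.use_count() == 1;
                                  }),
                   pool.end());
    }

    std::size_t size() const { return pool.size(); }

private:
    std::vector<std::shared_ptr<void>> pool;
};
```

A std::shared_ptr<Sound> converts implicitly to std::shared_ptr<void>, so the same pool can hold any resource type. The use_count() check is only reliable here because no weak_ptr can turn a count of 1 back into a higher count.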

Yes, an extra layer of indirection sounds like a pretty good summary of what I’m after! Like you mentioned in the talk, these kinds of atomics get unwieldy fast (several times I went “that’s actually pretty clever”, only for you to continue with “however…”) and the risk of breaking something (silently!) with a refactor is very real. As such, this should be abstracted away, and I thought an interface like shared_ptr with separated RT and non-RT access seemed like a pretty natural way of doing so.

I’ve tried to make an example below. In my application it’s not actually a Sound, but I think it illustrates the use case nicely. Each Sound can be created, cloned and modified by several objects and as such needs a shared_ptr. The SoundRenderer holds a single sound which is rendered at each frame and this Sound can be changed at any moment by calling setCurrentSound(). This can be done by all of the aforementioned objects.

#include <atomic>
#include <memory>

struct Sound {
    std::atomic<float> someAtomicParameters[16];
    // Some more non-atomic data, say a sample from a file
};

class SoundRenderer {
public:
    // Can be called from any non-realtime thread
    void setCurrentSound(std::shared_ptr<Sound> newSound) {
        // atomic_store avoids a data race with the load in processAudio
        std::atomic_store(&currentSound, newSound);
    }

    // This is the realtime thread
    void processAudio (float* buffer) {
        // This doesn't work as setCurrentSound could be called during processAudio,
        // which in turn could destroy the currentSound while we're rendering
        // Sound* tempSound = currentSound.get();

        // This does work, but is not strictly lock-free
        std::shared_ptr<Sound> tempSound = std::atomic_load(&currentSound);

        // Something like this is what I'm hoping for:
        // currentSound would then obviously be some "special pointer"
        // RealTimePointer<Sound> tempSound = currentSound.acquireRealTimeAccess();

        // Do the actual processing on tempSound
    }

private:
    std::shared_ptr<Sound> currentSound;
};

I think you’re overcomplicating it. What you really want is just a real-time spin lock around your currentSound, and to make sure you only ever call tryLock on the real-time thread. If this fails, it means your non-real-time thread is currently assigning a new sound and you’ll have to render without the sound.

@timur wrote about this in great detail recently with an almost identical use case: https://timur.audio/using-locks-in-real-time-audio-processing-safely
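To make the try-lock suggestion concrete, here is a minimal sketch. It is not the code from the linked article: SpinLock is a bare std::atomic_flag wrapper written for this example, the simplified Sound with a single gain member is an assumption, and filling the buffer with the gain is only a placeholder for real rendering.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// Simplified stand-in for the Sound discussed in this thread.
struct Sound { float gain = 1.0f; };

// Bare-bones spin lock; a production version would add back-off while spinning.
class SpinLock {
public:
    bool try_lock() noexcept { return !flag.test_and_set(std::memory_order_acquire); }
    void lock() noexcept { while (!try_lock()) {} }
    void unlock() noexcept { flag.clear(std::memory_order_release); }
private:
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
};

class SoundRenderer {
public:
    // Non-realtime thread: may spin briefly while the audio callback holds the lock.
    void setCurrentSound(std::shared_ptr<Sound> newSound) {
        soundLock.lock();
        std::swap(currentSound, newSound);
        soundLock.unlock();
        // newSound now holds the previous sound and releases it here,
        // outside the lock and off the realtime thread.
    }

    // Realtime thread: never waits. Returns false and renders silence if the
    // lock is currently held by setCurrentSound.
    bool processAudio(float* buffer, int numSamples) {
        if (!soundLock.try_lock()) {
            std::fill(buffer, buffer + numSamples, 0.0f);
            return false;
        }
        const float gain = currentSound ? currentSound->gain : 0.0f;
        std::fill(buffer, buffer + numSamples, gain); // placeholder rendering
        soundLock.unlock();
        return true;
    }

private:
    SpinLock soundLock;
    std::shared_ptr<Sound> currentSound;
};
```

The realtime thread never copies the shared_ptr, so it never touches the ref-count. This sketch assumes some other owner (such as a collection pool) keeps the old sound alive or destroys it on a non-realtime thread.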

The only other thing I would say is to try to avoid getting into this case in the first place.
In general, if you can make your render graph rebuild when something like a sound file changes, you never end up in this situation, as you just have a root node to process that you can swap atomically.
