Problems with ListenerList deadlocking (and some proposed solutions)

I use a lot of ListenerLists, with many classes defining their own customized listener classes. These listeners tend to act across threads, so I lock the ListenerLists by defining a CriticalSection:

    template <typename T>
    using LockedListenerList = juce::ListenerList<T, juce::Array<T*, juce::CriticalSection>>;

Recently though, I’ve been running into problems with deadlocks. It’s difficult to track down, but I think what is happening is that two threads are calling different listener callback simultaneously, and one of them triggers a new listener to be added to the other. Since both ListenerLists have CriticalSections defined, both threads open two locks, but they call the locks in different orders, hence the deadlock.

Doing some research, I have come across the advice again and again, don’t call virtual functions while locking. Now I can see why!

This isn’t the kind of post where sharing code will be all that helpful (though I can share if you like). Instead, I’m looking for general strategies that I can use to solve the problem. Here are some ideas I’ve come up with:

  1. The normal advice is to use std::scoped_lock to synchronize all of the mutexes (and to ditch the CriticalSection). But this doesn’t seem so easy in my case, as the locks are being declared in completely different areas of the code. Right now I don’t know how to synchronize them, although perhaps it is possible.

  2. As someone here is bound to say, I could rewrite my code so that Listeners aren’t being called from multiple threads at all. Perhaps this is good advice, but it’s easier said than done. I’m quite heavily invested in Listeners at this stage, and I started using them specifically to simplify my code. Divesting from them would mean a major code refactor, and I’d really rather avoid that at this stage.

  3. I could write my own ListenerList class so that the lock isn’t in scope while the callback is being called. I think this would be possible, although it would probably create other difficulties with object lifetime.

  4. I could erase all of my individual listener classes and instead have one central listener which serves all classes. This would function a bit like a message-board for the whole system, and would ensure that all callbacks are synchronized. This would seem to make things a lot simpler, although it also violates a few design principles (i.e. the code would be less decoupled).

Can anyone share any advice? I’m not attached to any of these solutions yet, and would love to hear what other people have come up with.

I don’t think you’d need a massive refactor for this, I bet you could simply call the listeners in an async callback, like so:

void someEventHappened()
{
    juce::MessageManager::getInstance()->callAsync ([this] () {
        m_listeners.call ([] (Listener& l) { l.someEventCallback(); });
    });
}

It would have the downside that listeners won’t be informed of the event instantly and other actions may occur in the meantime that might break some stuff.

In general though, if the API provides a way to use a custom array type, and that array provides a way to use a custom locking mechanism, then those things should work! Maybe you could help out the JUCE team by writing a unit test that always fails in the way you’re describing?

In general I’m trying to move things off the message thread and onto separate worker threads. Nevertheless, there might be something to what you’re saying. Maybe I could create another thread which deals with all callbacks?

Here is a minimum example of the deadlock that I’m facing. There is no need to fix this code: I know it’s contrived and easy to fix, but I was just trying to get a mock-up of the problem.

class TestListener
{
public:
    virtual void foo() {}
    virtual void bar() {}
};

using LockedListenerList = juce::ListenerList<TestListener, juce::Array<TestListener*, juce::CriticalSection>>;

class ListenerBroadcaster : public TestListener
{
public:
    LockedListenerList listeners;

    void foo() override { DBG("foo"); listeners.call([](TestListener& l) { l.bar(); }); }
    void bar() override { DBG("bar"); }

private:
};


class ListenerJob : public ThreadPoolJob
{
public:
    ListenerJob(ListenerBroadcaster& a_, ListenerBroadcaster& b_) : a(a_), b(b_), ThreadPoolJob("l") {}

    JobStatus runJob() override
    {
        a.listeners.add(&b);
        a.listeners.call([](TestListener& l) { l.foo(); });
        a.listeners.remove(&b);

        return JobStatus::jobNeedsRunningAgain;
    }

private:
    ListenerBroadcaster &a, &b;
};



class DeadlockDemo
{
public:
    
    DeadlockDemo()
    {
        tp.addJob(new ListenerJob(a, b), true);
        tp.addJob(new ListenerJob(b, a), true);
    }

private:
    ThreadPool tp;
    ListenerBroadcaster a, b;
};

Maybe the best solution would be to use one mutex for all different listener classes. This would mean abandoning the built-in mutex in the ListenerList, but it would be easy enough to lock each class on its own.

Part of your design inherently brings so much complexity that I have to ask ‘why’ you’re approaching it this way? Why not pass off a message or flag asynchronously from the listener callback, and sync on the separate thread - wherever that is - via some unified mechanism?

It almost sounds like you need signals and slots from QT… Even then, say your intention is to pass POD objects around or pass triggers around - you could investigate Vinnie’s CallQueue.

“Why is this so complicated?” is a question that I ask myself every day. I’ve tried so many times to simplify and reconceive the whole thing, but complexity and bugs always creep back in.

Partly I think that the task itself is just really difficult, but sometimes I also think that I’ve chosen the wrong tools for the job. I’m wondering whether I should look at different approaches, such as functional programming and persistent data structures. But it takes a long time to learn something new, and I’m so comfortable at this point with what I know that it would feel like a leap of faith.

There’s nothing like an existential crisis as motivator to solve daily issues!

3 Likes

I finally managed to solve this problem. The callback, instead of performing the needed task, now launches a new ThreadPoolJob. This takes the load off the message thread, and also lets the lock around the ListenerList release.

Lesson learned: don’t use multi-threaded callbacks to call extensive code. Use them to trigger a flag, or a create a new ThreadPoolJob. This is in line with Core Guideline CP22: Never call unknown code while holding a lock.