ListenerList crashes on AudioProcessor destruction

Although I take your point that the ListenerList previously had no data races if two threads were only calling listeners at the same time. To the best of my knowledge the ListenerList has never been “thread safe”, has never claimed to be thread safe, and I doubt it was ever intended to be thread safe (unless you pass the second template argument with an ArrayType that uses a CriticalSection).

This change might significantly increase the chance of data races but I’m not sure it’s fair to call it a “breaking” change, no contracts were broken, I’m not aware this was ever a guarantee, it just happened to work that way, it’s a classic example of Hyrums law!

I suspect in this case the issue is a contention between the audio thread and the GUI thread. Probably both are trying to perform sets on a ValueTree? Unfortunately I couldn’t confirm this based on the shared logs as the stack from the audio thread “failed to restore”. If this is the case I would strongly recommend avoiding this.

Having a look it might be possible to restore the JUCE 6 behaviour in the current version, but I wouldn’t want to make any promises. That being said all this would do is hide what I believe is a user error. Any reliance on this not crashing is likely masking potentially more subtle issues that could be lurking in user code.

I also suspect if we mimic the JUCE 6 behaviour we will only be reducing the chance of a crash (albeit significantly) as it’s likely listeners are being added/removed when the editor is created/destroyed and the processor is running and trying to call all the listeners. Also imagine the same value on a ValueTree is set from both threads, the order in which the listeners receive the two calls would be indeterministic.

The real fix is for the user to ensure thread safety. I can think of at least three ways to do this…

  1. Wrap all the calls in a mutex to prevent a data race (not a good idea if an audio thread is involved!)
  2. Only use the ValueTree / ListenerList on a single thread, then use a different mechanism to synchronise the data across threads (personally this is the approach I would normally reach for)
  3. Create separate ValueTree’s / ListenerList’s for each thread, so each one is single threaded. This should also make it clear which callbacks are occurring on the audio thread.

Point taken, and from a strictly “code-legal” point of view, your are absolutely right and the change was well within your bounds. I would argue however, that a considerable amount of people used ListenerList in a multi-threaded way anyhow after confirming, that it is save given the current state of the code.
I’m not saying that wasn’t also kind of stupid (the methods aren’t even marked const), but JUCE is pretty old and I think this kind of change should been marked more clearly or separated into a sub-class.

One of the fundamental design guidelines from C++ is “Don’t pay for what you don’t use” and if I know I’m not adding/removing listeners while iterating them, I shouldn’t pay for these checks. As this wasn’t guarded for years and million lines of code don’t relay on it being checked, I think this should be capsulated differently.

I inherit juce::Timer and use juce::timer::startTimerHz(60); and have Timer::stopTimer(); in the deconstructor.

EDIT: the timer itself is just calling repaint(); which in turn calls my Play Markers function.

EDIT2: I just commented out the timer start/stop and still got a shutdown crash.

Reporting the exact same issue pointed by andym-2 here.
In my case the crash happens on closure once every ~50 times and is within this line (line 110 as of juce v7.0.8 on master) of juce_ListenerList.h when trying to remove a listener from a valid object

WrappedIterator::forEach (activeIterators, [&] (auto& iter)

A few points about my implementation:

  • Standalone app
  • Using just the Message thread with timers and listeners
  • Not using HighResolutionTimer but just a normal Timer
  • ListenerList member object declared as: juce::ListenerList<Listener, juce::Array<Listener*, juce::CriticalSection>> _listeners;

Thanks

Could the timer problem be caused by the timer thread creating a timer message, then the destructor gets called, in which we call stopTimer, but since the message already exists in the queue, the listener is still called, but since that doesn’t exist anymore, the call leads to a crash?

The number #1 crash in our crash-logs (that customers supply!) is in the timerCallback. Stack trace is always during destruction of our editor.

I’ve got some internal changes I’m pretty positive they will take some time to make it through an MR though! My main goal is to try and detect and assert for possible issues early allowing us to communicate some possible solutions.

@fravezAt your use case sounds interesting, given you are already passing a criticalSection. I think the destructor of the ListenerList isn’t locking the arrays internal lock so I guess you are destroying the ListenerList while it’s iterating through the listeners, maybe try calling clear() on your listeners list before it is destroyed?

@reFX that’s interesting I’ve not experienced an issue like that before. I’ll see if I can do something to repro that.

@anthony-nicholls I can send you symbolicated crash logs for Mac and Windows. On Mac these crashes are 3x as common as on Mac.

Yes please. Not sure how much it will help but given the prevalence of Timer I think if there is an issue it definitely warrants looking into.

one of our user reported recently an issue with juce::Timer as well

Thread 15 Crashed:: JUCE Timer
0 libsystem_pthread.dylib 0x18651e558 pthread_mutex_lock + 12
1 Falcon 0x15a87018c juce::CriticalSection::enter() const + 12
2 Falcon 0x15a884bf4 juce::MessageQueue::post(juce::MessageManager::MessageBase*) + 28
3 Falcon 0x15a8857c4 juce::Timer::TimerThread::run() + 568
4 Falcon 0x15a85c16c juce::Thread::threadEntryPoint() + 292
5 Falcon 0x15a87c86c juce::Thread::createNativeThread(juce::Thread::Priority)::$_28::__invoke(void*) + 152
6 libsystem_pthread.dylib 0x186523fa8 _pthread_start + 148
7 libsystem_pthread.dylib 0x18651eda0 thread_start + 8

Just an update on my situation, after discussion with Anthony, it looks like my issues were caused by an incorrect assumption that VT’s were thread safe. I was using a processor VT in the GUI to trigger listeners in the GUI from a mouseListener to send a message to an info display component. I’ve now replaced that with a dedicated VT for that purpose, which is obvious in hindsight really. I also now have 2 ValueTrees I use between the threads, each one being single direction like a motorway when writing.

The only other use was for a few .onClick() lambdas to set properties that triggered listeners/file loaders in the processor, but now I have a dedicated Send VT which I use instead.

I have done a bit of testing with the v7 code & it seems to be OK so far, but I’m still using the v6 for the moment until I have more time next week to do more tests.

1 Like

I’ve send them to you in a PM

1 Like

Also, to ensure we’re not confusing stuff here, we do NOT use ANY value trees. None.

Yeah the timer stuff seems independent - don’t want to hijack this thread but it is useful. I feel I understand the ListenerList issues now. In most cases I think it’s because it’s being used in multi-threaded cases and with some relatively minor changes I think most users should be able to prevent this. I think however we can improve diagnostics and there are a couple of very minor edge cases we could deal with better too. I’ll try and get something on develop to help with that as soon as possible.

1 Like

I feel the timer issues are related because internally the timer class also uses a ListenerList and we have crashes in juce::CriticialSection when the timer class itself modifies the listener list.

Isn’t the timer just its own thread that wakes up periodically and sends a message through the MessageManager, which then calls the ListernList callbacks for that timer? What if the Timer thread changes the list while the MessageManager is iterating through it? What if the message is added to the queue, but before the listener list gets iterated, the listener gets destroyed?

To me this feels all connected. Juce internally doesn’t handle these ListernerLists right in the multi-threaded context it itself creates.

So it doesn’t matter if it’s a ValueTree, or Timer, or whatever else’s uses ListenerLists. If Juce doesn’t handle them right internally, then what hope does the user have?

It’s a good thought but unfortunately there isn’t a ListenerList in the Timer, there is one in Thread but this is only used to call exitSignalSent on any listeners and Timer (or any of the related classes) isn’t a Thread::Listener either. Unless I’m missing something?

So I’ve added some extra diagnostics and I plan to run it up on all the examples we have to see if anything triggers but when I had a little look around manually I didn’t spot anything obviously off with the way ListenerList is used internally. It absolutely could be but I haven’t found that to be the case yet.

I have since found three issues with the ListenerList itself…

  1. If clear is called on the ListenerList during a callback then it will still try to iterate through all the listeners. No one seems to have reported this to the best of my knowledge so I’m assuming that’s not a common thing to do, but as we support removing listeners in callbacks I think we should supporting clearing the listeners too.

  2. If a CriticalMutex is passed in and callbacks are firing while the destructor to the ListenerList is called there is still a potential data race. I think in most cases it’s best practice to signal that you want callbacks to stop and wait until it’s safe to destroy the object, however I think there are likely legitimate use cases where that will be difficult and non-obvious so protecting against this makes sense if we can.

  3. If a CriticalMutex is passed in and one of the callbacks results in the destructor of the ListenerList itself, although the iterator will immediately bail preventing access to the listener array (that’s the point of the code in the destructor) it will then call the destructor of the scoped lock which will try to unlock the mutex which has been deleted. Although this is easy enough to reproduce I’m finding that it doesn’t crash for me and address sanitiser doesn’t pick it up! which is probably why it was missed.

I have fixes internally for all the above but they’ll need more testing and some team review.

I think most of the issues with ListenerList are just where it’s used in multiple threads and I think we can do better to diagnose and communicate how ListenerList should be used as I think it is relatively easy to do accidentally and not immediately obvious when it goes wrong (as is so often the case with threading issues). I have a proof of concept for this so hopefully that can make it in too.

1   com.refx.nexus                    0x000000034858fd04 0x348000000 + 5831940 juce::CriticalSection::enter() const (in Nexus) + 12
2   com.refx.nexus                    0x000000034863bc20 0x348000000 + 6536224 juce::ReferenceCountedArray<juce::MessageManager::MessageBase, juce::CriticalSection>::add(juce::MessageManager::MessageBase*) (in Nexus) + 28
3   com.refx.nexus                    0x0000000348637970 0x348000000 + 6519152 juce::Timer::TimerThread::run() (in Nexus) + 576
4   com.refx.nexus                    0x0000000348564bcc 0x348000000 + 5655500 juce::Thread::threadEntryPoint() (in Nexus) + 284
5   com.refx.nexus                    0x00000003485d3a88 0x348000000 + 6109832 juce::Thread::createNativeThread(juce::Thread::Priority)::$_25::__invoke(void*) (in Nexus) + 32
6   libsystem_pthread.dylib           0x000000019736c26c _pthread_start + 148
7   libsystem_pthread.dylib           0x000000019736708c thread_start + 8

So, there is a reference counted array inside the Timer. Isn’t that a list of Timers or their Listeners?

How can that thread crash immediately upon its creation? That’s literally at the start when the timer gets created/started—way outside our control.

We don’t do any of those things. We don’t add, remove, or clear any listeners in any callbacks. Ever. We only add them in our constructors and remove them in our destructors.

For the Timer issue it’s literally just a call to startTimeHz ( 60 ); in the constructor, and in the destructor we call stopTimer (). Nothing else.

Crashes during Timer creation/start in CriticialSection, which tries to modify some ReferenceCountedArray or during shutdown, when listeners are called during destruction, which may then points to objects which don’t exist anymore.

Happens way more frequently on Mac than on Windows. The CriticalSection issue occurs only on Mac.

It might well be but as far as I could tell it’s not literally using the ListerList class (maybe it should!), I haven’t looked properly yet.

I’m not saying you do, I was only referring to the literal use of the ListenerList class.

I think this just highlights that the Timer issue is a separate issue, even if the problem is very similar. The point is “fixing” (including any extra diagnostics I add) ListenerList isn’t likely to fix the Timer so it warrants a separate investigation that I’ll likely dive into almost immediately.

1 Like

Personally I think a lot of the JUCE codebase that rely on on the message thread for correctness (ValueTree, ListenerList, etc) should assert as such. I think that’s already done in a lot of the Component methods anyway.

Alternatively you can store the thread ID during construction and assert if its not that thread in any other call. That will allow to run intentional operations in a side thread but not use multi threaded calls.

I wouldn’t say they rely on the message thread, they are just not thread safe without a mutex.

Yep that’s pretty much exactly what the extra diagnostics will do, that way we can jassert early.

1 Like