MessageManagerLock deadlocks, I ain't holding it


#1

I reorganized my app, which was working fine, and now I’m deadlocking on MessageManagerLock (under OS/X).

I’ve done the usual debugging, putting print statements as I take each lock and release it, and setting a breakpoint on the MessageManagerLock.

It seems to be that I successfully get and almost immediately release the lock one time, on the thread that’s running JuceApplication::main.

However, I then try to take the lock on another thread I started, and I immediately lock - and of course then my other threads soon grind to a halt.

What could I be doing to prevent MessageManagerLocks from being taken?


#2

Can’t think of anything new that might have affected this… The locks definitely work, otherwise all my apps would also fail.

Isn’t it clear from the stack trace which two threads are deadlocked? The most common scenario is that your message thread is calling stopThread() on the thread that’s trying to create a MM lock…


#3

i would generally recommend not to use MMLocks or callFunctionOnMessageThread() on a thread (unsafe by design), better choose a design which allows the thread to run without interruption(ASyncUpdater, CallbackMessage)

void myclass::run()
{
  triggerAsyncUpdate();
  while (!threadShouldExit())
  {
  	if (asyncUpdateHasProceeded)
  	{
  		break;
  	};
  	Thread::sleep(100);
  }
}	

void myclass::handleAsyncUpdate()
{
    
    // doSomeThing on MessageThread
     asyncUpdateHasProceeded=true
}


myclass::~myclass()
{
	cancelPendingUpdate();

        // be sure that all pending updates are processed (this in not a perfect solution when the destructor is called from the message-thread)

         while (allPendingUpdatesProceeded)
   {  
	Thread::sleep(100);
   }



}

#4

Oh, I wasn’t thinking that this was something new in Juce. I’m baffled as to what I could have done to cause this, however.

I never call stopThread at any time in my code - instead, my loops all check threadShouldExit() rather obsessively and terminate if that becomes true. I generally feel that that’s the only correct way to exit a thread, in any language.

The stack trace only seems to show ONE thread locked. Yes, I know that seems implausible. I am quite sure, however, that only one of MY threads is locked waiting for this. Is there any other way to lock the MessageManager other than creating an instance of MessageManagerLock?

As I mentioned, I set breakpoints in MessageManagerLock’s constructors. I reach a breakpoint once, take and release the thread. The next time I hit that constructor, the code spins forever, waiting clearly for something to happen in some other thread that isn’t happening.

chkn: thanks for the code sample!

As for rewriting the whole app entirely, well, I need to deliver final product in a few weeks and I’ve been at this for many months. I had no issues until this redesign, which unfortunately was somewhat too large for me to isolate which specific part might have triggered it.

But I think I will start putting things to derive from AsyncUpdater, now I know how risky doing anything component-oriented is on any other thread. In fact, I’m now believing that EVERY use of MessageManagerLock is a potential problem, am I right?

I’m a little worried by that code sample, actually, because I can’t see how it works correctly. You’re checking asyncUpdateHasProceeded repeatedly - and yet you never take a lock. If some other thread changes asyncUpdateHasProceeded, then you aren’t guaranteed to see that change until you take some sort of mutex.

Also, why use Thread::sleep instead of Thread::wait? I always use wait, so that I can then notify the thread and avoid all those nasty 100ms loops that result in sluggish UI response - or, does notify work on sleeping threads too? I am realizing that I’m not sure of the difference between Thread::sleep() and Thread::notify().

Thanks as always for the support. I’ll report back when I hit a solution so the “next guy” will be able to search it up.


#5

Totally agree with you here, this is how I’ve been doing it for 13 years. Either a check to threadShouldExit() in a tight loop or time consuming function, or sending the thread some type of message telling it to unblock and return from its entry function (in the case where a thread waits on some queue for work).


#6

Sounds very odd, Tom - one thread deadlocking is a bit like one hand clapping…

Chkn - don’t forget that the MM lock can take a thread in its constructor, which I think is probably a neater way to achieve what you’re trying to do?

As soon as I’ve got a new release out (very soon, probably this week if I don’t find any showstoppers), then a Big New Feature that I want to add will be some actor model/CSP type threading classes - i.e. queue/channel objects that you can use to communicate between threads, which will let you write code that avoids a lot of this kind of nastiness.


#7

This sounds very exciting, Jules! Though it might be a race between your new code, and my deadline. :smiley:

My theory is that I’m not going to debug this but instead put everything into AsyncUpdates.

However, I found something else interesting when looking through my threads - it seems as if juce::CoreAudioIODeviceType::scanForDevices (which I start quite early in my job) is hanging (or perhaps I’m misinterpreting the stack trace…?), because whenever I look at my main thread, it looks like this:

#0 0x956fe0fa in mach_msg_trap
#1 0x956fe867 in mach_msg
#2 0x956fee79 in mach_port_allocate
#3 0x928e03e4 in IONotificationPortCreate
#4 0x967b0413 in HALPlugInManagement::Initialize
#5 0x967b03b5 in HALSystem::InitializeDevices
#6 0x967abe6f in HALSystem::CheckOutInstance
#7 0x967d3702 in AudioObjectGetPropertyDataSize
#8 0x004b065b in juce::CoreAudioIODeviceType::scanForDevices at juce_amalgamated.cpp:278401
#9 0x002fcb3c in juce::AudioDeviceManager::scanDevicesIfNeeded at juce_amalgamated.cpp:25684
#10 0x002fcd1f in juce::AudioDeviceManager::getAvailableDeviceTypes at juce_amalgamated.cpp:25493
#11 0x00319560 in juce::AudioDeviceSelectorComponent::AudioDeviceSelectorComponent at juce_amalgamated.cpp:73534
#12 0x000b563c in rec::slow::AudioSetupPage::AudioSetupPage at AudioSetupPage.h:16
#13 0x000b576d in rec::slow::MainPageComponent::MainPageComponent at MainPageComponent.h:20
#14 0x000b471f in rec::slow::RecWindow::RecWindow at RecWindow.cpp:11
#15 0x001372ec in rec::slow::Application::initialise at RecApplication.h:20
#16 0x0034d9cf in juce::JUCEApplication::initialiseApp at juce_amalgamated.cpp:19147
#17 0x0034dad9 in juce::JUCEApplication::main at juce_amalgamated.cpp:19198
#18 0x0034dd1a in juce::JUCEApplication::main at juce_amalgamated.cpp:19238
#19 0x00136e57 in main at Main.cpp:3

Is this expected - or is it a symptom of some problem (or some OTHER problem)?

As I said, I think I’m going to async everything, and I’ll let you know.


#8

Hmm, looks like an audio device driver that’s going nuts to me.


#9

To follow up on this, putting all my GUI updating code into an AsyncUpdater made the issue go away - even though I was unable to find out who the other entity holding the lock was.

To warm the next person doing this, I introduced a bug while making this change by storing a Something* pointer instead of a set<Something*>… forgetting that I might request an update multiple times before it goes off. Easily found and fixed though.

(The audio device stuck thread might still be there, I haven’t looked at that…)