JUCE Mac OpenGL deadlock: during attachment or removal of a monitor

This happens fairly regularly.

Main thread –

    1831 Thread_34097   DispatchQueue_1: com.apple.main-thread  (serial)
+ 1831 start  (in libdyld.dylib) + 1  [0x7fff6f40acc9]
+   1831 juce::JUCEApplicationBase::main(int, char const**)  (in Loopcloud) + 83  [0x10173fe53]
+     1831 juce::JUCEApplicationBase::main()  (in Loopcloud) + 144  [0x10173ff00]
+       1831 -[NSApplication run]  (in AppKit) + 658  [0x7fff327311ee]
+         1831 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]  (in AppKit) + 1352  [0x7fff3273f4e0]
+           1831 _DPSNextEvent  (in AppKit) + 883  [0x7fff32740c99]
+             1831 _BlockUntilNextEventMatchingListInModeWithFilter  (in HIToolbox) + 64  [0x7fff340f5579]
+               1831 ReceiveNextEventCommon  (in HIToolbox) + 584  [0x7fff340f57d5]
+                 1831 RunCurrentEventLoopInMode  (in HIToolbox) + 292  [0x7fff340f5abd]
+                   1831 CFRunLoopRunSpecific  (in CoreFoundation) + 462  [0x7fff354c1ffe]
+                     1831 __CFRunLoopRun  (in CoreFoundation) + 2028  [0x7fff354c2e47]
+                       1831 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__  (in CoreFoundation) + 9  [0x7fff35503041]
+                         1831 _dispatch_main_queue_callback_4CF  (in libdispatch.dylib) + 936  [0x7fff6f3bccab]
+                           1831 _dispatch_client_callout  (in libdispatch.dylib) + 8  [0x7fff6f3b1658]
+                             1831 _dispatch_call_block_and_release  (in libdispatch.dylib) + 12  [0x7fff6f3b06c4]
+                               1831 invocation function for block in CGSDatagramReadStream::dispatch_main_queue_datagrams_async(dispatch_queue_s*, CGSDatagramReadStream*)  (in SkyLight) + 54  [0x7fff64855ad2]
+                                 1831 CGSDatagramReadStream::dispatch_next_main_queue_datagram()  (in SkyLight) + 242  [0x7fff64624fca]
+                                   1831 (anonymous namespace)::notify_datagram_handler(unsigned int, CGSDatagramType, void*, unsigned long, void*)  (in SkyLight) + 98  [0x7fff64620dec]
+                                     1831 CGSPostLocalNotification  (in SkyLight) + 430  [0x7fff646213af]
+                                       1831 displayConfigFinalizedProc  (in SkyLight) + 259  [0x7fff6462b760]
+                                         1831 _cvDisplayLinkDisplayReconfigurationCallback(unsigned int, unsigned int, void*)  (in CoreVideo) + 68  [0x7fff374781c9]
+                                           1831 CVCGDisplayLink::displayReconfiguration(unsigned int, unsigned int)  (in CoreVideo) + 54  [0x7fff37478222]
+                                             1831 CVDisplayLink::stop()  (in CoreVideo) + 25  [0x7fff37453f75]
+                                               1831 _pthread_mutex_firstfit_lock_slow  (in libsystem_pthread.dylib) + 222  [0x7fff6f60a937]
+                                                 1831 _pthread_mutex_firstfit_lock_wait  (in libsystem_pthread.dylib) + 83  [0x7fff6f60c917]
+                                                   1831 __psynch_mutexwait  (in libsystem_kernel.dylib) + 10  [0x7fff6f54e062]

OpenGL thread –

1831 Thread_34440: CVDisplayLink
+ 1831 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff6f60ab8b]
+   1831 _pthread_start  (in libsystem_pthread.dylib) + 148  [0x7fff6f60f109]
+     1831 CVDisplayLink::runIOThread()  (in CoreVideo) + 626  [0x7fff374542c8]
+       1831 CVDisplayLink::performIO(CVTimeStamp*)  (in CoreVideo) + 230  [0x7fff37454e92]
+         1831 juce::OpenGLContext::CachedImage::displayLinkCallback(__CVDisplayLink*, CVTimeStamp const*, CVTimeStamp const*, unsigned long long, unsigned long long*, void*)  (in Loopcloud) + 12  [0x10189e39c]
+           1831 juce::OpenGLContext::CachedImage::renderFrame()  (in Loopcloud) + 121  [0x10189e419]
+             1831 juce::MessageManager::Lock::tryAcquire(bool) const  (in Loopcloud) + 425  [0x101740fe9]
+               1831 juce::WaitableEvent::wait(int) const  (in Loopcloud) + 171  [0x1016fb6db]
+                 1831 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)  (in libc++.1.dylib) + 18  [0x7fff6c6de592]
+                   1831 _pthread_cond_wait  (in libsystem_pthread.dylib) + 698  [0x7fff6f60f425]
+                     1831 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff6f54e882]

What is supposed to happen here?

Could the OpenGL thread just give up trying for the lock after a second or two and return?
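To make the "give up after a while" idea concrete, here is a minimal sketch in standard C++. It is not how `juce::MessageManager::Lock` is implemented; `messageLock`, `tryRenderFrame` and `renderWhileLockHeld` are hypothetical names, and a plain `std::timed_mutex` stands in for the message manager lock. The point is only that a bounded wait lets the render callback skip a frame and return instead of blocking forever.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the lock the render thread needs. This is NOT
// juce::MessageManager::Lock, just a std::timed_mutex for illustration.
std::timed_mutex messageLock;

// Tries to render one frame. Returns false (frame skipped) if the lock could
// not be taken within `timeout`, so the caller never blocks indefinitely.
bool tryRenderFrame (std::chrono::milliseconds timeout)
{
    assert (timeout.count() >= 0);

    // This constructor calls try_lock_for (timeout) on the mutex.
    std::unique_lock<std::timed_mutex> lock (messageLock, timeout);

    if (! lock.owns_lock())
        return false; // give up and let the display callback return

    // ... drawing would happen here while the lock is held ...
    return true;
}

// Simulates a busy "message thread" that holds the lock for `holdFor` while
// the calling thread attempts a frame. Returns whether the frame was rendered.
bool renderWhileLockHeld (std::chrono::milliseconds holdFor,
                          std::chrono::milliseconds timeout)
{
    std::promise<void> locked;

    std::thread holder ([&]
    {
        std::lock_guard<std::timed_mutex> guard (messageLock);
        locked.set_value();                    // tell the caller the lock is taken
        std::this_thread::sleep_for (holdFor); // pretend to be busy
    });

    locked.get_future().wait();
    const bool rendered = tryRenderFrame (timeout);
    holder.join();
    return rendered;
}
```

With the lock held for longer than the timeout, `tryRenderFrame` returns `false` and the caller simply drops that frame. Whether dropping frames like this is acceptable in the real renderer is a separate question.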

You can recreate this bug by running an app that does a lot of work in the rendering and just switching scaling resolutions in system preferences.

OpenGL rendering thread is here:

@t0m - this looks like a problem with the way the message manager lock works to me. The message thread is stuck waiting on an OS lock that is held by the CVDisplayLink thread. Meanwhile the CVDisplayLink thread calls tryEnter on the MessageManager lock, which can’t take the lock but doesn’t return either. And there we stay :(

Bump - this is a fairly serious bug. Any chance someone could at least provide some advice as to whether this is a genuine problem with MessageManager::Lock or not? :)

Probably similar to this

Sure looks like the same problem. :)

I don’t think this is something we can easily fix in the MessageManagerLock. The lock causing the deadlock is in the CVDisplayLink code: after a display configuration change, the main thread waits there for the CVDisplayLink thread to finish its callback before it can close the display. That thread is doing the OpenGL rendering and is trying to take a MessageManagerLock, which in turn is waiting on the main thread.
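The wait cycle described above can be modelled in a few lines of standard C++. This is only a sketch under stated assumptions: `displayLinkMutex` stands in for the mutex inside CVDisplayLink, the `messageLockGrant` promise stands in for the main thread granting the message manager lock, and every wait is given a timeout so the demo terminates instead of hanging. None of the names come from JUCE or CoreVideo.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

struct CycleResult
{
    bool mainGotDisplayLock = false; // main thread acquired the display-link mutex
    bool renderGotGrant = false;     // render thread was granted the message lock
};

// Models the cycle with timeouts instead of real (infinite) blocking.
CycleResult simulateLockCycle (std::chrono::milliseconds patience)
{
    assert (patience.count() > 0);

    std::timed_mutex displayLinkMutex;   // hypothetical stand-in for CVDisplayLink's mutex
    std::promise<void> callbackRunning;  // render thread -> main: "I hold the mutex"
    std::promise<void> messageLockGrant; // main -> render: "you may have the message lock"
    CycleResult result;

    std::thread renderThread ([&]
    {
        // The display link holds its mutex for the duration of the callback.
        std::lock_guard<std::timed_mutex> inCallback (displayLinkMutex);
        callbackRunning.set_value();

        // Inside the callback, wait for the main thread to grant the lock.
        auto grant = messageLockGrant.get_future();
        result.renderGotGrant = grant.wait_for (patience) == std::future_status::ready;
    });

    callbackRunning.get_future().wait();

    // Main thread: the reconfiguration handler needs the same mutex before it
    // can get back to the event loop and service the grant request.
    if (displayLinkMutex.try_lock_for (patience / 2))
    {
        result.mainGotDisplayLock = true;
        displayLinkMutex.unlock();
        messageLockGrant.set_value();
    }

    renderThread.join();
    return result;
}
```

In this model neither side makes progress: the main thread times out on the mutex, so it never issues the grant, and the render thread times out waiting for it. Without the timeouts, that is the hang shown in the two stack traces.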

Short of removing all the MM locks from the OpenGL rendering code, which would be a huge rewrite, the next best fix is probably to go back to doing the actual rendering on a ThreadPoolJob (as on the other platforms, where there is no CVDisplayLinkCallback driving the rendering) and use the CVDisplayLinkCallback only to wake up the job when it needs to render. We’ve added this in 281ae0b, which fixes the deadlock, but rendering is now no longer tied directly to the display callback as it was before. From some brief tests, the cost of waking up the rendering job, which just waits indefinitely on the repaintEvent, is minimal and shouldn’t cause sync issues, but it would be interesting to hear whether anyone runs into problems with this change. I know that @yairadix proposed the change initially, so I would be interested to hear his thoughts on it.
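The shape of that change, where the display callback only signals and a separate thread does the rendering, might look roughly like the sketch below. It uses plain standard C++ rather than JUCE's ThreadPoolJob and WaitableEvent, and `WakeableRenderer`, `displayCallback` and `waitForFrames` are hypothetical names; it is not the code from the commit.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical renderer: rendering runs on its own thread and the display
// callback does nothing except wake it up.
class WakeableRenderer
{
public:
    WakeableRenderer() : worker ([this] { run(); }) {}

    ~WakeableRenderer()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            shouldExit = true;
        }
        wake.notify_one();
        worker.join();
    }

    // Called from the display callback: flags a frame and returns at once,
    // so the callback never blocks on anything the main thread owns.
    void displayCallback()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            frameRequested = true;
        }
        wake.notify_one();
    }

    int framesRendered() const { return frames.load(); }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock (mutex);

        for (;;)
        {
            // Sleep until a frame is requested or we are asked to stop.
            wake.wait (lock, [this] { return frameRequested || shouldExit; });
            assert (lock.owns_lock());

            if (shouldExit)
                return;

            frameRequested = false;
            lock.unlock();   // render without holding the wake-up mutex
            renderFrame();
            lock.lock();
        }
    }

    void renderFrame() { ++frames; } // real drawing would go here

    std::mutex mutex;
    std::condition_variable wake;
    bool frameRequested = false;
    bool shouldExit = false;
    std::atomic<int> frames { 0 };
    std::thread worker; // declared last so the other members exist before run() starts
};

// Helper for the demo: polls until `count` frames were rendered or `timeout` passes.
bool waitForFrames (const WakeableRenderer& r, int count, std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;

    while (r.framesRendered() < count)
    {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;

        std::this_thread::sleep_for (std::chrono::milliseconds (1));
    }

    return true;
}
```

Note that requests arriving while a frame is still being drawn coalesce into a single follow-up frame in this sketch, which is one way rendering can drift from being tied one-to-one to the display callback.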

Ed,

Good to see you here!

The OpenGL rendering is waiting in tryEnter here. It doesn’t have the message manager lock yet. Can’t tryEnter just fail and it can return without rendering anything?

J.

I mean, it feels like tryEnter should return no matter what state the message thread is in!

Hmm, I was seeing a slightly different deadlock than the one in your post. Is the issue fixed with the change I posted above for you?

I don’t know yet because I couldn’t cherry-pick it, and I’ve got 150,000 lines of source code to fix bloody unique_ptr-related changes in as we upgrade to JUCE 6 before I can try it ;)

The fix works fine for me, thanks.

I’ll let you know tomorrow :)

I can no longer easily replicate the lockup.

F*** unique_ptr<> though.

I stumbled upon this once by chance, but I can’t seem to reproduce it. I’ve tried connecting and disconnecting the screen my window was on, putting the laptop to sleep in the middle, waking it, connecting again and so on. (This is without @ed95’s fix, because I want to find a more performant fix.)

Do you happen to have a good method to reproduce this?

You need to be spending a while redrawing. Make a big window with loads of complex stuff in it. Then it’s easy to repro, just change the scaling in preferences.

I was able to reproduce it with the OpenGLDemo by cranking the zoom and speed sliders, enabling “Draw 2D graphics in background” and setting the texture to be dynamically generated.

Thanks! Worked like a charm on the first try!

It also reproduces just by moving the window between the monitors, without even detaching or attaching a monitor.