Deadlock on error handling


#1

Well, this one is quite easy to explain, but still: here are the call stacks of the last two threads in a process that deadlocked while exiting.
Message thread:

#0  0x00007f4546599474 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f4546594e94 in _L_lock_1020 () from /lib/libpthread.so.0
#2  0x00007f4546594cf7 in pthread_mutex_lock () from /lib/libpthread.so.0
#5  0x00007f454569767a in ?? () from /usr/lib/libGL.so.1
#6  0x00007f454583f275 in _XFreeExtData () from /usr/lib/libX11.so.6
#7  0x00007f454584c891 in _XFreeDisplayStructure () from /usr/lib/libX11.so.6
#8  0x00007f4545837ed1 in XCloseDisplay () from /usr/lib/libX11.so.6
#9  0x00000000006b3d4a in juce::MessageManager::doPlatformSpecificShutdown () at ../../src/native/linux/juce_linux_Messaging.cpp:303
#10 0x000000000058b6dd in ~MessageManager (this=0x2a5ac60, __in_chrg=<value optimized out>) at ../../src/events/juce_MessageManager.cpp:61
#11 0x00000000005770b1 in juce::shutdownJuce_GUI () at ../../src/application/juce_Application.cpp:364
#12 0x0000000000576ce7 in juce::JUCEApplication::shutdownAppAndClearUp () at ../../src/application/juce_Application.cpp:259
#13 0x000000000057669a in juce::JUCEApplication::main

Other thread:

// missing lines here are internal to libc
#4  0x00007f454424a465 in exit () from /lib/libc.so.6
#5  0x00000000006b2ace in juce::Process::terminate () at ../../src/native/linux/juce_linux_Threads.cpp:186
#6  0x00000000006b367f in ioErrorHandler (display=0x2a5f4b0) at ../../src/native/linux/juce_linux_Messaging.cpp:160
#7  0x00007f454585de3e in _XIOError () from /usr/lib/libX11.so.6
#8  0x00007f454586525f in ?? () from /usr/lib/libX11.so.6
#9  0x00007f4545865820 in _XReply () from /usr/lib/libX11.so.6
#10 0x00007f4545852dab in XQueryExtension () from /usr/lib/libX11.so.6
#11 0x00007f45458470af in XInitExtension () from /usr/lib/libX11.so.6
#12 0x00007f4544f2c712 in XextAddDisplay () from /usr/lib/libXext.so.6
// Missing lines here are ?? () in libGL.so.1
#20 0x00007f4546591dd9 in __nptl_deallocate_tsd () from /lib/libpthread.so.0
#21 0x00007f4546592748 in start_thread () from /lib/libpthread.so.0

It seems that _XIOError is called while X's lock is held, in a thread that is not the message manager thread, and the message thread needs that lock to close the X display.
It might be a good idea to change Process::terminate (or ioErrorHandler) on Linux so that it posts a quit message instead of calling plain exit().
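
To make the suspected mechanism concrete, here is a minimal sketch in plain C++ (no X11, no JUCE). It assumes my reading of the backtraces is right: one thread holds the display lock while the error handler runs, and the shutdown path on the message thread needs the same lock. displayLock, tryCloseDisplay and closeSucceedsWhileHandlerHoldsLock are hypothetical names made up for the illustration, and a timed lock is used so that the demo reports the failure instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the Xlib display lock; not a real X11 or JUCE object.
static std::timed_mutex displayLock;

// Models the message thread's XCloseDisplay path, which needs the display lock.
// Returns true if the lock could be taken within the timeout.
static bool tryCloseDisplay (std::chrono::milliseconds timeout)
{
    std::unique_lock<std::timed_mutex> lock (displayLock, timeout);
    return lock.owns_lock();
}

// Models the other thread: it holds the display lock while the "error handler"
// runs. As long as the handler does not return, the lock is never released.
static bool closeSucceedsWhileHandlerHoldsLock()
{
    std::promise<void> lockTaken, releaseHandler;
    std::future<void> lockTakenSignal = lockTaken.get_future();
    std::future<void> releaseSignal = releaseHandler.get_future();

    std::thread other ([&]
    {
        std::lock_guard<std::timed_mutex> held (displayLock);
        lockTaken.set_value();
        releaseSignal.wait();   // the handler is "stuck" here, lock still held
    });

    lockTakenSignal.wait();
    const bool closed = tryCloseDisplay (std::chrono::milliseconds (50));

    releaseHandler.set_value();   // let the handler return so the demo can finish
    other.join();

    // Once the lock has been released, the shutdown path can proceed.
    assert (tryCloseDisplay (std::chrono::milliseconds (50)));
    return closed;
}
```

In the real process there is no timeout, so the message thread just sits in pthread_mutex_lock, which is what frames #0 to #2 of its backtrace show.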


#2

Perhaps it just needs to unlock the X server like this:

    static int ioErrorHandler (Display* display)
    {
        DBG ("ERROR: connection to X server broken.. terminating.");

        errorOccurred = true;
        
        if (JUCEApplication::getInstance() != 0)
        {
            XUnlockDisplay (display);
            Process::terminate();
        }

        return 0;
    }

?


#3

That should work for my case (I can't reproduce it; this deadlock is obviously hard to hit, since the X connection must break while the other thread is doing OpenGL-related work).
However, according to the documentation, an error handler shouldn't do this (from http://tronche.com/gui/x/xlib/event-handling/protocol-errors/XSetErrorHandler.html):
"However, the error handler should not call any functions (directly or indirectly) on the display that will generate protocol requests or that will look for input events. The previous error handler is returned"
I'm not sure whether XUnlockDisplay generates a protocol request, but I think so, since it allows another thread to continue.


#4

OK, well, I guess it would also work to just call MessageManager::stopDispatchLoop() and let the app terminate normally.


#5

Like this?

// Usually happens when client-server connection is broken
static int ioErrorHandler (Display* display)
{
    DBG ("ERROR: connection to X server broken.. terminating.");

    if (MessageManager::getInstance() != 0)
        MessageManager::getInstance()->stopDispatchLoop();

    // Set this only after the call above, otherwise posting the quit message will fail
    errorOccurred = true;

    return 0;
}

#6

Yes, that looks right, I think!
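
For reference, the pattern agreed on above can be sketched without X11 or JUCE: the handler only records a stop request and returns, and the loop's own thread then leaves the loop and does the cleanup. This is a rough illustration only; MiniMessageLoop, requestStop, onConnectionBroken and runUntilErrorHandlerFires are hypothetical names, not JUCE's API, and the real stopDispatchLoop() may work differently inside.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <vector>

// Hypothetical miniature of a message loop; none of these names are JUCE's.
struct MiniMessageLoop
{
    std::atomic<bool> quitRequested { false };
    std::vector<std::string> log;   // only touched by the thread that calls run()

    // Analogue of stopDispatchLoop(): callable from any thread, takes no locks.
    void requestStop() noexcept   { quitRequested.store (true); }

    void run()
    {
        while (! quitRequested.load())
            std::this_thread::yield();   // a real loop would dispatch messages here

        log.push_back ("loop exited");
        log.push_back ("display closed");   // cleanup runs on the loop's own thread
    }
};

// Analogue of the revised ioErrorHandler: record the request and return.
static int onConnectionBroken (MiniMessageLoop& loop)
{
    loop.requestStop();
    return 0;
}

// Runs the loop on the calling thread while another thread fires the handler.
static std::vector<std::string> runUntilErrorHandlerFires()
{
    MiniMessageLoop loop;
    std::thread other ([&loop] { onConnectionBroken (loop); });
    loop.run();
    other.join();
    return loop.log;
}
```

One caveat for the real handler: the Xlib documentation for XSetIOErrorHandler says the client process exits if the I/O error handler returns, so returning 0 there may still end in an exit() from inside Xlib. That would be worth checking before relying on this.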