MessageManager lock too long time in OpenGL CachedImage

Recently we received some reports from our user that the GUI is stucked (including key/mouse operation and regular JUCE component rendering), but OpenGL rendering is not affected. This issue only happen on their computer and we cannot reproduce it on any of our computers.

After some inspect, we found it is probably caused by the acquirement on MessageManagerLock in OpenGL threads too long time, so that it is blocking message thread running fluently. We browsed the code of the OpenGL rendering part, it seems all crucial stuffs are located in CachedImage::renderFrame():

// original code
bool renderFrame()
{
    // lock is possibly acquired at very beginning
    MessageManager::Lock::ScopedTryLockType mmLock (messageManagerLock, false);

    // ...... some actions we do not understand ......

    // all OpenGL rendering calls occur inside this function
    context.renderer->renderOpenGL();

    // ...... some actions that render the related JUCE component......

    // swap buffer and context turning off
    context.swapBuffers();
    OpenGLContext::deactivateCurrentContext();
    return true;
}

There have possibility that the lock to message thread is acquired at the very beginning of the function, and it is retained during the whole function.

Previously, we have also noticed that in some GPU driver implementations, the SwapBuffers() and wglMakeCurrent (nullptr, nullptr) calls took extremely high CPU consumption because they are busy-waited. And it seems these two operations don’t need message thread at all. So we tried to exclude these two functions from the range of message manager lock:

// mod1 code
bool renderFrame()
{
    // lock is possibly acquired at very beginning
    std::unique_ptr<MessageManager::Lock::ScopedTryLockType> mmLock (new MessageManager::Lock::ScopedTryLockType(messageManagerLock, false));

    // ...... some actions we do not understand ......

    // all OpenGL rendering calls occur inside this function
    context.renderer->renderOpenGL();

    // ...... some actions that render the related JUCE component......

    // swap buffer and context turning off AFTER lock range
    mmLock.release(); // typo, should be mmLock.reset()
    context.swapBuffers();
    OpenGLContext::deactivateCurrentContext();
    return true;
}

After apply this, our user reported obvious improvement: previously running two of our plugin instance would stuck their computer, now two instance would run properly, but three blocks.

As OpenGL rendering calls would took obvious CPU time, we attempt to exclude context.renderer->renderOpenGL() from message manager lock range, hoping it would bring further improvements. However we don’t fully understand the renderFrame() function, so we want to ask that: is it safe to lock the renderFrame() function in two separate ranges like this?

// mod2 code
bool renderFrame()
{
    // there is a lock to native OpenGL context in code below, we cannot create scoped lock object to message thread
    // so keep using unique_ptr here
    std::unique_ptr<MessageManager::Lock::ScopedTryLockType> mmLock1(new MessageManager::Lock::ScopedTryLockType(messageManagerLock, false));

    //
    // ...... some actions we do not understand ......
    //

    // all OpenGL rendering calls occur inside this function
    mmLock1.release(); // typo, should be mmLock1.reset()
    context.renderer->renderOpenGL();

    {
        // don't know if need to use trylock or regular lock here
        MessageManager::Lock::ScopedLockType mmLock2(messageManagerLock);
        //
        // ...... some actions that render the related JUCE component......
        //
    }

    // swap buffer and context turning off
    context.swapBuffers();
    OpenGLContext::deactivateCurrentContext();
    return true;
}

Nevertheless, we strongly recommend to exclude context.swapBuffers() and OpenGLContext::deactivateCurrentContext() in master brach of JUCE code as in mod1 code, as this change is simple, would bring improvements instantly, and looks harmless.

:scream:

I think you have to recap some unique ptr basics :wink: The code above does not release the lock and creates a giant memory leak. The intention of a std::unique_ptr is to automatically manage lifetime, however release is the call that tells the pointer to no longer manage the lifetime of the object. release does NOT delete the held instance, after calling release you have to delete the instance manually. What you were probably looking for is reset which deletes the held instance before the pointer goes out of scope. So instead of releasing the lock quickly, this change makes the render thread keeping the lock forever – this is basically the opposite to what you want, I wonder how your plugin didn’t get completely irresponsive here.

Generally scoped locks should always be used as stack variables. Narrowing down the locking scope is usually done by opening up a new scope in code or wrapping the code to be executed under the lock in a lambda in case you need to retrieve a return value. Like this:

bool renderFrame()
{
    { // Open a new scope
        MessageManager::Lock::ScopedTryLockType mmLock (messageManagerLock, false);

        // ...... some actions we do not understand ......

        // all OpenGL rendering calls occur inside this function
        context.renderer->renderOpenGL();
    } // lock is released here

    // ...... some actions that render the related JUCE component......

    // swap buffer and context turning off
    context.swapBuffers();
    OpenGLContext::deactivateCurrentContext();
    return true;
}

Please note that my remarks only refer to coding best practices. I cannot actually say if narrowing down the scope of the lock has dangerous side effects in this specific case. However, one thing I can say for sure is that the user usually expects the renderOpenGL call to be executed under a message manager lock, so excluding that from the lock might only be a solution very specific to your use case and not a general solution.

Oh fxxk I made a critical typo. I will fix it and try the behavior later.

But I still need to create the lock scope on heap, because there’s another scoped lock to native context in some actions we do not understand. If I wrap the message lock scope in brace scope, the native context lock scope would be released too early.