Race condition (i.e. CRASH) in MessageManager::Lock (with fix!)

Scenario

We run all our plugins through pluginval. Occasionally we pluginval fails when fuzzing the parameters of the VST3 versions of our plugins. The output is:

Starting test: pluginval / Fuzz parameters...
Exception: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)

The call-stack is:

[...]
juce::WaitableEvent::signal() const juce_WaitableEvent.cpp:62
[Inlined] juce::MessageManager::Lock::abort() const juce_MessageManager.cpp:404
[Inlined] juce::MessageManager::Lock::messageCallback() const juce_MessageManager.cpp:398
juce::MessageManager::Lock::BlockingMessage::messageCallback() juce_MessageManager.cpp:290
juce::MessageQueue::deliverNextMessage() juce_osx_MessageQueue.h:81
juce::MessageQueue::runLoopCallback() juce_osx_MessageQueue.h:92
[...]

Note: this crash happens in the pluginval binary, but in our plugins.

Reproduction:

pluginval --strictness-level 10 --validate-in-process --repeat 20 --skip-gui-tests --validate ~/Library/Audio/Plug-Ins/VST3/MyAwesomeTotallyCoolAndSuperblyUnique.vst3

You might have to run this several times in order to trigger the crash.

Analysis

To me this looks like BlockingMessage makes the call to Lock::messageCallback() through a dangling pointer. The call site looks like this:

ScopedLock lock (ownerCriticalSection);

if (auto* o = owner.get())
    o->messageCallback();

The locking of ownerCriticalSection is only happening when the lock was NOT gained. In case the lock was gained, there’s a possibility that the Lock is destroyed without the owner of the BlockingMessage being set to zero.

Fix

The following fixes the race condition:

void MessageManager::Lock::exit() const noexcept
{
    if (lockGained.compareAndSetBool (false, true))
    {
        auto* mm = MessageManager::instance;

        jassert (mm == nullptr || mm->currentThreadHasLockedMessageManager());
        lockGained.set (0);

        if (mm != nullptr)
            mm->threadWithLock = {};
    }

    // this needs to run regardless of wether the lock was gained or not, also the ownerCriticalSection
    // must be locked to ensure that the call to messageCallback doesn't happen through a dangling pointer.
    if (blockingMessage != nullptr)
    {
        ScopedLock lock (blockingMessage->ownerCriticalSection);
        blockingMessage->releaseEvent.signal();
        blockingMessage = nullptr;
    }
}

Closing Remarks

To be perfectly honest, I have a hard time reasoning about the exact sequence of events in this somewhat spaghettiesque corner of the JUCE code base. So my analysis and intuition might be off. All I know is that before it crashed, after it doesn’t.

PS.: pluginval and the plugins are build against JUCE 7.0.5

I’ve tried to repro this with the AudioPluginDemo and DSPModulePluginDemo VST3s on an M1 mac. I’m building the plugins and pluginval from JUCE’s develop branch.

At the moment, I’m not seeing any crashes. Have you tested any of the JUCE examples, and if so, do they fail in the same way for you? I wonder whether the issue only affects specific plugins.

If the issue is only present in a specific plugin, is that plugin manually taking the message manager lock anywhere? Is it using OpenGL?

The plugin in question has a lot of parameters. Perhaps 1000 or something like that. I think this might make the occurrence more likely.

Maybe related BlockingMessage accessing MessageManager::Lock that went out of scope

Thanks for reporting. I’ve pushed a fix here:

Great. Your changes look much more involved that what I did propose. Are you sure your changes don’t have any other unwanted side effects?

Interestingly, I’ve just seen the problem while debugging my app under Xcode (macOS target).
" BUG IN CLIENT OF LIBPLATFORM : Unlock of an os_unfair_lock not owned by current thread"