Race conditions, thread safety and crashes in OpenGLContext


#1

Finding some more problems with OpenGL and JUCE.

  1. Main thread is calling OpenGLContext::detach(some_component).
  2. This calls context.attachment = nullptr;
  3. This sets the evaluated value of context.attachment to nullptr
  4. Now, the old attachment will be deleted (but it still exists).
  5. This deletion signals the opengl thread to exit.
  6. This deletion waits on the opengl thread to exit.

Meanwhile, on the OpenGL thread:

  1. While it is creating a OpenGLGraphicsContext, that creates a ShaderContext, that creates a GLState, it calls CachedImageList::get(context)
  2. This calls OpenGLContext::getAssociatedObject, that acquires the CachedImage. This is acquired from OpenGLContext::getTargetComponet, that acquires context.attachment.
  3. This is, as we see a nullptr. That would trigger the assert inside getAssociatedObject, however this is a release build. So this generates an access violation.

Since the OpenGLContext shares state between threads unprotected, the fix would be to synchronize access before altering the context (in general). The fix for this case would therefore be to have the open gl thread exit before altering any fields of the context (the .attachment, in particular) - so delete the attachment before setting it to nullptr (it is reversed, right now).
Or, at the very least, make it valid for getAssociatedObject() to return nullptr in ambiguous states.


Also it would seem functions like these are not threadsafe:

Component* OpenGLContext::getTargetComponent() const noexcept
{
	return attachment != nullptr ? attachment->getComponent() : nullptr;
}

The main thread is able to alter the attachment variable while the open gl thread is reading them, leading to race conditions. Even if the race condition is fixed (using atomics), there is still the possible sequence of:

  1. OpenGLContext reads attachment and concludes it isn’t null.
  2. Main thread sets attachment to null.
  3. OpenGLContext reads and dereferences attachment as it believes it isn’t null.

In practice, this won’t happen in the current code generation as the compiler (and maybe CPU) will cache the result of the read. However, when the race condition is solved so the code is semantically sequential, this condition is likely, leading to a segmentation fault.

At a glance, there’s some 5-10 get() functions following this pattern, however I haven’t analyzed them. Notice that it isn’t user-code calling these functions, but internal JUCE functions.


In general, it would seem the pattern with setting ScopedPointers = nullptr and concurrent access is unsafe. Not sure if this is applicable to any other places in the code base.

Regards


#2

Not to annoy you guys, but I would consider this a pretty serious bug (all juce programs using OpenGLContexts with renderComponents = false are susceptible to this).

I wonder if you noticed, and scheduled the fix internally or missed this? In either case, it would be nice to get an acknowledging (even if temporary) answer. If you missed it, then I would suggest, as others have, to work on a bug tracking system :slight_smile:


#3

Hello, sorry, was just looking through posts we may have missed and spotted this one… Will take a look asap!