OpenGL deadlock on Windows

Halp!

I planned to release my plugin tomorrow, but now a wild beta tester appears and is experiencing a reproducible deadlock (which I cannot reproduce myself)!

The deadlock happens after opening two separate instances and then selecting another track, i.e. closing the editors.

These are the relevant stack frames he sent me:



I’ve noticed that the following two functions share the same mutex, so the deadlock is probably there:

OpenGlContext::CachedImage::ContextsWaitingForFlush::flush();
OpenGlContext::CachedImage::ContextsWaitingForFlush::handleAsyncUpdate();

I fail to see how I could have introduced the deadlock: all texture-related operations are performed strictly within the OpenGL callbacks provided by the OpenGLRenderer class.

Anyone with any insights on what could be wrong here would sweeten my evening so much! :grimacing:

Ableton 11.0.12
JUCE development tip
Windows 10
VST3

One thing I have in mind:
Do I need to call OpenGLTexture::release() every time before I load a new texture with OpenGLTexture::loadImage() or are they released automatically?

This sounds like it could be related to some OpenGL threading changes I made recently, to fix another deadlock on macOS. I’ll try to take a look tomorrow.

Hi @reuk and thanks for answering!

I’ve only switched to the latest development tip in the last few days. This particular issue was also reported while I was still on 7.0.1, which I think was before your changes!

Have a nice evening!

That’s an interesting data point! The stack traces you show go through the new code that I added, which is why I assumed that the issue was caused by my recent changes.

Do you happen to know what graphics hardware your tester is running?

Nope sadly not…

I have since discovered that it is in fact not a deadlock. Rather, the UI gets extremely sluggish as soon as components are being drawn repeatedly on a timer call. Drawing is fine, but mouse interactions are sluggish.

The draw task can be as simple as rendering a static image. The more components get animated, the more sluggish it gets. The animations all run flawlessly anyway, but the mouse interaction becomes oh so slow.

Once a second instance is opened, everything comes to a grinding halt with reaction times of around 10 seconds.

As the above stack traces were collected during such a halt, I suspect it’s still about this.

I’m still not 100% certain why it happens. Unfortunately it’s a highly asynchronous process and far from trivial to grasp with conventional debugging. GL commands do not block until the work is done! Internally, on the driver side, they can (and ideally should) be executed much later.
Now, even if the plugin instances use separate OpenGLContexts, the underlying GPU/driver must somehow queue the command buffers in order. Since the plugin instances run in parallel, the GL driver regularly switches the “current” context (due to wglMakeCurrent). A context switch implicitly flushes the command pipeline, triggering command execution all over the place. Unfortunately some flushes can trigger intensive rendering, which stalls the CPU until it’s synced with the GPU. This results in other plugin instances becoming unresponsive.

There are multiple things that cause flushing of the pipeline. Reducing them can mitigate the issue and even solve it completely. I discovered multiple causes, explained in detail in this thread:

Although I have to say, I haven’t tested these fixes with the newest changes. But this issue definitely existed before them!

Anyway, the non-obvious thing here is that a simple glGetError query in one plugin instance can trigger stalling in another instance: an implicit CPU<->GPU sync. So reducing the amount of syncing in one plugin will improve the overall performance of all instances.
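
To illustrate the point about error queries, here is a minimal sketch of one way to reduce how often a potentially syncing query runs. It assumes the query is only needed for diagnostics. `ThrottledErrorCheck` and its members are hypothetical names, not part of JUCE or OpenGL, and the callable stands in for a real call such as glGetError() so the sketch compiles without GL headers.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper (not a JUCE or OpenGL API): runs an error query only
// once every N frames instead of on every frame, so that a query which may
// force a CPU<->GPU sync happens less often.
class ThrottledErrorCheck
{
public:
    // errorQuery stands in for a call such as glGetError().
    // checkEveryNFrames <= 0 disables the query entirely.
    ThrottledErrorCheck (std::function<unsigned int()> errorQuery, int checkEveryNFrames)
        : query (std::move (errorQuery)), interval (checkEveryNFrames)
    {
        assert (query != nullptr);
    }

    // Call once at the end of each rendered frame.
    // Returns the queried error code on frames where the query ran,
    // and 0 (the value of GL_NO_ERROR) on all other frames.
    unsigned int endOfFrame()
    {
        if (interval <= 0)
            return 0;

        if (++frameCounter < interval)
            return 0;

        frameCounter = 0;
        return query();
    }

private:
    std::function<unsigned int()> query;
    int interval = 0;
    int frameCounter = 0;
};
```

With an interval of 60 and a 60 Hz render loop, the query would run roughly once per second. Whether this helps in a given plugin is an assumption that would need to be measured.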

Ok so I’m not alone in this! Good to hear. I’m triggering the repaints myself, because I’m using some components whose paint(g) function is called repeatedly on a timer.

I want to thank you for the suggestion, but I cannot hotfix such a massive change into code I don’t really understand so close to release. Maybe I can try it in the coming days. If these changes do what they’re supposed to, is there any chance of them making their way into the official JUCE repo?

I’ve mitigated the problem for now by reducing the repaint frequency of said components from 60 Hz to 15 Hz for Ableton specifically. This is of course rather undesirable, but better than a “freezing” UI for sure!
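
For anyone wanting to try the same workaround, a minimal sketch of the rate selection might look like this. The helper names and constants are hypothetical. In a JUCE plugin the flag could come from juce::PluginHostType().isAbletonLive(), and the resulting rate could be passed to juce::Timer::startTimerHz(). The sketch itself deliberately avoids JUCE so it stands alone.

```cpp
#include <cassert>

// Hypothetical constants mirroring the rates mentioned above.
constexpr int defaultRepaintHz   = 60;
constexpr int throttledRepaintHz = 15;

// Hypothetical helper: picks the repaint timer rate for animated components.
// hostIsAbletonLive is assumed to be determined elsewhere, e.g. via
// juce::PluginHostType().isAbletonLive() in a JUCE plugin.
inline int chooseRepaintRateHz (bool hostIsAbletonLive)
{
    return hostIsAbletonLive ? throttledRepaintHz : defaultRepaintHz;
}

// Converts a rate in Hz to a whole-millisecond timer interval,
// using integer division (so 60 Hz becomes 16 ms, 15 Hz becomes 66 ms).
inline int timerIntervalMs (int rateHz)
{
    assert (rateHz > 0);
    return 1000 / rateHz;
}
```

Keeping the decision in one small function makes it easy to remove the special case again once the underlying stall is fixed.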

I believe that the deadlock is now fixed on develop. I’ve also attempted to improve the rendering performance a bit, especially on Windows:
