Performance issue with OpenGL in multiple windows


#1

This is currently on Mac, in a standalone application, but I think the problem probably applies to all platforms. (It is also on JUCE 4.3 – sorry; if this has been resolved in more recent versions, that would be a solution as well).

I have found that using an OpenGLContext attached to the top level component of the window improves CPU usage substantially in my app. I am not using any OpenGL calls directly; I am just using it to accelerate JUCE’s 2D drawing.

If I create multiple windows, and each has an OpenGLContext attached to its top level component, I find that the CPU load goes way up (e.g. from 7-9% for one window open, all the way up to 60-70% with two windows open).

I profiled the app and found that a huge amount of time was being spent in MessageManagerLock::attemptLock, specifically in the Thread::yield() call.

I suspect that this is because each OpenGLContext creates its own ThreadPool and they are essentially spin-waiting on each other to lock the MessageManager.

If I change the OpenGLContext::renderFrame() to unconditionally lock the MessageManager, then the CPU load drops to something reasonable, but then (not unexpectedly) I get a deadlock when the system tries to tear down the ThreadPool (e.g. when closing a window).

Is there some sort of method to coordinate the rendering of the OpenGLContexts to avoid this lock contention and spin-waiting?

If not, are there any suggestions for how to essentially single-thread the renderFrame() calls for multiple OpenGLContexts?

Effectively, without some sort of support for this, multiple OpenGLContexts are not performant.

TIA!


#2

So, the CPU load is due to the busy-waiting while trying to get the MM lock.

Adding a 1ms sleep before each yield in the MessageManagerLock::attemptLock implementation pretty much gets rid of the busy wait. For example, if one thread is drawing for 3ms (10% of a frame at 30 FPS), then the lock and yield is tried 3-4 times rather than hundreds or thousands of times, and the CPU load goes back to the expected level.

This is sort of hacky, but I can’t discern any performance problems and it reduces the overall CPU load by a factor of 5 or 6.

Does this seem like a useful addition to the code, or is there a better solution?
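
To make the workaround concrete, here is a minimal sketch in standard C++. It is not the JUCE implementation: `tryLockWithSleep`, its parameters and the `attempts` counter are hypothetical names used only for illustration, and a `std::mutex` stands in for the message manager's lock. The point is that a short sleep between failed attempts turns thousands of spins into a handful of retries.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a try-lock loop with a bail-out flag.
// Returns true with the mutex locked (the caller must unlock it),
// or false if shouldAbort was set before the lock could be taken.
bool tryLockWithSleep (std::mutex& lock,
                       const std::atomic<bool>& shouldAbort,
                       std::chrono::milliseconds pause,
                       int& attempts)
{
    assert (pause.count() >= 0);
    attempts = 0;

    while (! shouldAbort.load())
    {
        ++attempts;

        if (lock.try_lock())
            return true;

        // Sleep instead of yielding, so a contended lock costs a few
        // retries per frame rather than a continuous spin.
        std::this_thread::sleep_for (pause);
    }

    return false;
}
```

The cost of this design is a fixed worst-case latency of one `pause` after the lock becomes free, which is why a lock with a real timeout would be preferable.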


#3

Hi @mhbj,

Thanks for bringing this up. Unfortunately, it’s actually quite difficult to solve for all use cases and I’ll need to think about this one a bit longer.

However, if you are not using the OpenGLContext::executeOnGLThread method, then you can change line juce_OpenGLContext.cpp:229 to the following:

mmLock = new MessageManagerLock /*(worker)*/;  // need to acquire this before locking the context.

Let me know if this solves your issue for now.


#4

Hi Fabian,

Thanks for the reply.

So, I am not using that API (in particular, it doesn’t exist in JUCE 4.3). But I don’t think your proposed work-around helps me.

If you look at lines 440:457 in juce_OpenGLContext.cpp, you’ll see that the code enters a while loop to do the rendering; the code on line 432 has already executed and the MessageManagerLock has already been destroyed by the time we hit the performance issue I am experiencing.

The issue occurs on line 229. This is where the MMLock is acquired in each frame render. This acquisition of the lock is the one that busy-waits when there is more than one OpenGLContext active and all are rendering on the same cadence.

If I make the change you suggested for line 432 on line 229, the performance problem is indeed solved, but then you get a deadlock when the OpenGLContext is torn down.

What I have found works reasonably well is to add a 1ms delay on line 297 of juce_MessageManager.cpp. It is probably fine to remove the Thread::yield() call as well.

Adding the delay causes the thread to sleep rather than just spin waiting; I think that is what the Thread::yield() call is trying to do, but it is not effective.

A better solution would be for you guys to implement a version of the enter() call on the critical section that supports a timeout. Then, instead of doing the mm->lockingLock.tryEnter() on line 293, you could do something like mm->lockingLock.enter(1) /*acquire the critical section and wait up to 1 ms to get the lock */; This would avoid the spin wait and still allow for the bail-out check to occur, without the fixed delay of my work-around (since the critical section being acquired would wake the thread). In this case, you could actually increase the timeout to something like 30ms, as you wouldn’t have to wait for the full time if the critical section becomes available, and that would use the least amount of CPU time.

As far as I can tell, the current implementation of the critical sections in JUCE doesn’t support a lock with a timeout, so that would have to be added.
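
As an illustration of the timed-lock idea, here is a minimal sketch using `std::timed_mutex` from the C++ standard library. This is an assumption-laden stand-in, not JUCE code: `lockWithTimeout` is a hypothetical name, and `std::timed_mutex::try_lock_for` plays the role of the proposed `enter (timeout)` call on the critical section.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical helper: blocks for up to `timeout` per attempt, then
// re-checks the bail-out flag. Returns true with the mutex locked
// (the caller must unlock it), or false if shouldAbort was set.
bool lockWithTimeout (std::timed_mutex& lock,
                      const std::atomic<bool>& shouldAbort,
                      std::chrono::milliseconds timeout)
{
    assert (timeout.count() > 0);

    while (! shouldAbort.load())
    {
        // The thread sleeps inside try_lock_for and is woken as soon as
        // the mutex becomes available, so there is no fixed delay.
        if (lock.try_lock_for (timeout))
            return true;
    }

    return false;
}
```

Because the waiting thread wakes when the mutex is released, the timeout only bounds how often the bail-out flag is checked, so a larger value such as 30ms costs nothing in latency.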

Best regards,

B.J.


#5

Hi @mhbj

Sorry, I meant to write line 229 (I’ve edited my reply above). As you mentioned, changing that line seems to solve the performance problem - at least for me for this test app that I wrote.

Unfortunately, JUCE 5.1 is a bit different with regards to that worker: users can add work to the gl thread while locking the message thread and blocking it - waiting for the job to finish.

My current idea is to remove any type of spinning altogether. I’ve implemented this on my private github account here. It works for my test app: performance is good, the locking code is no longer a bottleneck and there are no lock-ups on shutdown (or when processing jobs via OpenGLContext::executeOnGLThread), but it obviously still needs a lot of clean-up and a thorough code-review/testing as this is a really sensitive part of the JUCE codebase.

Let me know if something like this would work for you (it’s probably best for you to do a diff with the develop branch to better see the changes).


#6

Awesome! It’s nice to hear the multiple context opengl problem is being worked on. This is what prevents me from using OpenGL in plugins where multiple contexts have to be expected because multiple instances might be open at the same time.


#7

Fabian,

Is the message manager lock required to protect the MM Thread from the OpenGL rendering thread, or is it also required to protect each OpenGL thread from each other?

Right now (either with the workaround that I am using, or with your rework of the locking), the locking scheme has the effect of single-threading all the rendering threads (and blocking the MM thread while the rendering threads are actually rendering).

Ideally, the rendering would be able to do its thing while the MM thread continues to do its thing, but I understand why these threads need to be protected from each other.

I am less clear on why the individual OpenGL context rendering threads need to be protected from each other. Since they each have their own context, shouldn’t it be possible for the rendering threads to all be running in parallel while some coordinator holds the MMLock? If you have multiple OGL contexts in your app/plugin, right now they all contend for the MMLock and you wind up running the rendering single threaded, with the overhead of acquiring the lock. So if you are trying to get 30FPS, that 33ms needs to be shared by all the contexts.

If the rendering threads could run in parallel, then the lock would only be held for the max rendering time rather than the sum of the rendering times (or some division of the sum if there were more contexts than cores), which would keep the app responsive and the rendering smooth.
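
The difference between the two cases is simple arithmetic, sketched here in C++. The function names are hypothetical, and the parallel case assumes at least as many cores as contexts.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Lock-held time when the contexts render one after another: the sum.
double serialLockTimeMs (const std::vector<double>& renderTimesMs)
{
    return std::accumulate (renderTimesMs.begin(), renderTimesMs.end(), 0.0);
}

// Lock-held time when every context renders in parallel: the maximum.
// Assumes one free core per context.
double parallelLockTimeMs (const std::vector<double>& renderTimesMs)
{
    assert (! renderTimesMs.empty());
    return *std::max_element (renderTimesMs.begin(), renderTimesMs.end());
}
```

For three contexts that take 3ms, 5ms and 4ms to render, the lock would be held for 12ms per frame in the serial case and 5ms in the parallel case.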

If it isn’t possible to have the rendering threads running in parallel, then it seems like there should be some OGL coordinator that explicitly serializes the rendering for all the contexts that require the MMLock rather than contending on the lock.
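
A coordinator of that kind could look roughly like the following sketch in standard C++. Nothing here exists in JUCE: `RenderCoordinator`, `addContext` and `renderAll` are hypothetical names, a `std::mutex` stands in for the MMLock, and each context is reduced to a render callback.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical coordinator that takes the shared lock once per frame
// and renders every registered context in turn, so the contexts never
// contend with each other for the lock.
class RenderCoordinator
{
public:
    explicit RenderCoordinator (std::mutex& sharedLock) : lock (sharedLock) {}

    // Registers one context's render callback.
    // Not thread-safe: call before rendering starts.
    void addContext (std::function<void()> renderCallback)
    {
        assert (renderCallback != nullptr);
        contexts.push_back (std::move (renderCallback));
    }

    // Acquires the shared lock once, renders all contexts in
    // registration order, and returns how many were rendered.
    int renderAll()
    {
        const std::lock_guard<std::mutex> guard (lock);
        int rendered = 0;

        for (auto& render : contexts)
        {
            render();
            ++rendered;
        }

        return rendered;
    }

private:
    std::mutex& lock;
    std::vector<std::function<void()>> contexts;
};
```

The rendering is still serialized, so the lock is held for the sum of the render times, but it is acquired once per frame instead of once per context.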


#8

This is only if you attach the OpenGLContext to a Component. The OpenGLContext will not try to get a message manager lock at all if you are simply interested in the OpenGLRenderer callbacks. Obviously, you need the message manager lock if the OpenGLContext should also render JUCE components (i.e. if the OpenGLContext is attached to a component). You can’t have JUCE components resizing, deleting themselves etc. while they are being rendered.

Also, sorry for not having a fix for the performance problems you are experiencing. I’m still working on it and it’s nearly ready. I just want Jules to have another look once he is back from his holidays.


#9

Any update on this?

We’re getting reports from some Windows users that the frame rate drops massively when two plugin windows are open on their specific hardware. We have not been able to replicate this on any of our dev machines though: they all perform fine with 10+ instances at 60 Hz OpenGL rendering.


#10

We’re seeing many bizarre graphics issues here, and sadly we have yet to find a “one size fits all” solution.

Some machines perform better with OpenGL; others feel much more fluid without it.
Currently we have settings or flags for turning it on or off.
Windows is safest with it off, for broader compatibility.

I have yet to nail it down, but I think many of those behaviors are related to the JUCE message thread choking.
My next plan is to try and move some graphics code to a background thread, as suggested in the forum.
On my current project, without OpenGL even a single meter makes the UI sluggish (and I don’t do any crazy redraws, only AudioProcessorValueTree attachments and sliders…).


#11

Yes, we’ve been working on fixing this. However, it needed some deep changes in the MessageManager locking implementation so we are still testing this.

Can you try the following branch and let me know if this improves performance for you? This would be super valuable information for us.


#12

A fix for this is now on develop with commit b9b3439. Let me know how this fix works for you.

@IvanC


#13

Hey Fabian, I am using the latest official download and still have this issue with a really lightweight plugin. I have 60fps, then when I drop another instance into the DAW, 30 fps, another instance 15fps… then crash :smiley: No problem if I keep only one GUI of these plugin instances open… then it’s simply 60fps for that instance.

Should your fix be included in the latest release?

thanks, Kevin


#14

Is that on macOS? Apple recommends using CVDisplayLink, which JUCE doesn’t, but we’ve found that it helps immensely with performance of multiple windows with OpenGL on macOS, and it’s implemented in the SR branch.


#15

It’s Win10 with an NVidia GTX card. But it’s definitely not an OpenGL performance issue, as it happens even if I draw nothing more than a few lines… I guess it’s something to do with blocking threads.


#16

We’ve seen similar issues as well. Do you have continuous repainting turned on?


#17

We have something similar happening on Linux: it deadlocks as soon as you open a second window of the plugin.


#18

Yes, it’s on, and V-Sync is also on (setSwapInterval (1)). But even if V-Sync is off (0), the 400-500fps I get drops to 200fps with 2 instances, then to something below 50fps with 3 instances, etc…


#19

I think it would be helpful for the JUCE team to get some example source-code that can reproduce this.
So far there’s a lot of important feedback, but no reproducible case that each of the participants in this thread could confirm.

I’d be more than happy to try and reproduce it here on Mac/Win and a Linux VM.


#20

@fabian : I just made a minimal plugin to demonstrate that every new plugin instance decreases the FPS, and after a while it leads to a crash. With one instance it works perfectly tho. Here is a video demonstrating the issue: VIDEO

In the video you can see that the FPS I calculate in the RenderOpenGl() function stays fixed at about 60FPS, so it is called nicely, but the real render FPS measured by the third party app FRAPS drops with every new plugin instance. Another interesting thing is that the more instances I drop into the DAW, the more the FPS I calculate in RenderOpenGl() increases, by about 1 FPS per instance… so after a few plugins the original 60FPS has increased to 66-67FPS !!! It seems the function is called more frequently than the otherwise VSYNC-ed 60FPS.
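
For reference, a callback-rate counter of the kind described could be sketched as below. This is not the code from the test plugin: `FrameRateCounter` is a hypothetical class, and the caller is assumed to pass in a monotonic timestamp in seconds. A counter like this measures how often the render callback is invoked, which is not necessarily the same as the number of frames actually presented on screen.

```cpp
#include <cassert>

// Hypothetical counter: call tick() once per render callback.
// fps() reports the callback rate over the last completed window
// of at least one second, or 0 before the first window completes.
class FrameRateCounter
{
public:
    void tick (double nowSeconds)
    {
        if (! started)
        {
            started = true;
            windowStart = nowSeconds;
            return;
        }

        assert (nowSeconds >= windowStart);
        ++frames;
        const double elapsed = nowSeconds - windowStart;

        if (elapsed >= 1.0)
        {
            lastFps = frames / elapsed;
            windowStart = nowSeconds;
            frames = 0;
        }
    }

    double fps() const { return lastFps; }

private:
    bool started = false;
    double windowStart = 0.0;
    int frames = 0;
    double lastFps = 0.0;
};
```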

In the video it didn’t crash, but if I dropped more plugins into the project it would crash for sure. Here is a crash debug picture if it makes any sense:

Please PM me, if you would like to see the source code too, but it’s pretty basic, I just made it from scratch in 20 mins.

Thanks in advance if you could look into the issue!
Attila