[FIXED] VBlankAttachment problem on Windows if callback takes too long

There is a small issue with VBlankAttachment on Windows when you do heavy rendering inside the callback (which is a pretty typical thing to use a VBlankAttachment for). If the onVBlank() callback takes too much time, the next vblank will arrive while rendering is still in progress. This happens because the internal AsyncUpdater allows a new update to be triggered as soon as the previous callback begins executing, rather than after it finishes.

This is what happens then:

  • The first vblank arrives and triggers a repaint that takes, say, 30 ms to complete.
  • While the repaint is happening, the VSyncThread enqueues the next vblank.
  • When the first repaint is done, the message thread immediately starts processing the vblank that was missed.
  • By the time that callback returns, another vblank has been enqueued, which is then processed in turn, and this can go on forever.
  • Control is never returned to the MessageManager, since it is constantly dequeueing new vblanks, and the app becomes unresponsive.
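
For illustration, here’s a minimal sketch of the runaway loop (my own code, not JUCE’s internals; it only relies on the documented AsyncUpdater behaviour that a trigger arriving while handleAsyncUpdate() is running schedules a fresh callback rather than being absorbed by the current one):

    struct Spinner : juce::AsyncUpdater,
                     juce::Thread
    {
        Spinner() : juce::Thread ("fake vsync") { startThread(); }
        ~Spinner() override                     { stopThread (1000); }

        void run() override
        {
            while (! threadShouldExit())
            {
                juce::Thread::sleep (16);   // stand-in for WaitForVBlank()
                triggerAsyncUpdate();
            }
        }

        void handleAsyncUpdate() override
        {
            // While we "render" here, the thread above triggers again, so a
            // new update message is already queued by the time we return,
            // and the message thread never goes idle.
            juce::Thread::sleep (30);       // stand-in for a 30 ms repaint
        }
    };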

The good news is that this issue is simple to fix. Inside the VSyncThread class, instead of this:

    void run() override
    {
        while (! threadShouldExit())
        {
            // Block until the next vertical blank, then ask the message
            // thread to run the listener callbacks.
            if (output->WaitForVBlank() == S_OK)
                triggerAsyncUpdate();
            else
                Thread::sleep (1);
        }
    }

    void handleAsyncUpdate() override
    {
        // Runs on the message thread whenever an update has been triggered.
        for (auto& listener : listeners)
            listener.get().onVBlank();
    }

We need to do this:

    std::atomic<bool> isInsideCallback { false };

    ...

    void run() override
    {
        while (! threadShouldExit())
        {
            // Don't wait for (or enqueue) vblanks while the message thread
            // is still busy inside the previous callback.
            if (! isInsideCallback && output->WaitForVBlank() == S_OK)
                triggerAsyncUpdate();
            else
                Thread::sleep (1);
        }
    }

    void handleAsyncUpdate() override
    {
        isInsideCallback = true;

        for (auto& listener : listeners)
            listener.get().onVBlank();

        isInsideCallback = false;
    }

This makes sure that missed vblanks are not enqueued, and it resolves the freezing issue. The issue does not seem to occur on macOS or Linux, since they don’t use a separate thread for vblanks.

I’ve tested this on Windows 11 with JUCE 7.0.11, on multiple systems.


Might I suggest a little ScopedValueSetter on that boolean? I believe that would be somewhat Jucey.
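
For reference, a sketch of that idea. One wrinkle: juce::ScopedValueSetter stores a copy of the original value, and std::atomic isn’t copyable, so as far as I can tell it won’t compile against a std::atomic<bool> directly. A tiny hand-rolled guard (the ScopedAtomicFlag name is mine) gets the same exception-safe behaviour:

    struct ScopedAtomicFlag
    {
        explicit ScopedAtomicFlag (std::atomic<bool>& f) : flag (f) { flag = true; }
        ~ScopedAtomicFlag()                                         { flag = false; }

        std::atomic<bool>& flag;
    };

    void handleAsyncUpdate() override
    {
        // The flag is cleared on every exit path, even if a callback throws.
        const ScopedAtomicFlag scope (isInsideCallback);

        for (auto& listener : listeners)
            listener.get().onVBlank();
    }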


I’m not sure I agree with this, if I understand correctly. This would bail out the callback entirely because one of the nested functions is taking too long? What if we are counting the blanks for some logic that estimates monitor refresh rate, and there happens to be a poorly performing paint routine? Would this not drag the estimated frame rate down, and potentially mess with some other things in play?

I think this kind of check should be done downstream. Globally lowering the callback rate to the lowest performer seems wrong.

The first problem is that this doesn’t happen on macOS and Linux, so there’s already a difference in behaviour across platforms. If we do want VBlankAttachment to keep firing even while rendering is in progress, it should at least behave the same on every OS.

Since it uses an AsyncUpdater internally, you already can’t rely on VBlankAttachment to report every single vblank that happens. If the message thread is busy, the VSyncThread will trigger multiple async updates, but only one callback will actually run. The only difference is that we’re extending this logic to the work that is done in the callback itself.
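
To illustrate the coalescing, here’s a hedged sketch of the documented AsyncUpdater behaviour (not code from VBlankAttachment):

    struct Counter : juce::AsyncUpdater
    {
        int callbacks = 0;

        void handleAsyncUpdate() override { ++callbacks; }
    };

    Counter c;
    c.triggerAsyncUpdate();   // schedules one callback
    c.triggerAsyncUpdate();   // already pending: does nothing
    c.triggerAsyncUpdate();   // already pending: does nothing

    // Once the message loop runs, handleAsyncUpdate() fires exactly once,
    // so c.callbacks == 1 even though three updates were triggered.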

With the current implementation on Windows, the framerate can’t drop far at all without making the entire app unresponsive, because it will never stop repainting if our framerate is constantly lower than the vblank rate. Also, what we’re doing here is preventing a vblank callback from firing when we’ve already missed the deadline for it. I believe it’s normal in this case to wait for the next vblank, instead of trying to make up for the lost frame.

It’s very easy to accidentally create an infinite rendering loop like this. What’s worse, whether that happens depends on how fast the CPU is: if a user has a slower PC, instead of the framerate simply dropping, the app freezes.

It is better to allow the renderer some wiggle room to “catch up” on the frame count. At the very least this would be a significant change in behavior which risks breaking lots of existing plugins.

That problem could be mitigated using an atomic int instead of a bool. If the goal is to prevent a complete lock-up, you should be able to achieve that without fundamentally changing the behavior of all plugins.
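
A sketch of what that might look like (the maxPendingVBlanks name and the cap of 2 are my assumptions, not something proposed above): bound the backlog instead of blocking it entirely, so a slow frame can catch up without stale vblanks piling up in the message queue:

    std::atomic<int> pendingVBlanks { 0 };
    static constexpr int maxPendingVBlanks = 2;   // illustrative cap

    void run() override
    {
        while (! threadShouldExit())
        {
            // Keep enqueueing while the backlog is small, so the renderer
            // can make up a missed frame, but never let stale vblanks
            // accumulate without bound.
            if (pendingVBlanks.load() < maxPendingVBlanks && output->WaitForVBlank() == S_OK)
            {
                ++pendingVBlanks;
                triggerAsyncUpdate();
            }
            else
            {
                Thread::sleep (1);
            }
        }
    }

    void handleAsyncUpdate() override
    {
        for (auto& listener : listeners)
            listener.get().onVBlank();

        // Reset rather than decrement: triggerAsyncUpdate() coalesces, so
        // several increments above can correspond to a single callback here.
        pendingVBlanks.store (0);
    }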

I guess, but only if the plugin is already working around this issue by implementing custom behaviour for Windows, or if it’s a Windows-only product. If the plugin currently works on macOS or Linux, this would make the behaviour similar to that, so it should continue to work. But we definitely need to verify that more thoroughly than I have done so far.

I’m not sure how that would work, but if it’s a better way to fix the freezing problem, I’m all ears 🙂

A simple way to reproduce this is to take the AnimationAppDemo example and add a 20 ms sleep inside the update() function to simulate a heavy processing load. The update() function is attached to a VBlankAttachment when setSynchroniseToVBlank is set to true, which the example already does.
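
A sketch of that change, assuming the demo’s AnimatedAppComponent setup (the SlowDemo name is mine; only the relevant parts are shown):

    struct SlowDemo : juce::AnimatedAppComponent
    {
        SlowDemo()
        {
            setSynchroniseToVBlank (true);   // the demo already enables this
        }

        void update() override
        {
            juce::Thread::sleep (20);        // simulate ~20 ms of heavy work per frame
            // ... the demo's existing animation logic ...
        }
    };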

The result: on macOS the framerate drops to 30 fps, presumably because each ~20 ms update spans more than one vblank interval on a 60 Hz display, so the callback runs on every other vblank. This is the expected behaviour. On Windows, it freezes completely: https://youtu.be/5Tt13WtZgYw

Another thing to keep in mind is that people have different refresh rates. I could test my app against a 60 Hz monitor, but if the user has a 120 Hz or 144 Hz monitor, we suddenly have a lot less time to render if we want to prevent the app from locking up: at 60 Hz a vblank interval is ~16.7 ms, at 120 Hz it’s ~8.3 ms, and at 144 Hz only ~6.9 ms. So if my render loop takes ~12 ms on my setup, you could think it works fine, but if a user’s setup is 30% slower or has a higher refresh rate, the app will be completely unusable. It’s a very easy way to shoot yourself in the foot.

Looks like this is fixed in JUCE 8, judging by this commit. Thanks @reuk!!
