Direct2D Performance Update

Hi everyone-

The recent changes to the JUCE Direct2D renderer are a big improvement (see here); however, there are still some bottlenecks that are slowing performance.

I’m working with the JUCE team to get that sorted out; I’ve posted a branch to my unofficial fork:

We’re also putting together revised documentation explaining how to get the most out of the renderer.

Most of these changes have to do with moving functions into the GPU where possible. This will mainly benefit procedurally generated images; I’ve heard from several developers who want to paint to an image and then immediately paint that image to a window.

Changes:

  • Better performance for component effects (disable software backup for component effect output)

  • Implemented several Image methods to run in the GPU for GPU-only images (convertedToFormat, moveImageSection, clone)

Matt


Say you want to paint to an Image in your paint handler and then paint that Image to your window. You’re painting that Image every time, so there’s no need to preserve the data on the CPU; that Image can live entirely in the GPU.

In that case, try marking your image as “no backup”. “No backup” means the image data is stored only in the GPU and is never automatically backed up to the CPU. To do so, call Image::setBackupEnabled:

    void paint(juce::Graphics& g) override
    {
        // imageType is assumed to be a member pointing at the juce::ImageType for the active renderer
        if (outputImage.getBounds().isEmpty())
        {
            outputImage = juce::Image{ juce::Image::ARGB, getWidth(), getHeight(), true, *imageType };
            outputImage.setBackupEnabled(false); // Make this a GPU-only image
        }

        // Redraw the image contents on every paint call; ig is scoped so it is destroyed before the image is drawn
        {
            juce::Graphics ig{ outputImage };
            ig.setColour(juce::Colours::red);
            ig.fillEllipse(outputImage.getBounds().toFloat());
        }

        g.drawImageAt(outputImage, 0, 0);
    }

Backing up the data can take several milliseconds depending on the size of the image, which can be a significant performance hit.

Matt

Here’s a short video demonstrating the performance improvements with the new “no backup” mode:

Many of the reported performance issues had to do with drop shadows; drop shadows should be much improved with the fork I posted earlier.

Matt


How does creating the image first and afterwards disabling the backup affect performance? Doesn’t creating the image already cause a copy to be made into the regular RAM? Does the setBackupEnabled(false) actually also free the backup-memory that already exists?

Wouldn’t it be better to be able to pass an image-type that is native and no-backup, so that the initial copy back to RAM is never made in the first place?


This is super exciting! Thanks for putting together that video.

Yes! Thank you!

Cool stuff!
How does one know the image has been flushed out of the GPU and needs to be generated again?

Good questions!

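“How does creating the image first and afterwards disabling the backup affect performance?”

Creating the Image allocates a software Image in CPU RAM. The Direct2D bitmap in the GPU is not allocated until the image is painted or modified.

“Doesn’t creating the image already cause a copy to be made into the regular RAM?”

No. The copy from GPU VRAM to CPU RAM happens after you paint or modify the image.

“Does the setBackupEnabled(false) actually also free the backup-memory that already exists?”

No. Calling setBackupEnabled(false) prevents any future automatic GPU → CPU copying.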
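
To make that lifecycle a bit more concrete, here is a toy model of it in plain C++. This is only a sketch of the behaviour described in these answers, with no JUCE dependency: ToyImage and every member in it are made-up names for illustration, not the actual juce::Image implementation.

```cpp
#include <cassert>

// Toy model only: ToyImage is a hypothetical stand-in, not juce::Image.
struct ToyImage
{
    bool cpuCopyAllocated   = true;   // creating the image allocates the software copy in CPU RAM
    bool gpuBitmapAllocated = false;  // the GPU bitmap is allocated lazily
    bool backupEnabled      = true;
    int  gpuToCpuCopies     = 0;      // counts automatic GPU -> CPU backups

    // Only changes what happens on future paints; the existing CPU copy is not freed.
    void setBackupEnabled (bool shouldBackUp) { backupEnabled = shouldBackUp; }

    // Stands in for painting into or otherwise modifying the image.
    void paintInto()
    {
        gpuBitmapAllocated = true;    // the first paint or modify allocates the GPU bitmap

        if (backupEnabled)
            ++gpuToCpuCopies;         // the backup copy happens after painting, not at creation
    }
};
```

In this model, disabling the backup right after construction and before the first paint means no GPU → CPU copy is ever made, which matches the ordering used in the paint() example earlier in the thread.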

“Wouldn’t it be better to be able to pass an image-type that is native and no-backup?”

Quite possibly!

I want to tread a little carefully here; this update is significant because it allows several long-term problems to be resolved. The team put a lot of hard work into it and the update is making a big difference. I can finally release the module I’ve been working on.

However, I’m not entirely convinced about the API; I’m concerned there may still be areas that cause confusion. The approach you’re suggesting may well be better.

Matt


It’s a little awkward; here’s a helper function:

        bool isImagePainted(juce::Image const& image)
        {
            if (auto pixelData = image.getPixelData())
                if (auto backupExtensions = pixelData->getBackupExtensions())
                    return backupExtensions->needsBackup() && ! backupExtensions->canBackup();

            return false;
        }

Matt


That is quite a mouthful :smiley:
Good to know it is possible at least!
I hope this will be integrated in the API as this would expand the usability of this approach for longer-lived images that can still be generated if needed.

Even better would be to be able to specify a lambda that generates the image and gets called transparently when needed.

Or if image->isValid() checked that too.
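
For what it’s worth, the lambda idea could be sketched roughly like this. It is plain C++ with no JUCE dependency, and RegeneratingResource and all of its members are hypothetical names, not an existing or planned JUCE API. In a real app the validity check might wrap something like the isImagePainted helper above, but that wiring is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper, not part of JUCE: owns a resource and rebuilds it on
// demand whenever the caller-supplied validity check fails.
// Resource must be default-constructible and assignable.
template <typename Resource>
class RegeneratingResource
{
public:
    RegeneratingResource (std::function<Resource()> generatorToUse,
                          std::function<bool (const Resource&)> validityCheckToUse)
        : generator (std::move (generatorToUse)),
          isStillValid (std::move (validityCheckToUse))
    {
    }

    // Returns the cached resource, regenerating it first if it has never been
    // generated or if the validity check reports that its contents were lost.
    Resource& get()
    {
        if (! hasBeenGenerated || ! isStillValid (cached))
        {
            cached = generator();
            hasBeenGenerated = true;
            ++regenerationCount;
        }

        return cached;
    }

    int getRegenerationCount() const { return regenerationCount; }

private:
    std::function<Resource()> generator;
    std::function<bool (const Resource&)> isStillValid;
    Resource cached {};
    bool hasBeenGenerated = false;
    int regenerationCount = 0;
};
```

The caller just asks for get() in its paint routine; the generator only runs on first use or after the validity check fails, so longer-lived GPU-only images would be rebuilt transparently.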

A few more updates:

Automatic ImageType selection

Added various checks to make sure the ImageType in use matches the active renderer. It turns out it’s easy to end up in a situation (especially with the software renderer) where the same image data bounces back and forth between the GPU and CPU. Specifically:

  • effectImage used to paint Components with an effect (setComponentEffect(…))
  • DropShadow::drawForPath
  • Cached component image
  • LookAndFeel progress bar
  • DragAndDropContainer thumbnail
  • Component::createComponentSnapshot
  • ListBox::createSnapshotOfRows

This change may resolve this issue: createComponentSnapshot looks different than the component painting itself - #5 by RolandMR

Remove swap chain thread

The renderer uses a swap chain to sequentially stage each frame and paint the window. JUCE uses a waitable swap chain, meaning that DXGI signals an event when the swap chain is ready for the next frame. Previously, the renderer would create a thread for each window to service that event and schedule the next frame.

In retrospect, that thread may not be necessary, since there is already a thread per monitor using DXGI to wait for vblank on each monitor. The per-window thread therefore seems to be redundant and may actually be making matters worse by clogging up the message thread with excessive notifications.

So this is somewhat experimental, but removing the swap chain thread entirely seems to work just fine with multiple plugin instances running in the same process.
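
As a rough illustration of the idea (a toy model in plain C++, not the renderer’s actual code): each monitor’s vblank callback services every window on that monitor, and a window simply skips the frame if its swap chain isn’t ready yet. ToyWindow, ToyMonitor and the swapChainReady flag are made-up names; the flag stands in for the waitable swap chain’s event.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one window's swap chain state; not a JUCE or DXGI type.
struct ToyWindow
{
    bool swapChainReady = true;  // stands in for the waitable swap chain's event being signalled
    int framesPainted = 0;

    void paintFrameIfReady()
    {
        if (! swapChainReady)
            return;              // skip this vblank rather than queueing another notification

        ++framesPainted;
        swapChainReady = false;  // stays false until the swap chain signals readiness again
    }
};

// One of these per monitor: a single vblank callback services every window on it,
// so no per-window thread is needed in this model.
struct ToyMonitor
{
    std::vector<ToyWindow*> windows;

    void onVBlank()
    {
        for (auto* window : windows)
            window->paintFrameIfReady();
    }
};
```

The point of the model is that the number of notifications scales with the number of monitors rather than the number of windows.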

Please let me know if you have any other issues or performance problems.

Matt


hi all

i’ve patched in matt’s proposed changes and they appear to have 100% resolved the host freeze issues i was seeing when multiple instances of plugin editors were onscreen at once

here’s an unlisted video showing the dramatic improvement. the improvement is also present in my more complex “real” plugins.

thanks, matt!


Thank you @remaincalm for working through all this!


:+1: 1 thread per monitor is absolutely the way to go! I hope this finds its way into official JUCE very soon, thanks @matt.

Update: Thinking about this more, the issue probably still happens if there are multiple JUCE8-based plugins drawing that cannot share the per-monitor thread. The entire vblank → repaint() → paint mechanism does flood the message queue per loaded juce binary using vblank. IMHO even with the latest changes, there are too many layers of triggerAsyncUpdate() involved that all have no time guarantees and can overwhelm the message queue.

hmm. yeah, so i set up a horrible test setup.
with 16 x 1 d2d plugin at 60fps, ran fine.
with another 16 of a different d2d plugin, also ran fine.
with a bunch of a third d2d plugin, started running into host issues again, although it took a bit to trigger.

the experiment is pretty edge-case (there’s a lot on screen at once) and i think the patches are still very worthwhile but yeah there might need to be a rethink of something.

OK, can you shed any light about what might be different about the third plugin?

Matt

will ping you some ideas

i still think your patches i am running are dramatically better than where we were though

Hi everyone-

Just pushed an update that should help with multiple simultaneous plugin windows running in a single host.

Matt


I have uploaded my own test project here:
ZioGuido/JuceAnimationTest: Simple project to test Juce8 D2D capabilities

There are two precompiled binaries: one built with the latest develop branch of JUCE 7.0.12, the other with today’s version of Matt’s fork of JUCE 8.0.6.

From what I can see, the JUCE 7 software renderer still performs way better than the JUCE 8 D2D renderer.