Direct2D Causes Considerable Performance Drop

Over the past week, I’ve been dealing with the fallout of bumping our JUCE version from 7.0.5 to 8.0.3. I’ve since had two forum posts (here and here) that I started while I was attempting to track down the source of the issues my team was having, but treating problems addressed in those threads has not wholly resolved our issue of multiple plugin instances freezing the DAW.

For background, here are the biggest cases I’ve found where calling into Direct2D is prohibitively slow:

  • As mentioned in my previous forum post, ColourGradient instantiation is incredibly expensive and there is no user-side way to optimize it aside from just using fewer gradients (a sketch of that workaround follows this list)
  • Contexts created using Graphics(const Image&) always perform a GPU-to-CPU image read which is prohibitively expensive
    • I’m aware this is done to keep the image intact in the event of GPU disconnects, but it should be configurable as the overhead is not worth the caching in our case
  • applyPendingClipList calls after path fill/stroke methods and Component::paintInParentContext
  • Context flush on certain path operations (see screenshot): This may not be a context flush, but instead just a method that’s named FlushInternal
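
To show what that workaround looks like in practice, here’s a minimal sketch (the component and colours are made up for illustration, not our actual code): build the gradient once per resize and reuse it in paint, rather than constructing a fresh ColourGradient on every paint call.

#include <juce_gui_basics/juce_gui_basics.h>

// Hypothetical component, shown only to illustrate caching a ColourGradient
class MeterComponent : public juce::Component
{
public:
    void resized() override
    {
        auto b = getLocalBounds().toFloat();

        if (b.isEmpty())
            return;

        // Construct the gradient once per resize; paint() just reuses it
        cachedGradient = juce::ColourGradient (juce::Colours::green, b.getBottomLeft(),
                                               juce::Colours::red,   b.getTopLeft(),
                                               false);
    }

    void paint (juce::Graphics& g) override
    {
        if (cachedGradient.getNumColours() == 0)
            return;

        g.setGradientFill (cachedGradient);
        g.fillRect (getLocalBounds());
    }

private:
    juce::ColourGradient cachedGradient;
};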

For the reasons listed above, I’d like to request a flag to disable Direct2D entirely at compile time. I understand that most of these can be addressed by switching the render engine via ComponentPeer, but the fact that juce::Image will create graphics contexts using the native renderer regardless of what renderer the peer is using results in an irrecoverable breaking change on our end. In most cases this can be solved through optimization, but it would be nice to get some of the benefits of JUCE 8 without needing to perform an optimization pass on a large codebase just to accommodate D2D.

Our team’s code is proprietary, but I’d gladly share my profiling results with the JUCE team privately if it would aid in improving D2D performance.

For reference, the machine I’ve been profiling on has the following specifications:
CPU: AMD Ryzen 5800X
GPU: NVIDIA 3090 Ti
RAM: 32GB DDR4
MoBo: MSI Mag X570S Tomahawk Max Wifi


Hello-

I can’t really speak to adding a preprocessor flag to disable Direct2D; that’s up to the JUCE team.

If you’d like to disable Direct2D, then you need to:

Edit Component::createNewPeer in juce_Windowing_windows.cpp; this will change the default window renderer back to the software renderer. This is necessary because the Direct2D renderer is expecting the NativeImageType to be a Direct2D image.

ComponentPeer* Component::createNewPeer (int styleFlags, void* parentHWND)
{
#if JUCE_USE_SOFTWARE_IMAGE_TYPE
    int defaultEngine = 0;
#else
    int defaultEngine = 1;
#endif
    return new HWNDComponentPeer { *this, styleFlags, (HWND) parentHWND, false, 0 };
}

Add JUCE_USE_SOFTWARE_IMAGE_TYPE to juce_Image.cpp

#if JUCE_LINUX || JUCE_BSD || JUCE_USE_SOFTWARE_IMAGE_TYPE
ImagePixelData::Ptr NativeImageType::create (Image::PixelFormat format, int width, int height, bool clearImage) const
{
    return new SoftwarePixelData (format, width, height, clearImage);
}
#endif

Add the same flag to juce_Direct2DImage_windows.cpp (note the ! operator).

#if !JUCE_USE_SOFTWARE_IMAGE_TYPE
ImagePixelData::Ptr NativeImageType::create (Image::PixelFormat format, int width, int height, bool clearImage) const
...
#endif

Setting JUCE_USE_SOFTWARE_IMAGE_TYPE=1 should then disable Direct2D.

Matt


What specifically is causing the path flushing?

Matt

Hey Matt! Thanks again for all your help. I was concerned disabling the D2D renderer for image contexts would be much more of a hassle, but that’s a super quick fix! We’ll implement that now and revert it when we have the time to address the D2D performance issues more thoroughly.

Regarding the path flushing, I couldn’t get a handle on exactly what was going on; I was just making inferences based on the symbol names in the stack trace. Looking more closely, it may well be that FlushInternal is simply part of the D2D path renderers, and that it’s not an actual context flush, just a generally slow D2D call. I’ll update my original post to reflect that.

Hey Matt -

In your first code sample, you set an int (defaultEngine) but then don’t use it.
I suspect the last parameter in new HWNDComponentPeer { … } should be that int?

Also - I implemented this and it works (Thanks!).

Even rolling it back to using OpenGL and no D2D, though, rendering is now significantly slower than it was (for me) with Juce 7 and Win 10.

With D2D disabled:
Works reasonably well (useable, though slow) in Release mode.
In Debug mode, multiple plugin windows will still immediately overwhelm the message manager thread. The rendering continues, but no mouse events register at all (and you end up needing to forcibly close the DAW).

Anyway, I look forward to understanding how I will need to refactor everything that uses an image or gradient (oof, this is gonna take some effort eh?).

Can you give me an example of how you’re creating your images?

Matt

Sure …

I have a number of use cases, but here is one I’m curious about (and suspect is highly problematic with Direct2D).

Imagine something like an oscilloscope or Winamp-style musical wave display.
Previously, the best practice (as I understand it) was something like this:

class myClass : public Component
{
    Image myImage;
    Path somePath;   // the path being visualised; filled in elsewhere

public:
    myClass() : myImage (Image::ARGB, 500, 300, true) {}

    void paint (Graphics& g) override
    {
        // fade out existing content on the image
        myImage.multiplyAllAlphas (0.95f);

        // create a graphic so we can paint paths or apply transforms to it
        Graphics ImageGraphic (myImage);

        // paint the current path (oscilloscope, fft bins, whatever)
        ImageGraphic.strokePath (somePath, PathStrokeType (1.0f));

        // draw the image to the screen
        g.drawImageAt (myImage, 0, 0);
    }
};

So as I understand it, this is now problematic because creating the Graphics object creates a new object on the CPU, whereas previously it would stay on the GPU?

Anyway, it seems like something fundamental has changed, since this sort of thing seems much slower than it was before (particularly with Direct2D, but even with it disabled).

Also - same question for a situation where you have a ColourGradient instead of an image, and alter it each time paint is called.

As a daily reader of all new forum posts, I am regularly baffled by the absurdity of ditching the Juce 7 Windows rendering, which was working OK.

Not offering any Juce 7 LT support, and not offering developers a choice between a working and a new approach… where do these decisions come from?

The obvious macOS focus of the Juce developers? Management in the USA?
The paying users of Juce should be able to make such a choice in my opinion.

What “working OK” means will vary from project to project. In the last few JUCE user surveys, improved rendering performance was one of the most-requested features, and this is what D2D rendering was supposed to address. For many projects, D2D will be noticeably smoother and more efficient (try switching between D2D and the software renderer in the GraphicsDemo of the DemoRunner to see an example). If you’ve been reading the forum regularly, you’ll also have seen feedback from users where the new renderer has produced significant performance improvements.

My impression is that the D2D renderer is faster in most situations. That’s why it’s the new default renderer. However, the performance characteristics are different from the software renderer, so code that’s been optimised for the old renderer may end up performing badly under the new renderer. This may mean that this code needs to be profiled and restructured a little to work well with D2D. Alternatively, the old renderer can still be easily enabled in places where this makes sense.

All JUCE users can choose the renderer they want to use, without modifying/forking JUCE. To use the software renderer instead of D2D to render into an image, you’d just add an extra parameter when constructing the Image:

// Create a software-backed image:
juce::Image image { juce::Image::ARGB, 512, 512, false, juce::SoftwareImageType{} };

{
    juce::Graphics g { image };
    // ...draw into g here
}

// ...store or read the content of image here

To use the software renderer instead of D2D when drawing directly to the screen, you can call ComponentPeer::setCurrentRenderingEngine (0), which will switch the peer’s renderer back to the software renderer.
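
As a rough sketch of where that call could go (the editor class name is just an example, not a prescribed place), the switch can be made once the component actually has a peer:

// Hypothetical editor class; engine index 0 is the software renderer on Windows
void MyPluginEditor::parentHierarchyChanged()
{
    if (auto* peer = getPeer())
        peer->setCurrentRenderingEngine (0);
}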

The JUCE 7 renderer hasn’t been ‘ditched’. It’s still present in the codebase and available for use.


Thanks for the well thought out responses.

I will only say that depending on how big a refactor this is going to be for people like myself, I wouldn’t mind having the solution first posted here by Matt permanently added to the JUCE tree. This way we could just set a JUCE_NO_DIRECT2D flag and be done with it.

That’s a good bit easier than going through all the places where I use an image and changing them … not to mention that this doesn’t solve gradient-related slowdowns.

That said, I’m excited to get D2D working properly, once I wrap my head around how I’m supposed to draw on images and use gradients in the new system. It was a big shock to realize that all my (previously quite fast) plugins were freezing the MM thread.

Wait … is this true? That probably explains why I see significant slowdowns even after switching back to the old engine, eh? Or does Matt’s change take care of this?

By default, Image is backed by NativeImageType, which used to be the same as SoftwareImageType on Windows, but now has a new D2D implementation.

You can override this behaviour by passing your preferred image type into the Image constructor, as I showed above:

// This image is backed by software pixel data, not D2D
juce::Image image { juce::Image::ARGB, 512, 512, false, juce::SoftwareImageType{} };

I understand that. What I can’t tell is if the solution Matt proposed (above) changes that by default. I added the steps he suggested to the OP, and it helped considerably, but not completely, and I’m wondering if this is the reason.

More to the point - what is the best practice now for a paint routine (or timer) that needs to draw directly on an image? Probably just use a SoftwareImageType image, eh? Or is there some other way to hold the image on the GPU and yet draw to it?

Also - is it advisable to get rid of the old OpenGL-context-attached-to-component code? Is there any benefit to using OpenGL now, or is D2D meant to supersede it?

My 2 cents - we need a way to distinguish between “permanent” and “disposable” images.

Permanent images keep a copy of their bitmap data in both CPU memory and GPU memory; that way, the GPU image can be restored in case the GPU restarts and the data in GPU memory is lost. JUCE software images are permanent. JUCE Direct2D images are also permanent, at the cost of needing to map the bitmap data back to the CPU from the GPU whenever the GPU rasterizes the bitmap. Mapping the bitmap data can take several milliseconds, which can add up.

However - some images don’t need to be backed up:

  • Cached component images
  • Effect output images
  • Any image that’s procedurally or dynamically generated

In those cases, the image will be recreated as needed, so the backup is unnecessary and wastes both CPU time and memory. I’m calling these images “disposable”.

At the moment, I don’t think there’s a good way to specify if an image is permanent or disposable. I’m putting together some unofficial changes to address this as well as a few other issues. Please give me some time to work through this; I should have something soon.

Matt


Please see this thread:

https://forum.juce.com/t/image-permanence

Matt

Hi @aaronleese1-

I have a few recommendations for the code you posted.

Image::multiplyAllAlphas is currently implemented entirely in software. That means you’ll actually get 2 GPU->CPU mappings in a row for the same image - once for multiplyAllAlphas and again for the Graphics object.

Instead of multiplyAllAlphas, you could use two images to create the persistence effect.

Always wrap your Graphics object in curly braces; the actual drawing doesn’t happen until the Graphics object goes out of scope.

               // create a graphic so we can paint paths or apply transforms to it
               Graphics ImageGraphic (myImage);

               // paint the current path (oscilloscope, fft bins, whatever)
               ImageGraphic.strokePath (somePath, PathStrokeType (1.0f));

               // draw the image to the screen
               g.drawImageAt (myImage, 0, 0);  <---- myImage may not be painted yet

Instead:

               {
                   Graphics ImageGraphic (myImage);

                   ImageGraphic.strokePath (somePath, PathStrokeType (1.0f));
               }

               // draw the image to the screen
               g.drawImageAt (myImage, 0, 0); <---- myImage has definitely been painted

The example I posted here shows how to implement the persistence effect using double buffering:

        void paint(juce::Graphics& g) override
        {
            double elapsedSeconds = 0.0;

            {
                juce::ScopedTimeMeasurement stm{ elapsedSeconds };

                //
                // Use double buffering to create a persistence effect. Alternate drawing between outputImages[0] and outputImages[1].
                //
                // This avoids using multiplyAllAlphas, which will cause the Direct2D image to be mapped from the GPU -> CPU and cause a performance hit
                //
                auto& outputImage = outputImages[outputImageIndex];
                auto& previousOutputImage = outputImages[outputImageIndex ^ 1];

                //
                // Toggle the double buffer index
                //
                outputImageIndex ^= 1;

                //
                // Create images if necessary
                //
                if (polkaDotsImage.isNull() || polkaDotsImage.getBounds() != getLocalBounds())
                    paintPolkaDots();

                if (outputImage.isNull() || outputImage.getBounds() != polkaDotsImage.getBounds())
                    outputImage = juce::Image{ juce::Image::ARGB, polkaDotsImage.getWidth(), polkaDotsImage.getHeight(), true, juce::NativeImageType{}, imagePermanence };

                //
                // For each frame, paint the previous output image with slight transparency, then paint the polka dots image on top of that with full opacity
                //
                {
                    juce::Graphics imageGraphics{ outputImage };
                    imageGraphics.setColour(juce::Colours::transparentBlack);
                    imageGraphics.getInternalContext().fillRect(outputImage.getBounds(), true);

                    if (previousOutputImage.isValid())
                    {
                        imageGraphics.setOpacity(0.98f);
                        imageGraphics.drawImageAt(previousOutputImage, 0, 0);
                    }

                    imageGraphics.setOpacity(1.0f);
                    imageGraphics.drawImageTransformed(polkaDotsImage, juce::AffineTransform::rotation((float)angle.phase, 0.5f * (float)outputImage.getWidth(), 0.5f * (float)outputImage.getHeight()));
                }

                //
                // Draw the composited output image to the screen
                //
                g.drawImageAt(outputImage, 0, 0);
            }

Of course, this requires being able to mark the images as disposable.

Hope that helps-

Matt

Right, understood, and all good suggestions:

  • I am aware the rendering doesn’t occur until it goes out of scope, thanks.
  • elsewhere I use 2 images as you suggest, but wanted a simple example here

Let me see if I can get you to explain something else to me though.

Prior to D2D, my assumption was that components which contained images would keep them on the GPU, and that any Image:: operations (or Graphics operations) would operate on the GPU (I was using OpenGL of course). Is that accurate?

Now though, because of D2D I have to set the image type to SoftwareImageType, which I believe keeps them on the CPU (so, slower rendering)?

Using your disposable / permanent distinction … am I then able to set the Image as disposable, and write directly to it (on the GPU) without triggering the GPU->CPU copying? Is that right?

If so … the fastest method would (maybe?) be:

  • set components to use OpenGL rendering
  • use your branch linked above
  • set images as disposable
  • draw on the images/graphics as usual, similar to above examples

Thanks for this tip, I was unaware of this functionality! Unfortunately, I think the problem often ends up being that the Images contained within StandardCachedComponentImage are the biggest offenders in terms of chewing up CPU time. Because of this, there’s no way for developers to change this behavior without modifying JUCE or installing a custom CachedComponentImage override on every single component that they intend to buffer to image (a rough sketch of such an override follows).
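
For anyone else in the same situation, here’s a rough sketch of what such a per-component override could look like. This is a simplified illustration only: unlike JUCE’s StandardCachedComponentImage it ignores display scaling and partial invalidation, and it simply backs the cache with a SoftwareImageType image.

// Simplified, hypothetical per-component cache backed by a software image
// instead of the native (Direct2D) image type
struct SoftwareCachedComponentImage : public juce::CachedComponentImage
{
    explicit SoftwareCachedComponentImage (juce::Component& c) : owner (c) {}

    void paint (juce::Graphics& g) override
    {
        auto bounds = owner.getLocalBounds();

        // (Re)create the software-backed buffer when the component size changes
        if (image.isNull() || image.getBounds() != bounds)
        {
            image = juce::Image (juce::Image::ARGB,
                                 juce::jmax (1, bounds.getWidth()),
                                 juce::jmax (1, bounds.getHeight()),
                                 true,
                                 juce::SoftwareImageType{});
            valid = false;
        }

        // Repaint the whole component into the buffer when it has been invalidated
        if (! valid)
        {
            image.clear (image.getBounds());

            juce::Graphics imageG (image);
            owner.paintEntireComponent (imageG, true);
            valid = true;
        }

        g.drawImageAt (image, 0, 0);
    }

    bool invalidateAll() override                          { valid = false; return true; }
    bool invalidate (const juce::Rectangle<int>&) override { valid = false; return true; }
    void releaseResources() override                       { image = {}; }

    juce::Component& owner;
    juce::Image image;
    bool valid = false;
};

// Usage (the component takes ownership of the pointer):
// someComponent.setCachedComponentImage (new SoftwareCachedComponentImage (someComponent));

Installing something like this on the heavy components keeps their buffers off Direct2D, at the cost of software rasterisation for those components.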

Though this is partially a user-side issue due to sloppy usage of setBufferedToImage, I think this hints at a larger issue with the D2D renderer, which is that it demands (in my view) a radically different programming paradigm in order to be used properly. Dealing with GPU-side resources is a dimension of thinking that hasn’t been required of JUCE developers until now (disregarding those who use OpenGL for large parts of their rendering), and as such I’d wager there’s a ton of existing code that would require huge refactors to avoid the newly-introduced pitfalls of the D2D renderer—that’s certainly the case for our team. The long and short of it is that I believe that it should be possible to disable the D2D renderer in all places by changing a single flag, whether that be a compile-time flag or a runtime call.

Beyond this, I second Matt’s notion of distinguishing permanent/disposable images. I think as JUCE evolves and collects even more GPU-based rendering engines, optimization semantics such as this will be indispensable.

EDIT: spelling.


The JUCE OpenGL renderer doesn’t really take full advantage of the GPU; the software renderer still does a lot of work even in OpenGL mode. For example (as far as I can tell), path rendering and drawing text are still handled by the software renderer. I think it’s more accurate to say that some Graphics operations operate on the GPU in OpenGL mode, but not all. See juce_OpenGLGraphicsContext.cpp for more detail.

The main benefit of Direct2D is that all the rasterization is done on the GPU.

Yep! But only in Direct2D mode and only if you explicitly set the image to be disposable.

Please note that with my branch (at the moment) only Direct2D images can be marked as disposable. The setting is ignored for all other image types.

Yes - except I’d change the first bullet point to “set components to use Direct2D rendering”.

Could you please try the example I posted on the other thread and post your results?

Matt


As an experiment, the branch I just posted uses disposable images for StandardCachedComponentImage. Could you please try that and see if it improves matters?

Matt
