Direct2D Performance Update

Hi @ZioGuido

Thanks for posting that. In that specific case, where you’re creating 100 drop shadow objects per frame, the software renderer may well be faster.

You can improve matters by setting your PreviousFrame image to “no backup” mode. PreviousFrame is procedurally generated, so there’s no point in keeping a software backup of that image internally.

PreviousFrame.setBackupEnabled(false);

I also recommend you avoid creating a new bitmap every frame by calling createCopy. Instead, preallocate two images and then swap them.

I don’t think there’s been any particular effort to optimize DropShadow::drawForRectangle. I suppose that could use some effort. I changed your example to use DropShadow::drawForImage and easily hit 120 FPS.

Matt

Can you post your modified file?
When you use DropShadow::drawForImage you should create an empty image rather than a simple rectangle, right?

If I understand correctly, the Direct2D renderer makes use of the GPU when drawing images, so they can be drawn very efficiently, as opposed to the software renderer or the CoreGraphics renderer on macOS.

@ZioGuido A further optimization could be to create a single “smoke particle” image and repaint it as often as needed. You can even create the image using the visually better Gaussian blur, as you will only be applying the blur once. (This method can of course also be applied when painting regular drop shadows.)

I have made a few tests based on your code and can get away with more than a thousand particles at maximum framerate. macOS, on the other hand, struggles with just a few!

@matt Does setBackupEnabled(false) have any effect on macOS? And have I implemented it, and the swapping of the images, correctly?

#pragma once
#include <JuceHeader.h> 

class GSiAnimatedAboutScreen : public Component
{
public:
    static constexpr int max_num_particles = 1500;

    GSiAnimatedAboutScreen()
    {
        setOpaque(true);
    }

    void Show()
    {
        Update();
        toFront(true);
        setVisible(true);
    }

    void Hide()
    {
        Desktop::getInstance().getAnimator().fadeOut(this, 250);
    }

    void Update()
    {
        for (auto& particle : particles)
        {
            particle.x += particle.dx;
            particle.y += particle.dy;
            particle.size += particle.dsize;

            if (particle.y < -particle.size)
            {
                particle.reset(getWidth(), getHeight());
            }
        }

        repaint();
    }

private:
    Image frameA;
    Image frameB;
    bool swap = false;
    Image smokeParticleImage;

    void resized() override
    {
        frameA = Image(Image::PixelFormat::RGB, getWidth(), getHeight(), true);
        frameB = Image(Image::PixelFormat::RGB, getWidth(), getHeight(), true);
        frameA.setBackupEnabled(false);
        frameB.setBackupEnabled(false);

        particles.clear();
        for (int i = 0; i < max_num_particles; ++i)
        {
            SmokeParticle newParticle;
            newParticle.reset(getWidth(), getHeight());
            particles.add(newParticle);
        }
        
        smokeParticleImage = Image(Image::PixelFormat::ARGB, 400, 400, true);
        
        {
            Graphics g(smokeParticleImage);
            g.fillAll(Colours::transparentBlack);
            g.setColour(Colours::white.withAlpha(0.4f));
            g.fillEllipse(smokeParticleImage.getBounds().reduced(180).toFloat());
        }
        
        smokeParticleImage.getPixelData()->applyGaussianBlurEffect (30.0f);
    }
 
    struct SmokeParticle
    {
        float x, y;
        float dx, dy;
        float size, dsize;

        void reset(int width, int height)
        {
            juce::Random rnd;
            x = rnd.nextFloat() * width - 100.0f;
            y = height;
            dx = (rnd.nextFloat() - 0.5f) * 2.0f;
            dy = -1.0f - rnd.nextFloat() * 3.0f;
            size = 30.f + rnd.nextFloat() * 500.f;
            dsize = 0.1f + rnd.nextFloat() * 0.1f;
        }
    };

    juce::Array<SmokeParticle> particles;

    void paint(Graphics& g2) override
    {
        Graphics g (swap ? frameB : frameA);
        g.fillAll (Colours::black);
        g.setOpacity (0.92f);
        g.drawImageAt (swap ? frameA : frameB, 0, 0);

        for (const auto& particle : particles)
            g.drawImage (smokeParticleImage, { particle.x, particle.y, particle.size, particle.size });
        
        g2.drawImageAt (swap ? frameB : frameA, 0, 0);
        swap = ! swap;
    }

    JUCE_DECLARE_NON_COPYABLE_WITH_LEAK_DETECTOR(GSiAnimatedAboutScreen)
};

This is a great optimization of the animation effect, but the goal of this program was to test the efficiency of the DropShadow class by creating and modifying as many shadows as possible during an animation.

In a real application, I may have a number of components (knobs, buttons, etc.) with shadows behind them that are repainted whenever needed, and if the DropShadow class is slow, the whole repaint is slow. This is what happened when I tried to recompile one of my existing projects with JUCE 8.

Btw, your use of applyGaussianBlurEffect for creating a shadow could be a good basis for a new class that creates shadows more efficiently. I will experiment more with this.


setBackupEnabled(false) does not have any effect on macOS.

It looks OK, yeah. FWIW I’d probably drop the swap member, and call std::swap (frameA, frameB) to swap the buffers around. Because Image is just a pointer to an ImagePixelData instance that actually manages the image resources, swapping Images like this is quick (it’s not copying image contents).
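To illustrate why swapping handles is cheap, here is a minimal sketch in plain C++. PixelBuffer and ImageHandle are hypothetical stand-ins invented for this example, not JUCE types, and the sketch only assumes the behaviour described above: the handle shares ownership of the pixel storage, so std::swap exchanges pointers and leaves the pixels where they are.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the object that owns the pixels (not a JUCE type).
struct PixelBuffer
{
    PixelBuffer (int w, int h)
        : width (w), height (h),
          pixels (static_cast<std::size_t> (w) * static_cast<std::size_t> (h), 0) {}

    int width, height;
    std::vector<std::uint8_t> pixels;
};

// Hypothetical lightweight handle that shares ownership of a PixelBuffer.
struct ImageHandle
{
    ImageHandle (int w, int h) : data (std::make_shared<PixelBuffer> (w, h)) {}

    std::shared_ptr<PixelBuffer> data;
};

// Swaps the two handles and reports whether each pixel allocation stayed
// at its original address, i.e. no pixel data was copied.
inline bool swapKeepsPixelStorage (ImageHandle& a, ImageHandle& b)
{
    const auto* storageA = a.data->pixels.data();
    const auto* storageB = b.data->pixels.data();

    std::swap (a, b); // exchanges the shared_ptrs only

    return a.data->pixels.data() == storageB
        && b.data->pixels.data() == storageA;
}
```

The cost of the swap is independent of the image size, which is what makes a two-buffer scheme practical at full frame rate.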


This should be a real improvement.

Always put scoping braces around a graphics object being used to paint an image; the actual painting happens when the graphics object goes out of scope.

    void paint(Graphics& g2) override
    {
        { // enclose Graphics object in scoping braces
           Graphics g (swap ? frameB : frameA);
           g.fillAll (Colours::black);
           g.setOpacity (0.92f);
           g.drawImageAt (swap ? frameA : frameB, 0, 0);

           for (const auto& particle : particles)
               g.drawImage (smokeParticleImage, { particle.x, particle.y, particle.size, particle.size });
        }
        
        g2.drawImageAt (swap ? frameB : frameA, 0, 0);
        swap = ! swap;
    }
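
The reason for the scoping braces can be sketched in plain C++ as a context that records commands and only applies them in its destructor. Target and DeferredContext are hypothetical names made up for this illustration; this is a simplified model of the idea, not how juce::Graphics is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical render target that only changes when a batch is flushed.
struct Target
{
    std::vector<std::string> applied;
};

// Hypothetical deferred context: records commands, applies them on destruction.
class DeferredContext
{
public:
    explicit DeferredContext (Target& t) : target (t) {}

    ~DeferredContext()
    {
        for (auto& command : pending)
            target.applied.push_back (std::move (command));
    }

    DeferredContext (const DeferredContext&) = delete;
    DeferredContext& operator= (const DeferredContext&) = delete;

    void draw (std::string command) { pending.push_back (std::move (command)); }

private:
    Target& target;
    std::vector<std::string> pending;
};

// Returns how many commands had reached the target while the context was
// still alive (first) and after it had been destroyed (second).
inline std::pair<std::size_t, std::size_t> countsBeforeAndAfterScope()
{
    Target target;
    std::size_t whileAlive = 0;

    {
        DeferredContext g (target);
        g.draw ("fillAll");
        g.draw ("drawImageAt");
        whileAlive = target.applied.size(); // nothing has been applied yet
    } // g is destroyed here, which flushes both commands

    return { whileAlive, target.applied.size() };
}
```

In this model, reading the target before the closing brace would see none of the drawing, which is the situation the braces in paint() are meant to avoid.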

Matt


DropShadow has several different methods, some of which take better advantage of GPU acceleration than others. drawForRectangle is doing quite a bit of gradient creation.

For your particular use case, I’d skip using DropShadow entirely; I don’t think it’s meant for what you’re trying to do. Instead, try the new ImagePixelData::applyGaussianBlurEffect.

Matt

Still room for improvement here, but this should get the idea across. It’s a little visually different than your original.


#pragma once

#include <JuceHeader.h>

class GSiAnimatedAboutScreen : public Component
{
public:
    static constexpr int max_num_particles = 100;

    GSiAnimatedAboutScreen()
    {
        setOpaque(true);
        //setBufferedToImage(true);
    }

    void Show()
    {
        Update();
        toFront(true);
        setVisible(true);
    }

    void Hide()
    {
        Desktop::getInstance().getAnimator().fadeOut(this, 250);
    }

    void Update()
    {
        for (auto& particle : particles)
        {
            particle.x += particle.dx;
            particle.y += particle.dy;
            particle.size += particle.dsize;

            if (particle.y < -particle.size)
            {
                particle.reset(getWidth(), getHeight());
            }
        }

        repaint();
    }

private:
    Image previousFrame, frame, particleImage;

    void resized() override
    {
        previousFrame = Image(Image::PixelFormat::SingleChannel, getWidth(), getHeight(), true);
        frame = Image(Image::PixelFormat::SingleChannel, getWidth(), getHeight(), true);

        previousFrame.setBackupEnabled(false);
        frame.setBackupEnabled(false);

        particles.clear();
        for (int i = 0; i < max_num_particles; ++i)
        {
            SmokeParticle newParticle;
            newParticle.reset(getWidth(), getHeight());
            particles.add(std::move(newParticle));
        }
    }

    struct SmokeParticle
    {
        float x, y;
        float dx, dy;
        float size, dsize;

        void reset(int width, int height)
        {
            juce::Random rnd;
            x = rnd.nextFloat() * width;
            y = height;
            dx = (rnd.nextFloat() - 0.5f) * 2.0f;
            dy = -1.0f - rnd.nextFloat() * 3.0f;
            size = 30.f + rnd.nextFloat() * 10.f;
            dsize = 0.1f + rnd.nextFloat() * 0.1f;
        }
    };

    juce::Array<SmokeParticle> particles;

    void paint(Graphics& g) override
    {
        frame.clear(frame.getBounds());

        {
            juce::Graphics frameGraphics{ frame };

            frameGraphics.setOpacity(0.995f);
            frameGraphics.drawImageAt(previousFrame, 0, 0);

            for (auto& particle : particles)
            {
                if (particleImage.getWidth() < (int)particle.size * 2 || particleImage.getHeight() < (int)particle.size * 2)
                {
                    particleImage = Image{ Image::PixelFormat::SingleChannel, (int)particle.size * 2, (int)particle.size * 2, false };
                    particleImage.setBackupEnabled(false);
                }

                particleImage.clear(particleImage.getBounds());

                {
                    juce::Graphics particleGraphics{ particleImage };
                    particleGraphics.setColour(juce::Colour{ 0xa0808080 });
                    particleGraphics.fillEllipse(particleImage.getBounds().toFloat().withSizeKeepingCentre(particle.size, particle.size));
                }

                particleImage.getPixelData()->applyGaussianBlurEffect(particle.size);

                frameGraphics.drawImageTransformed(particleImage, juce::AffineTransform::translation(particle.x - particle.size, particle.y - particle.size));
            }
        }

        g.drawImageAt(frame, 0, 0);

        std::swap(previousFrame, frame);
    }

    JUCE_DECLARE_NON_COPYABLE_WITH_LEAK_DETECTOR(GSiAnimatedAboutScreen)
};
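
One likely source of the visual difference is the opacity used when the previous frame is drawn back into the new one: 0.995 in the listing above versus 0.92 in the earlier listing. Each frame multiplies the old content by that factor, so a trail fades geometrically. The sketch below is a rough model in plain C++ that ignores 8-bit rounding; framesUntilFaded is a hypothetical helper written for this example.

```cpp
#include <cassert>

// Counts the frames needed for a pixel's contribution to drop below
// `threshold` when every frame multiplies it by `opacity`.
// Returns -1 for inputs that would never fade (or are otherwise invalid).
inline int framesUntilFaded (double opacity, double threshold)
{
    if (opacity <= 0.0 || opacity >= 1.0 || threshold <= 0.0)
        return -1;

    int frames = 0;

    for (double remaining = 1.0; remaining >= threshold; remaining *= opacity)
        ++frames;

    return frames;
}
```

Under this model, and taking 1/255 as the point where a value becomes invisible, an opacity of 0.92 fades out in 67 frames while 0.995 takes 1106, so the trails in the listing above linger far longer.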

If you really want to make it fast, pre-render the particle shadow to an image and then just paint that image many times and have the GPU resize it.

Matt

Create a single particle image and blur it in your resized() handler:

    void resized() override
    {
        previousFrame = Image(Image::PixelFormat::SingleChannel, getWidth(), getHeight(), true);
        frame = Image(Image::PixelFormat::SingleChannel, getWidth(), getHeight(), true);
        particleImage = Image(Image::PixelFormat::SingleChannel, 32, 32, true);

        {
            Graphics particleGraphics{ particleImage };
            particleGraphics.setColour(juce::Colour{ 0x40808080 });
            particleGraphics.fillEllipse(particleImage.getBounds().toFloat().withSizeKeepingCentre(particleImage.getWidth() * 0.5f, particleImage.getHeight() * 0.5f));
        }

        particleImage.getPixelData()->applyGaussianBlurEffect(particleImage.getWidth() * 0.5f);

        // ... the rest of resized() is unchanged from the previous listing ...
    }

Then just paint a lot of images:

    void paint(Graphics& g) override
    {
        frame.clear(frame.getBounds());

        {
            juce::Graphics frameGraphics{ frame };

            frameGraphics.setOpacity(0.995f);
            frameGraphics.drawImageAt(previousFrame, 0, 0);
            frameGraphics.setImageResamplingQuality(Graphics::highResamplingQuality);

            for (auto& particle : particles)
            {
                auto transform = juce::AffineTransform::scale(1.0f * particle.size / particleImage.getWidth(), 1.0f * particle.size / particleImage.getHeight())
                    .translated(particle.x - particle.size * 0.5f, particle.y - particle.size * 0.5f);
                frameGraphics.drawImageTransformed(particleImage, transform);
            }
        }

        g.drawImageAt(frame, 0, 0);

        std::swap(previousFrame, frame);
    }
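
The transform built in the loop above scales the small particle image up to particle.size and then shifts it by half that size, so the scaled image ends up centred on (particle.x, particle.y). The arithmetic can be checked with a small plain C++ sketch; Point, Affine and particleTransform are hypothetical stand-ins written for this example, not juce::AffineTransform.

```cpp
#include <cassert>

struct Point
{
    float x, y;
};

// Minimal 2x3 affine matrix: x' = a*x + b*y + tx, y' = c*x + d*y + ty.
struct Affine
{
    float a, b, tx;
    float c, d, ty;

    Point apply (Point p) const
    {
        return { a * p.x + b * p.y + tx,
                 c * p.x + d * p.y + ty };
    }
};

// Scale the source image to size x size, then translate so that its
// centre lands on (centreX, centreY).
inline Affine particleTransform (float centreX, float centreY, float size,
                                 float imageWidth, float imageHeight)
{
    const float scaleX = size / imageWidth;
    const float scaleY = size / imageHeight;

    return { scaleX, 0.0f, centreX - size * 0.5f,
             0.0f, scaleY, centreY - size * 0.5f };
}
```

For a 32 x 32 source image drawn at size 64 around (100, 200), the image corners map to (68, 168) and (132, 232), and the image centre (16, 16) maps to (100, 200).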

120 FPS in Direct2D mode, with 1000 particles instead of 100.

I wouldn’t recommend that approach using the software renderer though. You’d burn a lot of CPU resizing images.

Hopefully at some point in the future this sort of thing will all be shader effects.

Matt

Hi all. For those who are seeing performance issues, could I ask you to check if you’re using an Nvidia GPU and if so, do you have any frame rate or background frame rate limits set in the driver?

I may have discovered something weird - on my machine, anyway. Full write-up here: Possible interaction: direct2d + Nvidia frame-rate limit -> bad slowdowns? (deep dive)


RTX4060 here with no limits set.