Direct2D

I’m glad to see it’s not just me!

This was excruciating, but I think I have a good solution for window drag-resizing using Direct2D swap chains.

TL;DR: don’t render Direct2D directly to a desktop window. Instead, create two child windows (A & B), then add them to the original window and make the children the same size as the original window. Assign a flip mode swap chain to child A and show child A. Assign a blt mode swap chain to child B and hide child B. Render everything within child A.

If the user starts resizing, hide child A and show child B. Render everything into child B. Resize both child A and B on the fly to match the size of the parent window. When the user stops resizing, hide child B and show child A.
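The switching described above can be modelled as a small state machine. This is a portable sketch with no Win32 calls. `ChildWindowState`, `onEnterSizeMove`, `onParentResized` and `onExitSizeMove` are hypothetical names for what the WM_ENTERSIZEMOVE, WM_SIZE and WM_EXITSIZEMOVE handlers would do, and the `bool` flags stand in for actually showing or hiding the child windows.

```cpp
#include <cassert>

// Which swap chain presentation model is active.
enum class SwapMode { flip, blt };

// Hypothetical model of the two child windows. In Win32 code the visibility
// flags would map to ShowWindow (child, SW_SHOW / SW_HIDE) and the size to
// SetWindowPos plus IDXGISwapChain::ResizeBuffers.
struct ChildWindowState
{
    bool childAVisible = true;   // child A: flip mode swap chain, normal rendering
    bool childBVisible = false;  // child B: blt mode swap chain, used while resizing
    int width = 0, height = 0;   // both children track the parent's client size

    // Run from the WM_ENTERSIZEMOVE handler.
    void onEnterSizeMove()
    {
        childAVisible = false;
        childBVisible = true;
    }

    // Run while the user drags (e.g. from WM_SIZE): keep both children in step.
    void onParentResized (int newWidth, int newHeight)
    {
        width = newWidth;
        height = newHeight;
    }

    // Run from the WM_EXITSIZEMOVE handler.
    void onExitSizeMove()
    {
        childBVisible = false;
        childAVisible = true;
    }

    // The swap chain that should receive the next frame.
    SwapMode activeSwapMode() const
    {
        assert (childAVisible != childBVisible); // exactly one child is shown
        return childBVisible ? SwapMode::blt : SwapMode::flip;
    }
};
```

Keeping the visibility logic separate from the Win32 plumbing makes it easy to check that exactly one child is ever visible.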

This works great!

This also should fix an issue with switching from the JUCE Direct2D renderer back to the software renderer; just hide the child windows and the software renderer can paint to the parent window as before.

So far it’s just a proof-of-concept with pure Win32 code; I haven’t added this method to the JUCE Direct2D renderer, but it should work fine.

Excessively detailed and cathartic explanation

Resizing windows on Windows is awful. On my Windows 10 machine with a good graphics card and fast processor, resizing applications like Firefox, Steam, Visual Studio, or even the command prompt causes clearly visible flickering or sizing errors around the right and bottom edges of the window. It’s endemic to multiple applications; I think it’s something deeper within the bowels of the Desktop Window Manager. Here’s a very long thread detailing all the issues:
https://stackoverflow.com/questions/53000291/how-to-smooth-ugly-jitter-flicker-jumping-when-resizing-windows-especially-drag

Resizing using a flip mode swap chain is especially bad; the image flickers like crazy (this may be worse because I’m using an NVIDIA G-SYNC compatible monitor?). Resizing a blt mode swap chain works fine, but the performance is much better with a flip mode swap chain.

I looked over the hybrid solution posted by jrlanglois, but that solution was a combination of Direct2D 1.0 and Direct2D 1.1. I thought I’d instead try using a flip mode swap chain normally, and switch to blt mode only while resizing.

But that didn’t work. You can only have one swap chain associated with a window region at a time, so I tried freeing the flip mode chain and then creating the blt mode chain. However, the flip mode swap chain sort of sticks to the window and keeps rendering the last presented buffer, even if you delete it and make a new blt chain. I suspect this has something to do with the deferred destruction issues mentioned here:
https://docs.microsoft.com/en-us/windows/win32/api/d3d11/nf-d3d11-id3d11devicecontext-flush

The “sticking” also means you can’t freely switch the JUCE renderer between software mode and Direct2D mode with a flip mode chain; you still see the last presented buffer from the swap chain.

Using two child windows means that you can keep a separate swap chain for each child window and freely switch between the child windows using WM_ENTERSIZEMOVE and WM_EXITSIZEMOVE. Doing so means we get the performance benefits of flip mode and the smooth resizing of blt mode. Best of both worlds.

There may be a simpler approach using DirectComposition and WS_EX_NOREDIRECTIONBITMAP, but I haven’t been able to get that to work at all. I don’t think using WS_EX_NOREDIRECTIONBITMAP would work with the software renderer in any case, so it would make it tough to switch renderers on the fly.

If you’ve read this far, then I congratulate you on your excess of spare time. Thanks for reading.

Matt

Good on you for figuring that out. What a drama!

If the flip-mode swapchain is in its own child window - could one choose to not resize it along with the main window, but instead leave it as-is during the resize then resize the child window just once when the resizing drag is over? i.e. it would not resize when dragging but ‘jump’ to the correct size afterward. Would that mitigate the ‘flashing’?
I know this would not look as nice, but the idea of allocating an entire second swap-mode window just for use during resizing seems a little wasteful of RAM.

Hi Jeff-

Thanks for the suggestion. I’ll give that a try. I suspect that if I’m enlarging a window, I’ll end up with the original content in the upper-left corner and then large blank areas along the right and bottom edges.

But - we might also end up just displaying uninitialized video memory and showing weird artifacts.

I also don’t love the idea of just keeping the second window around when it’s not being used most of the time. I’m going to try dynamically allocating the resizing child window as needed and see how that works.

Matt

Dynamically allocating the resizing child window works fine.

Matt

I did consider making the child window a little bigger along the right and bottom edges than it needs to be, e.g. 1cm, so that resizing the parent merely exposes more of the child, without artifacts. (One would still need to resize the child, but much less often, more like 1 time in 20, which would reduce the flickering.)
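The over-allocation idea amounts to hysteresis on the child size: grow the child in chunks larger than the parent, and only reallocate when the parent outgrows it. A minimal sketch, assuming a hypothetical `slack` margin in pixels (the post suggests roughly 1cm); `SlackSizer` is an illustrative name, not part of JUCE or Win32.

```cpp
#include <cassert>

// Tracks the size of an over-allocated child window. The child is always at
// least as large as the parent; it is only reallocated when the parent grows
// past it, and then it is made 'slack' pixels larger than needed.
class SlackSizer
{
public:
    explicit SlackSizer (int slackPixels) : slack (slackPixels)
    {
        assert (slackPixels >= 0);
    }

    // Returns true if the child (and its swap chain buffers) must be resized.
    bool parentResized (int parentWidth, int parentHeight)
    {
        if (parentWidth <= childWidth && parentHeight <= childHeight)
            return false; // the oversized child is simply clipped by the parent

        childWidth = parentWidth + slack;
        childHeight = parentHeight + slack;
        ++resizeCount;
        return true;
    }

    int width() const { return childWidth; }
    int height() const { return childHeight; }
    int resizes() const { return resizeCount; }

private:
    int slack;
    int childWidth = 0, childHeight = 0;
    int resizeCount = 0;
};
```

With a 40 pixel slack, dragging the right edge outward one pixel at a time triggers one reallocation per 40 pixels of growth; shrinking never reallocates.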

Oh, I see. I had a similar thought early on where I tried making the swap chain buffer twice the size of the actual window, but it didn’t seem to help much.

Matt

I’ve implemented the double child window method in the JUCE Direct2D renderer. Works pretty well; if you really beat on it, you can occasionally see some flicker.

The second child window is dynamically allocated and only used while resizing.
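Allocating the blt mode child only for the duration of a drag maps naturally onto RAII ownership. A minimal sketch, assuming a hypothetical `ResizeChild` type whose constructor and destructor would create and destroy the Win32 child window and its blt mode swap chain; here they only update a counter so the lifetime can be checked.

```cpp
#include <cassert>
#include <memory>

// Number of live resize children, for demonstration only.
static int liveResizeChildren = 0;

// Hypothetical stand-in for the blt mode child window. In real code the
// constructor would call CreateWindowEx and create the swap chain, and the
// destructor would release the swap chain and call DestroyWindow.
struct ResizeChild
{
    ResizeChild()  { ++liveResizeChildren; }
    ~ResizeChild() { --liveResizeChildren; assert (liveResizeChildren >= 0); }
};

class ParentWindow
{
public:
    // WM_ENTERSIZEMOVE: create the second child only when a drag starts.
    void onEnterSizeMove()
    {
        if (resizeChild == nullptr)
            resizeChild = std::make_unique<ResizeChild>();
    }

    // WM_EXITSIZEMOVE: free it as soon as the drag ends.
    void onExitSizeMove()
    {
        resizeChild.reset();
    }

    bool isResizing() const { return resizeChild != nullptr; }

private:
    std::unique_ptr<ResizeChild> resizeChild;
};
```

Between drags only the flip mode child exists, which addresses the concern about keeping a second window around.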

Matt

Tried to compile it and got following error on VS2022:

juce_win32_Direct2DGraphicsContext.cpp(212,66): error C4596: ‘clipToPath’: illegal qualified name in member declaration

Sounds like you are trying to compile the master branch. Make sure you’re on the branch named “direct2d”.

https://github.com/mattgonzalez/JUCE/tree/direct2d

Matt

That worked. So, I built the Projucer and created a test project. The test project compiles and runs as expected. But when I define JUCE_DIRECT2D=1 and call getPeer()->setCurrentRenderingEngine(1); within the MainWindow constructor, I get an empty window. Why is that? What am I missing? (This is a standard Hello World project generated using the Projucer. Win10 21H2 19044)

OK - what happens if you call repaint(), if you resize the window, or wave your mouse over the window?

What graphics adapter are you using?

Matt

Hi @matt

The text rendering looks really great with your fork, impressive!
However as you know, the rendering speed is not usable…

Do you have plans to continue your Direct2D investigations?

Cheers,

Peter

Hi Peter-

Yes, I need to get back to Direct2D.

What are the specific issues you are seeing with rendering speed?

Matt

On Windows it is sooo easy to spot the plugins that are developed with Juce… :slight_smile:

I hope your work can put an end to that silly and many years old issue.

The strokePath() calls in my Eq spectrum code take approx. 2x longer with your fork than with the current Juce develop (release mode). I could live with that by tweaking my use of the rounded corners option (at 6.0f for the entire path at the moment).
I am already using “log compressed” data, for those interested, so there is not much to gain in that area :wink:

Text rendering is really great with your fork (there is a small difference in text size, curious about that).

My previous comment on rendering speeds needs correction :slight_smile:

I thought I ran into issues with gradient fills, but redoing my measurements I see that I left OpenGL on in a project with 10 level meters that use gradient fills (the non-Direct2D condition). Not fair, of course.
In a straight comparison with/without Direct2D I see no real difference with gradient fills.

Thanks for your work!

Peter

Hi Peter-

I changed my test app to stroke a path with something like 7,000 points:

The app runs a 16 msec timer and scrolls the sine wave side to side.

void TestComponent::RenderComponent::testDensePathStroke(juce::Graphics& g)
{
    double elapsed;

    {
        juce::ScopedTimeMeasurement stm{ elapsed };

        float constexpr numCycles = 4.0f;
        float radiansPerPixel = juce::MathConstants<float>::twoPi * numCycles / (float)getWidth();

        float centerY = getHeight() * 0.25f;
        float angle = startAngle + radiansPerPixel;
        float yScale = getHeight() * 0.1f;
        juce::Path p;
        p.startNewSubPath(0.0f, centerY);
        float x = 1.0f;
        for (; x < (float)getWidth(); x += 1.0f)
        {
            float y = yScale * std::sin(angle) + centerY;
            p.lineTo(x, y);
            angle += radiansPerPixel;
        }

        centerY = getHeight() * 0.75f;
        p.lineTo(x, centerY);

        angle -= radiansPerPixel;
        for (; x >= 0.0f; x -= 1.0f)
        {
            float y = yScale * std::sin(angle) + centerY;
            p.lineTo(x, y);
            angle -= radiansPerPixel;
        }

        p.closeSubPath();

        g.setColour(juce::Colours::magenta);
        g.strokePath(p, { 4.0f, juce::PathStrokeType::JointStyle::curved });
    }

    g.setColour(juce::Colours::white);
    minTime = juce::jmin(minTime, elapsed);
    maxTime = juce::jmax(maxTime, elapsed);
    averageTimeAccumulator += elapsed;
    ++paintCount;
    auto average = averageTimeAccumulator / (double)paintCount;
    g.drawText(juce::String{ minTime * 1000.0, 1 } + " / " + juce::String{ average * 1000.0, 1 } + " / " + juce::String{ maxTime * 1000.0, 1 }, 5, 5, 200, 30, juce::Justification::centredLeft);
}
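To check the "something like 7,000 points" figure, the point-generation loops from `testDensePathStroke` can be replayed without JUCE. `countSinePathPoints` is a hypothetical helper that mirrors those two loops, counting one point for `startNewSubPath` and one per `lineTo`.

```cpp
#include <cassert>

// Mirrors the loops in testDensePathStroke, counting vertices instead of
// building a juce::Path.
int countSinePathPoints (int width)
{
    assert (width > 0);
    int points = 1;                    // p.startNewSubPath (0.0f, centerY)

    float x = 1.0f;
    for (; x < (float) width; x += 1.0f)
        ++points;                      // upper sine wave, left to right

    ++points;                          // p.lineTo (x, centerY): drop to lower wave

    for (; x >= 0.0f; x -= 1.0f)
        ++points;                      // lower sine wave, right to left

    return points;                     // closeSubPath adds no new vertex
}
```

The count works out to 2 × width + 2, so a 3440 pixel wide window gives 6882 points.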

On my machine with a 3440x1400 window, calling strokePath takes an average of 1.4 msec with a worst case of 10.1 msec.

With the software renderer, strokePath took an average of 3.7 msec, with a worst case of 9.2 msec.

This is an Intel Core i9-10900 with an Nvidia 2080Ti.

Matt

Peter, what is your graphics card?

How many elements are in your path?

Graphics::strokePath does a lot of work before calling the renderer; it creates a new temporary path that is the outline of the original path and then tells the renderer to fill the temporary path.

void Graphics::strokePath (const Path& path,
                           const PathStrokeType& strokeType,
                           const AffineTransform& transform) const
{
    Path stroke;
    strokeType.createStrokedPath (stroke, path, transform, context.getPhysicalPixelScaleFactor());
    fillPath (stroke);
}

So there’s only so much the hardware renderer can do to speed things up; my profiler shows that most of the CPU time is spent in PathStrokeHelpers::createStroke.
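To illustrate why that CPU cost grows with the number of points, here is a heavily simplified sketch of "build the outline, then fill it". It is not JUCE's `PathStrokeHelpers::createStroke`, which also handles joints, end caps, curves and transforms; it only turns each straight segment of a polyline into a quad offset by half the stroke thickness. `Point`, `Quad` and `outlineSegments` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { float x, y; };
struct Quad  { Point corners[4]; };

// Builds one quad per segment of an open polyline, with no joints or caps.
// Each quad would then be filled by the renderer, the same two-step split
// that Graphics::strokePath uses.
std::vector<Quad> outlineSegments (const std::vector<Point>& pts, float thickness)
{
    std::vector<Quad> quads;
    const float half = thickness * 0.5f;

    for (std::size_t i = 1; i < pts.size(); ++i)
    {
        const Point a = pts[i - 1], b = pts[i];
        const float dx = b.x - a.x, dy = b.y - a.y;
        const float len = std::hypot (dx, dy);
        if (len == 0.0f)
            continue; // skip degenerate (zero-length) segments

        // Unit normal to the segment, scaled to half the stroke thickness.
        const float nx = -dy / len * half, ny = dx / len * half;

        quads.push_back ({ { { a.x + nx, a.y + ny }, { b.x + nx, b.y + ny },
                             { b.x - nx, b.y - ny }, { a.x - nx, a.y - ny } } });
    }
    return quads;
}
```

Even this stripped-down version does a square root and several allocations' worth of work per segment on the CPU before the renderer sees anything to fill.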

Matt

Looks like Direct2D can stroke a path directly, so it should be possible to skip the intermediate path and just have the hardware do it. But that goes beyond fixing the Direct2D renderer; that would require changing the Graphics class itself.
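One way such a change could be structured, purely as a sketch: the low-level context gets an optional hook that a hardware renderer may implement, and the `Graphics`-level stroke keeps today's outline-then-fill path as a fallback. `LowLevelContext`, `tryStrokePathNatively`, and the string-based `Path` are hypothetical placeholders, not JUCE's actual `LowLevelGraphicsContext` API. On the Direct2D side the native call would presumably be `ID2D1RenderTarget::DrawGeometry`.

```cpp
#include <cassert>
#include <string>

// Hypothetical placeholder types; a real version would use juce::Path and
// juce::PathStrokeType.
struct Path { std::string name; };
struct StrokeType { float thickness; };

class LowLevelContext
{
public:
    virtual ~LowLevelContext() = default;

    // Hardware renderers override this and return true when they stroked the
    // path themselves; the default says "not supported".
    virtual bool tryStrokePathNatively (const Path&, const StrokeType&) { return false; }

    virtual void fillPath (const Path& p) = 0;
};

// Same shape as Graphics::strokePath, with a native fast path added.
void strokePath (LowLevelContext& context, const Path& path, const StrokeType& type)
{
    if (context.tryStrokePathNatively (path, type))
        return;

    // Fallback: build the outline on the CPU, then fill it.
    Path outline { path.name + " outline" }; // stands in for createStrokedPath
    context.fillPath (outline);
}
```

The software renderer would be unaffected, since it never overrides the hook.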

Matt

Hi Matt,

I have spent a few hours with testing; it’s always tricky to make sure that stuff like caching doesn’t skew the results.

On Windows I am using a modest Lenovo laptop, as I want to “feel” how my plugins work (or drain the CPU :smiley: ). The adapter is an Intel UHD Graphics 620.
I have a much faster studio PC DAW with a 4K screen, but it is currently not operational. And I am not really interested in performance on high end systems, as weird as that may sound.

Today I saw that the difference between Direct2D and None ranges from 2:1 to 1.2:1, smaller than what I saw yesterday.

I use a pattern like this in release mode:

timer.start ();
g.strokePath (levelPaths[ch], levelStrokeType);
timer.stop ();

It uses a high-resolution timer and reports the average in its destructor, for example (plugin 1):

Average time in msecs for EQ spectrum path call - Direct2D: 0.102332054 (1407 x)
Average time in msecs for EQ spectrum path call - No Direct2D: 0.191591450 (1345 x)

and (plugin 2)

Average time in msecs for EQ spectrum path call - Direct2D: 0.135503316 (935 x)
Average time in msecs for EQ spectrum path call - No Direct2D: 0.142206404 (1140 x)
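A timer of the kind described above (start/stop around the call, average reported in the destructor) could look roughly like this. `AverageTimer` is a hypothetical name, and `std::chrono::steady_clock` stands in for whatever high-resolution timer the original uses.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Accumulates the time between start() and stop() over many calls and
// prints the average when it goes out of scope.
class AverageTimer
{
public:
    explicit AverageTimer (std::string label) : name (std::move (label)) {}

    ~AverageTimer()
    {
        if (count > 0)
            std::printf ("Average time in msecs for %s: %.9f (%d x)\n",
                         name.c_str(), averageMs(), count);
    }

    void start() { startTime = Clock::now(); }

    void stop()
    {
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - startTime;
        totalMs += elapsed.count();
        ++count;
    }

    double averageMs() const { return count > 0 ? totalMs / count : 0.0; }
    int calls() const { return count; }

private:
    using Clock = std::chrono::steady_clock;
    std::string name;
    Clock::time_point startTime {};
    double totalMs = 0.0;
    int count = 0;
};
```

Usage matches the pattern in the post: keep one `AverageTimer` per measured call site and wrap `g.strokePath (...)` in `start()` / `stop()`.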

These paths have 93 points and use rounded curves of 6.0f. I guess we both seem to have similar results (no more need to send you an example like you asked me?).

So Direct2D costs a bit more processing, but as you hinted, Juce graphics could be optimized for Direct2D support… so maybe this difference can easily be ironed out, should the Juce team consider this a worthwhile feature.

Hi Peter-

Thanks for the additional data. I’ll look into implementing hardware path stroking.

Matt