What's more expensive: filling a circular path or using fillEllipse?

If you were going to draw a ton of little 3x3 px filled circles and flash them, would there be any advantage to using fillEllipse() vs. fillPath() (using a 3x3 pre-allocated circle path), or vice versa?

In other words, is it more expensive to repeatedly fill a path vs. filling a similar shaped graphics region? I’m just talking about primitive shapes here like circles and squares.

It actually seems to use slightly less CPU with fillPath() (on macOS at least) - which is curious to me…

Are you filling the area with a regular pattern?

Matt

@matt - it’s just a 3x3 circle filled with a solid color - but many of them flashing rapidly.

As always it’s going to depend and requires profiling. I did some optimisations for fillEllipse back in August (macOS: Improve performance for some graphics draw functions · juce-framework/JUCE@997c927 · GitHub). After that commit, on macOS I would probably expect fillEllipse() to be faster, as it can skip the JUCE path altogether. However, if you are going to fill all 9 with the same colour you could build up a single pre-allocated path with all 9 and do a single fillPath() call which may well be faster than 9 fillEllipse() calls. Different renderers may vary.

In general, I found that the fewer fill calls you make, the better; try to build up a path so that each call does more of the filling, if that makes sense.

Are you working from a version of JUCE that is before or after the commit I linked above?

I’d also consider rendering the path to an Image or multiple Images and then repeatedly painting that Image to your window as needed.

Matt


I guess I should explain in more detail. Imagine a row of 128 3x3 pixel components that can be individually flashed to indicate the progress of an index through a pattern of steps.

So each of them has a paint() function that draws a 3x3 pixel filled circle in one of two solid colours (on or off). It seems that using fillPath() in the paint function on a member variable path is slightly more efficient, but I could be wrong.

Telling one of the components to change from off to on (or flash) does not update any of the others. They are individually addressed.

The thing is, you can advance through this sequentially quite quickly, flashing them like LEDs, and it takes some CPU when you have multiple chains of these tiny LEDs blinking. So I was just trying to optimize where possible. But I don’t think making a tiny image for each “LED” would improve things.

Using 8.0.4.

Thanks, I'd be slightly surprised if calling fillEllipse() repeatedly is worse than calling fillPath() repeatedly. Without looking closer, I think the work required should be very similar.

Assuming only one can be on at a time, and it’s not a complete pain, an interesting experiment would be to try creating a single path with ALL the ellipses and fill that single path with the off colour. Then you only have the overhead of drawing one extra path/ellipse for the on LED. This technique would require the whole thing to be one component, though, whereas based on what you’ve said I suspect you have lots of small components.
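To make that experiment a bit more concrete, here is a minimal sketch of the "one fill for all the off LEDs, one more for the on LED" idea. It is plain C++ with a hypothetical `Canvas` stand-in, not JUCE code: `Canvas`, `fillLeds`, `makeLedOrigins` and `drawStrip` are all made-up names, and the 3x3 "circle" is approximated by skipping the four corner pixels. In real JUCE code the batched fill would roughly correspond to a single fillPath() on a path holding all the ellipses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a render target. This is NOT a JUCE class.
struct Canvas
{
    int width;
    int height;
    std::vector<std::uint32_t> pixels;
    int fillCalls = 0; // number of fill operations issued so far

    Canvas (int w, int h)
        : width (w), height (h),
          pixels (static_cast<std::size_t> (w) * static_cast<std::size_t> (h), 0u)
    {
        assert (w > 0 && h >= 3);
    }

    std::size_t index (int x, int y) const
    {
        return static_cast<std::size_t> (y) * static_cast<std::size_t> (width)
             + static_cast<std::size_t> (x);
    }

    std::uint32_t at (int x, int y) const { return pixels[index (x, y)]; }

    // One "fill call" that paints every 3x3 LED whose left edge is listed.
    void fillLeds (const std::vector<int>& xOrigins, std::uint32_t colour)
    {
        ++fillCalls;

        for (int x0 : xOrigins)
        {
            assert (x0 >= 0 && x0 + 3 <= width);

            for (int y = 0; y < 3; ++y)
                for (int x = 0; x < 3; ++x)
                    if (x == 1 || y == 1) // leave the four corners untouched
                        pixels[index (x0 + x, y)] = colour;
        }
    }
};

// Left edges of `count` LEDs laid out in a row, `spacing` pixels apart.
std::vector<int> makeLedOrigins (int count, int spacing)
{
    std::vector<int> origins;
    origins.reserve (static_cast<std::size_t> (count));

    for (int i = 0; i < count; ++i)
        origins.push_back (i * spacing);

    return origins;
}

// Repaints the whole strip with two fill calls, however many LEDs there are.
void drawStrip (Canvas& canvas, const std::vector<int>& allLeds, int activeIndex,
                std::uint32_t offColour, std::uint32_t onColour)
{
    canvas.fillLeds (allLeds, offColour); // every LED in its off state

    if (activeIndex >= 0 && activeIndex < static_cast<int> (allLeds.size()))
        canvas.fillLeds ({ allLeds[static_cast<std::size_t> (activeIndex)] }, onColour);
}
```

With 128 LEDs spaced 4 px apart, `drawStrip` issues two fills for a full repaint instead of 128. Whether that actually wins in a real renderer still needs profiling.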

Another technique you could consider is to create a parent component to hold an array of LED sub components. The parent component can have two member variables, a pre-rendered image for both the on and off states of an individual LED.

Each subcomponent LED should take a reference to the parent component in its constructor. In the paint call it can then access the relevant image and paint it directly. Also consider calling setPaintingIsUnclipped(true) on the subcomponent.

Naturally, for resizable LEDs you’d need to handle re-rendering of the LED images in the parent’s resized() callback.
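In case the structure isn't clear, here is a rough sketch of the shared pre-rendered image idea. It is plain C++ rather than JUCE: `Sprite`, `renderLed`, `LedStrip` and `Led` are hypothetical names, with `Sprite` standing in for a juce::Image, `LedStrip` for the parent component and `Led` for a subcomponent. The LED shape is rasterised once per state, and painting is then just a 3x3 pixel copy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Pixel  = std::uint32_t;
using Sprite = std::array<Pixel, 9>; // 3x3, row-major (hypothetical image stand-in)

// Rasterise the LED shape once. The corners stay 0 to approximate a tiny circle.
Sprite renderLed (Pixel colour)
{
    Sprite sprite {};

    for (int y = 0; y < 3; ++y)
        for (int x = 0; x < 3; ++x)
            if (x == 1 || y == 1)
                sprite[static_cast<std::size_t> (y * 3 + x)] = colour;

    return sprite;
}

// Plays the role of the parent component: owns one image per LED state.
struct LedStrip
{
    Sprite onSprite;
    Sprite offSprite;

    LedStrip (Pixel onColour, Pixel offColour)
        : onSprite (renderLed (onColour)), offSprite (renderLed (offColour)) {}
};

// Plays the role of a subcomponent: keeps a reference to the parent's images.
struct Led
{
    const LedStrip& parent;
    int x0;          // left edge of this LED in the target buffer
    bool on = false;

    Led (const LedStrip& p, int leftEdge) : parent (p), x0 (leftEdge) {}

    // "Painting" is a straight copy of the shared sprite, with no shape work.
    void paint (std::vector<Pixel>& target, int targetWidth) const
    {
        assert (x0 >= 0 && x0 + 3 <= targetWidth);
        assert (target.size() >= static_cast<std::size_t> (targetWidth) * 3u);

        const Sprite& source = on ? parent.onSprite : parent.offSprite;

        for (int y = 0; y < 3; ++y)
            for (int x = 0; x < 3; ++x)
                target[static_cast<std::size_t> (y * targetWidth + x0 + x)]
                    = source[static_cast<std::size_t> (y * 3 + x)];
    }
};
```

Each `Led` only flips its `on` flag and copies the matching sprite, so the cost per repaint does not depend on how complicated the LED shape is.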

Thanks - would this (pseudo-code) actually be any faster:

void paint (Graphics& g)
{
    if (on)
        g.drawImage (parent.imageOn, bounds);
    else
        g.drawImage (parent.imageOff, bounds);
}

…versus this:

void paint (Graphics& g)
{
    if (on)
        g.setColour (onColour);
    else
        g.setColour (offColour);

    g.fillEllipse (bounds);
}

…when you’re talking about a 3 x 3 pixel area? I suppose I could try it… it’s a bit of work. :wink:

Yep, already doing that, thanks.

How many distinct colors do you have?

@matt - 2 colors; just on and off.

On Windows on my machine (Direct2D mode):

drawImageAt: 2% CPU, 3% GPU
fillEllipse: 3.3% CPU, 5.5% GPU
fillPath: 4% CPU, 7% GPU

I’ll try the Mac as well.

Matt

@matt - thanks so much for the Windows numbers. I’m surprised that drawing the image is cheaper - but it’s great to know. I will definitely have to try that. Interested to know what the measurements you get on the Mac are…

I wonder if the image is only cheaper because it’s so small. Maybe the ellipse would scale better with size?

I believe the difference is due to tessellation (aka the setup cost).

For fillPath, the Direct2D renderer needs to do the following:

  • Iterate through the Path and convert it to a D2D Geometry
  • Pass the Geometry to D2D
  • D2D will tessellate the Geometry (convert it to a triangle mesh). Since D2D doesn’t know anything about the shape of the geometry, it has to perform a general-purpose tessellation, which is slower.
  • D2D then rasterizes the tessellated mesh.

For fillEllipse, D2D still needs to create a triangle mesh, but it knows that the shape is an ellipse, which is quick to tessellate. It still needs to allocate memory internally, etc., but it can optimize for a known shape.

drawImageAt is the simplest case; it’s just painting to the two triangles that make a rectangle. If the Image data is already stored in the GPU, then the renderer is just issuing a buffer copy command. Essentially zero CPU time.

As far as I can tell, CoreGraphics on the Mac works similarly.

Matt

A bit off topic. Do you know whether the cost of the buffer copy command scales with the size of the image? For example, if it is a big ellipse, I would assume that:

  • g.fillEllipse takes some CPU time, but is faster on GPU
  • g.drawImageAt takes zero CPU time (if the image is on GPU), but is slower on GPU as it has to copy a large amount of pixels

Am I correct?

BTW, there was an API to cache a juce::Path object on the GPU on the Direct2D branch (but it then got removed, IIRC). Is there a way to cache a path now? If a path is cached, does it also become faster in this case (i.e., the same path gets painted repeatedly)?

Tessellating a larger ellipse requires more triangles, so that’s more CPU time. Then the fragment shader still needs to fill in the triangle, so more pixels == more GPU time.

Drawing a larger image will take the same amount of CPU time as a smaller image and will take more GPU resources.
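As a rough illustration of why a larger ellipse means more tessellation work, here is a small sketch that estimates how many straight segments are needed to approximate a circle within a given pixel tolerance, using the chord/sagitta bound r(1 - cos(π/n)) ≤ tolerance. This is only a geometric estimate under my own assumptions: the function names and the tolerance value are hypothetical, and it says nothing about how Direct2D or CoreGraphics actually tessellate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Number of straight segments needed so that an inscribed polygon stays within
// `tolerance` pixels of a circle of the given radius.
// From r * (1 - cos(pi / n)) <= tolerance  =>  n >= pi / acos(1 - tolerance / r).
// Assumes the tolerance is not vanishingly small compared with the radius.
int segmentsForCircle (double radius, double tolerance)
{
    assert (radius > 0.0 && tolerance > 0.0);

    const double pi = std::acos (-1.0);

    // Clamp so that a tolerance larger than the diameter still gives a valid angle.
    const double cosHalfAngle = std::max (-1.0, 1.0 - tolerance / radius);
    const double halfAngle    = std::acos (cosHalfAngle);

    // A polygon needs at least 3 sides.
    return std::max (3, static_cast<int> (std::ceil (pi / halfAngle)));
}

// A convex polygon with n vertices can be split into n - 2 triangles.
int trianglesForCircle (double radius, double tolerance)
{
    return segmentsForCircle (radius, tolerance) - 2;
}
```

Comparing `trianglesForCircle (1.5, 0.25)` (a 3 px LED) with `trianglesForCircle (100.0, 0.25)` shows the triangle count growing with the radius, while a textured quad for an image is two triangles regardless of size.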

In practice I don’t think it matters; you’d have to try fairly hard to stress the GPU. Think about the sheer number of polygons and render passes involved in a single frame of a game.

We couldn’t find a good way to fit that into the existing API without causing problems. I’d still like to get that back in the mix.

Matt

This is an excellent suggestion; rendering lots of sibling components can be slow since the JUCE component layer goes through and clips off each sibling before painting. That can really bog things down.

Matt

Thanks for your valuable insight. I will revisit some of my code very soon.