Performance improvement for drawing complex paths


#1

Drawing complex paths in JUCE is slow, and the profiler showed that over 90% of the time is spend inside EdgeTable::remapTableForNumEdges.

How to fix:

void EdgeTable::addEdgePoint (const int x, const int y, const int winding)
{
    jassert (y >= 0 && y < bounds.getHeight());

    int* line = table + lineStrideElements * y;
    const int numPoints = line[0];

    if (numPoints >= maxEdgesPerLine)
    {
        // O(n²) bytes allocated and copied in total!
        remapTableForNumEdges (maxEdgesPerLine + juce_edgeTableDefaultEdgesPerLine);
        jassert (numPoints < maxEdgesPerLine);
        line = table + lineStrideElements * y;
    }

    line[0]++;
    int n = numPoints << 1;
    line [n + 1] = x;
    line [n + 2] = winding;
}

Always use exponentional increments when dynamically growing a container:

    // O(n) bytes allocated and copied in total.
    remapTableForNumEdges (maxEdgesPerLine * 3 / 2);

One particular paint with a huge amount of lines got about 10 times faster after that change.


#2

Good find! Here’s hoping this gets pushed to the JUCE repo.


#3

The same thing happens in addEdgePointPair(…) as well. If I understand correctly the problem occurs if a path has lots of intersections with horizontal lines. It gets slow quickly because remapTableForNumEdges always copies the whole table.

I think I’ve come across that a lot when trying to draw antialiased waveforms. I noticed that things get very slow when a wide spiky waveform path is drawn and thus split that into many stripes.


#4

I did run some tests as this seems to affect my drawing performance a lot. Of course this is mainly a problem on Windows as the CoreGraphics renderer doesn’t go through that code. The complex paths I draw are sample overviews and frequency plots and both are affected and seem to be a common thing to draw with Juce. When I draw sample waveforms, the edge count per line goes over 300 sometimes. With the current implementation that means reallocating and copying the table about 10 times.

The default value is 32 edges per line

const int juce_edgeTableDefaultEdgesPerLine = 32;

and then the count gets increased by 32 everytime the max is surpassed.

In my humble opinion paths to draw fall into two categories. Simple ones where 32 is enough and complex ones that can reach quite high numbers, so at least for me the best approach would be to increase the edge count a lot in case the limit is reached. So I changed my local copy to quadruple the count once the limit is breached.

roeland, why did you use *3/2 ? I would at least use *2.


#5

That was kind of arbitrary. We also ended up using *2.


#6

This could be improved further by using the path height and the number of path segments to guess an appropriate initial number of edges per line.


#7

FYI I’ve pushed a change now that should improve things. I went for the 3/2 growth, as space is probably a bigger factor for most people than speed


#8

Jules, thanks a lot for the fix! Too bad you didn’t go for x2 like most standard containers. It’s the most sensible choice and would even make your code cleaner as I saw you prevented the first step from becoming just 16 by an extra jmax.

Using a bit more space is certainly better than copying things around in memory many times and it’s not that much space that is used anyway. Since you wrote the original code, memory and cache sizes have grown a lot.

In my experiments I saw sizes up to ~500 edges per row (drawing sample waveforms using a path). With x1.5 this leads to 7 reallocation and copy steps while x2 would only require 4. (the original code would use 15). So for complicated paths, the factor makes a huge difference as they also require reallocating and copying larger chunks of memory while for easy paths it makes most likely no difference at all.


#9

OK, I could be persuaded to change it, I was only really going on a gut feeling about what was the best size.


#10

I got it up to 3000 once when drawing a particularly complex graphic on a large resolution monitor. That’s more than the usual case, in our case the complexity depends on user input, so it can happen.

Another thing I noticed: it’s faster to add a whole bunch of lines to one Path and draw it, than to draw a whole bunch of lines one by one as separate paths. Is that expected or are we maybe doing something silly?

Out of interest, since there is this number of edges per row, would it be faster to draw wave forms sideways? There’s potentially a lot of intersections with any given horizontal line, but always exactly one with any given vertical line.


#11

Have you tried my latest changes from yesterday? I made it try to predict a good starting size for the array.


#12

Jules, thanks a lot for another update and sorry about the nagging.

roeland, while these changes help, drawing paths with tons of row intersections will still be slowish. Drawing 90 degrees rotated would certainly be faster, but rotating the resulting bitmap back might kill all the gains.

What I did before the latest changes was slice my path into vertical strips, each of which was 32 pixels wide. That speeded up drawing a lot, but lead to blending artefacts with display scaling sometimes.

If you have 3000 intersections in a row, you might also want to think about thinning the data before the drawing. In my case (drawing sample overviews) I create multiple versions of the data at different resolutions and use one that’s only a bit above the pixel resolution when drawing. I think speed-wise it’s also important to avoid creating paths that are larger (wider) than the screen area and then clip them. It’s much faster to avoid clipping by only creating what’s going to be visible.