Optimizing Juce LowLevelSoftwareRenderer

TheVinn · January 19, 2011, 2:17pm

Jules, I’ve reached the point where the drawing speed is slow. I have most of my controls implemented. Resizing the window is painfully slow and I get buffer underruns in the audio thread when my window is maximized.

My question: if I hire a company that specializes in optimization and send Juce over to them to speed up LowLevelSoftwareRenderer (including SSE and any other tricks I can squeeze out of each platform), are you open to picking up the changes? Of course I will pay for it.

otristan · January 19, 2011, 2:50pm

If I remember correctly Jules is working on a Direct2D version of Juce

TheVinn · January 19, 2011, 2:59pm

My intuition tells me that is not the way to go. Don’t get me wrong, it’s good to have as another choice. But I think that the LowLevelSoftwareRenderer can be improved dramatically.

First of all, in my old home-brew framework (which I modeled closely after Juce) I was rendering the same controls several times faster than what I’m getting from Juce. So I know for sure that the Juce renderer can be improved.

Second, I’m interested in repainting components using multiple threads in parallel. Specifically, for rendering a given rectangle, divide the rectangle into N horizontal bands and draw them in parallel using a thread pool.

Third, I want to fully exploit processor extensions such as SIMD / SSE.

It will always be possible to get better optimizations using problem-specific information (i.e. type of things being rendered) rather than a general approach - Direct2D is a general approach.

TheVinn · January 19, 2011, 3:01pm

The Visual Studio 2008 profiler is completely broken for me under both Windows 7 32-bit and Windows 7 64-bit, so I am in the process of trying to get Intel C++ Composer / VTune / Parallel Studio up and running so I can have some concrete results.

I will publish any optimized subclasses of Graphics / etc… under the MIT license so everyone can benefit.

I just would like a commitment from Jules that if I spend the money, he will adjust the API for the Graphics related classes for me so I can drop in optimized replacements without patching Juce.

jules · January 19, 2011, 5:17pm

I couldn’t disagree more. The future is not going to involve much software rendering. It’s all going to be done with GPUs, and the only way to take advantage of that is with OS-specific rendering engines like CoreGraphics, Direct2D, OpenGL, etc.

A software renderer is great as a fallback, but the one I’ve got at the moment has exactly the attributes that I want: it’s portable, elegant, maintainable, and fast enough. I’ve no interest whatsoever in bloating it out with reams of unintelligible assembly or intrinsics just to gain a few percentage points in speed.

I’m not giving any commitments! But the rendering platform is already completely virtualised - new engines can be plugged in without affecting any existing code, so there should be nothing to stop you writing a new engine in parallel to what’s already there, and implementing it in whatever way you want.

TheVinn · January 19, 2011, 5:24pm

Well ComponentPeer doesn’t have a way to override which low level renderer it uses… and there isn’t enough of the implementation exposed in order to subclass it without duplicating everything.

Preliminary results from the VTune profiler are showing that vertical gradients are consuming most of the runtime.
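
For readers following along, here is a minimal sketch of why vertical gradients are a natural target for a special case: the color depends only on the row, so it can be computed once per row and the row filled with a single value. This is not Juce code. `PixelBuffer`, `lerpArgb`, and `fillVerticalGradient` are hypothetical names, and the sketch assumes an opaque fill with no clipping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row-major 32-bit ARGB buffer; not a Juce type.
struct PixelBuffer
{
    int width, height;
    std::vector<std::uint32_t> pixels;

    PixelBuffer (int w, int h)
        : width (w), height (h), pixels (static_cast<std::size_t> (w) * h, 0) {}
};

// Interpolates one 8-bit channel; t runs from 0 (all a) to 255 (all b).
static std::uint32_t lerpChannel (std::uint32_t a, std::uint32_t b, std::uint32_t t)
{
    return (a * (255 - t) + b * t + 127) / 255;
}

// Interpolates all four ARGB channels independently.
static std::uint32_t lerpArgb (std::uint32_t top, std::uint32_t bottom, std::uint32_t t)
{
    std::uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8)
        result |= lerpChannel ((top >> shift) & 0xff, (bottom >> shift) & 0xff, t) << shift;
    return result;
}

// Opaque vertical gradient: one color computation per row, then a plain fill.
void fillVerticalGradient (PixelBuffer& buffer, std::uint32_t top, std::uint32_t bottom)
{
    assert (buffer.width > 0 && buffer.height > 0);

    for (int y = 0; y < buffer.height; ++y)
    {
        const std::uint32_t t = buffer.height > 1
            ? static_cast<std::uint32_t> (y * 255 / (buffer.height - 1))
            : 0;
        const std::uint32_t color = lerpArgb (top, bottom, t);

        auto rowStart = buffer.pixels.begin() + static_cast<std::ptrdiff_t> (y) * buffer.width;
        std::fill (rowStart, rowStart + buffer.width, color);
    }
}
```

A general gradient fill evaluates the color per pixel; whether the saving here is significant in a real paint routine would have to be confirmed by profiling.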

TheVinn · January 19, 2011, 5:34pm

Example:

    void Win32ComponentPeer::handlePaintMessage()
    {
        //....
                LowLevelGraphicsSoftwareRenderer context (offscreenImage, -x, -y, contextClip);
                handlePaint (context);
        //....
    }

I would like context to come from a virtual function call (i.e. createContextForPaint() or something) that I could replace. Although it’s not obvious how to do that since subclassing the Win32ComponentPeer is not an option.

Perhaps something like

    LowLevelGraphicsSoftwareRenderer* LookAndFeel::createRendererForComponentPeer (ComponentPeer* peer);

In order for this to be useful, LowLevelGraphicsSoftwareRenderer implementation would need to be exposed (thinking of the stuff in namespace SoftwareRendererClasses and LowLevelGraphicsSoftwareRenderer::SavedState where most of the work is done), so a subclass can customize just a little bit of it instead of having to replace the entire implementation.
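
To make the request concrete, here is a rough sketch of the kind of hook being described: the paint handler asks a virtual factory for its context, and a subclass overrides only the operation it cares about. Every type below (`RenderContext`, `FastGradientContext`, `RendererFactory`, `Peer`) is a stand-in invented for illustration. None of them is the real Juce API, and strings stand in for actual drawing.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for a software rendering context; NOT the real Juce class.
struct RenderContext
{
    virtual ~RenderContext() = default;
    virtual std::string fillVerticalGradient() { return "default gradient"; }
    virtual std::string fillRect()             { return "default rect"; }
};

// Replaces one operation and inherits everything else.
struct FastGradientContext : RenderContext
{
    std::string fillVerticalGradient() override { return "optimized gradient"; }
};

// Hypothetical factory, playing the role of the proposed
// createRendererForComponentPeer() hook.
struct RendererFactory
{
    virtual ~RendererFactory() = default;

    virtual std::unique_ptr<RenderContext> createContextForPaint()
    {
        return std::make_unique<RenderContext>();
    }
};

struct FastRendererFactory : RendererFactory
{
    std::unique_ptr<RenderContext> createContextForPaint() override
    {
        return std::make_unique<FastGradientContext>();
    }
};

// Stand-in for the peer: it no longer constructs a concrete renderer itself.
struct Peer
{
    explicit Peer (RendererFactory& f) : factory (f) {}

    std::string handlePaintMessage()
    {
        auto context = factory.createContextForPaint();
        assert (context != nullptr);
        return context->fillVerticalGradient() + ", " + context->fillRect();
    }

    RendererFactory& factory;
};
```

With this shape, client code can swap in `FastRendererFactory` without touching the peer, and anything it does not override falls back to the default behavior.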

In my case I specifically want to address vertical gradients, and just those (I think). It would be nice if I could do this without changing Juce and yet handle all the clipping cases (no clip, RectangleList clip, EdgeTable clip, Image Alpha clip), while being able to fall back on Juce implementation for the cases I don’t care about.

TheVinn · January 19, 2011, 5:37pm

The other thing is to divide the area requiring update into N horizontal rectangles and paint them in parallel using an individual LowLevelGraphicsContext for each one. Obviously there are some locking issues with that (cached glyphs come to mind). I wish there was enough virtual function / access qualifiers / customization in Juce to let me do this, entirely in my client code of course since I know you don’t want that in the library.
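
A minimal sketch of the banding idea, under some loud assumptions: it uses plain `std::thread` workers rather than a thread pool, the `Band`, `splitIntoBands`, and `paintInParallel` names are hypothetical, and the paint callback must only write to rows inside its own band. Shared state such as a glyph cache would still need its own locking, which this sketch does not address.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// A horizontal strip of rows, half-open: [top, bottom).
struct Band { int top, bottom; };

// Splits `height` rows into at most `numBands` non-empty, non-overlapping bands.
std::vector<Band> splitIntoBands (int height, int numBands)
{
    assert (height >= 0 && numBands > 0);

    std::vector<Band> bands;
    for (int i = 0; i < numBands; ++i)
    {
        const int top    = static_cast<int> (static_cast<long long> (height) * i / numBands);
        const int bottom = static_cast<int> (static_cast<long long> (height) * (i + 1) / numBands);

        if (bottom > top)
            bands.push_back ({ top, bottom });
    }
    return bands;
}

// Runs paintBand once per band, each on its own thread, and waits for all of them.
void paintInParallel (int height, int numBands, const std::function<void (Band)>& paintBand)
{
    std::vector<std::thread> workers;

    for (Band band : splitIntoBands (height, numBands))
        workers.emplace_back (paintBand, band);

    for (auto& worker : workers)
        worker.join();
}
```

Creating threads on every repaint has its own cost, so a long-lived pool, as proposed above, would likely be the better fit in practice. The band-splitting logic would stay the same.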

TheVinn · January 19, 2011, 5:39pm

Yeah I agree fully, that’s why the ideal solution is one where I can subclass / override Juce behavior with my own external files but still leverage most of the existing LowLevelGraphicsSoftwareRenderer for the parts that I don’t need to optimize. Right now you have to replace the WHOLE thing and there’s no hook for doing that in the ComponentPeer.

TheVinn · January 19, 2011, 10:33pm

Very noticeable performance increase in LowLevelSoftwareRenderer from just recompiling with the Intel C++ Compiler XE, for redrawing my entire window during a resize operation.

TheVinn · January 21, 2011, 6:28pm

Still having gradient fills take up 35% of the runtime:

[Attachment: vtune.png]

I optimized everything I could: I cut down on drawing and made use of setOpaque and setPaintingIsUnclipped. This is the best I could do. For comparison, note that mp3 decoding (of 4 simultaneous streams) only took up 10%, less than a third of drawing. And the Juce Resampler didn’t even make it into the profile, that’s how fast it is.

robiwan · January 22, 2011, 11:15am

You could always use Anti-Grain Geometry (AGG) to see if it is fast enough for your needs. I once did this by adding a function to Graphics (to get the destination image), then attaching AGG to that bitmap and drawing with AGG.

Edit: This would be appropriate http://code.google.com/p/graphin/

TheVinn · January 22, 2011, 12:53pm

I doubt AGG is going to be that much faster, since the structure of its rendering code is similar to Juce’s, but I will check it out.

I want to optimize specifically for my cases of drawing, which are vertical blends and alpha blending with the color black. I have routines that are hard-coded to blend only black into the destination, as I make heavy use of transparent black frames, drop shadows, and whatnot.
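
As an illustration of why a black-only blend can be cheaper than the general case: with a black source, the source term of the usual source-over formula vanishes, so each destination channel is simply scaled by (255 - alpha) / 255. The sketch below is not the routine described above. `blendBlack` and `div255` are hypothetical names, and it assumes a non-premultiplied 32-bit ARGB destination whose alpha channel is left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounded division by 255 without a divide, valid for x in [0, 255 * 255].
static inline std::uint32_t div255 (std::uint32_t x)
{
    x += 128;
    return (x + (x >> 8)) >> 8;
}

// Blends black with the given alpha over every pixel in place.
// Source-over with a black source reduces to: dest = dest * (255 - alpha) / 255.
void blendBlack (std::uint32_t* pixels, std::size_t count, std::uint8_t alpha)
{
    assert (pixels != nullptr || count == 0);

    const std::uint32_t inv = 255u - alpha;

    for (std::size_t i = 0; i < count; ++i)
    {
        const std::uint32_t p = pixels[i];
        const std::uint32_t r = div255 (((p >> 16) & 0xff) * inv);
        const std::uint32_t g = div255 (((p >> 8) & 0xff) * inv);
        const std::uint32_t b = div255 ((p & 0xff) * inv);

        // Keep the destination alpha, replace the darkened color channels.
        pixels[i] = (p & 0xff000000u) | (r << 16) | (g << 8) | b;
    }
}
```

Compared with a general blend, there is no source pixel to read and no per-channel source multiply, which also makes the loop a straightforward candidate for SIMD.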

zamrate · April 29, 2011, 10:08am

I’d be highly interested in hearing your results with agg, let us know them!

Christof_Schardt · April 29, 2011, 12:37pm

Vinn, are you aware of this?
http://code.google.com/p/fog/
