Juce 2d graphics vs nanovg (performance)


#1

I’ve been using nanovg to render my app’s main UI, with juce just setting up an OpenGL context. Nanovg’s performance is pretty good. Here’s what it does:

  • data for all draw calls is added to a single OpenGL VBO each frame (nanovg is immediate mode like juce::Graphics)
  • a single shader is bound which handles all drawing (including analytic AA, so you don’t need MSAA), so there are no shader program changes during rendering
  • draw calls are made, coalescing when possible
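To make the batching idea concrete, here is a minimal C++ sketch of "one vertex buffer per frame, coalesce when possible". It is not nanovg's actual code: `Vertex`, `DrawCall`, `FrameBatch` and `paintId` are hypothetical names, and `paintId` simply stands in for whatever uniform/texture state a real call would carry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; these are not nanovg's or JUCE's.
struct Vertex { float x, y, u, v; };

struct DrawCall {
    int paintId;        // stand-in for the uniform/texture state of the call
    std::size_t first;  // index of the first vertex in the shared buffer
    std::size_t count;  // number of vertices to draw
};

class FrameBatch {
public:
    // Append a triangle list (3 vertices per triangle) to the per-frame buffer.
    void addTriangles(int paintId, const std::vector<Vertex>& tris) {
        assert(tris.size() % 3 == 0);
        if (tris.empty()) return;

        const std::size_t first = vertices.size();
        vertices.insert(vertices.end(), tris.begin(), tris.end());

        // Coalesce: the previous call always ends where this one starts, so if
        // the state matches we can simply extend it instead of adding a call.
        if (!calls.empty() && calls.back().paintId == paintId) {
            calls.back().count += tris.size();
            return;
        }
        calls.push_back({paintId, first, tris.size()});
    }

    // Reset at the start of each frame (immediate mode: nothing is retained).
    void clear() { vertices.clear(); calls.clear(); }

    std::vector<Vertex> vertices;  // would be uploaded once per frame to one VBO
    std::vector<DrawCall> calls;   // each entry would become one glDrawArrays call
};
```

In a real renderer `vertices` would be uploaded with a single buffer update and each `DrawCall` issued as one `glDrawArrays`; here the two vectors just record what would be sent.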

I’m wondering whether JUCE’s 2D rendering could compete with this approach from a performance standpoint. I’m considering moving more code over to JUCE. Another option would be to write a LowLevelGraphicsContext that renders using nanovg.


#2

Interesting idea about having a single shader - does it just have a huge ‘if’ statement in it that decides what type of fill to use?

TBH having profiled the juce GL renderer, changing shaders never even appeared on the list of hotspots, so is unlikely to be worth changing. The place where it burns almost all of its time is in feeding the GPU with rectangles, so if I was to optimise it, what I’d concentrate on would be implementing something like a Loop-Blinn algorithm.


#3

Sorry to hijack this thread, but it has led me to a number of questions.
If I understand correctly, NanoVG uses the GPU / OpenGL to rasterize paths, while JUCE uses CPU-based rasterization of paths, which are then fed to the GPU as bitmaps. Is this correct?
If I’m right, then the JUCE OpenGL renderer is not a full renderer, but more or less just a more direct way to bring software-rasterized bitmaps to the screen. Please correct me (harshly) if I’m wrong :slight_smile:

Loop-Blinn is an algorithm to draw paths using the GPU, right?

I’d love to have GPU based path rendering in Juce.


#4

No, that’s not quite right. It does use the CPU to flatten the path to an edge-table, but it then passes the edge-table to the GPU for compositing and shading. There are no bitmaps involved in path rendering.
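For anyone unsure what "flatten the path to an edge-table" means, here is a rough C++ sketch of the general idea. It is not JUCE's `EdgeTable` class: the function names, the fixed step count and the "sample each scanline at its centre" rule are all simplifying assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// Flatten a quadratic Bezier (p0, control c, p1) into `steps` straight
// segments, appending the generated points (excluding p0) to `out`.
void flattenQuadratic(Point p0, Point c, Point p1, int steps, std::vector<Point>& out) {
    assert(steps > 0);
    for (int i = 1; i <= steps; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(steps);
        const float a = (1 - t) * (1 - t), b = 2 * (1 - t) * t, d = t * t;
        out.push_back({a * p0.x + b * c.x + d * p1.x,
                       a * p0.y + b * c.y + d * p1.y});
    }
}

// For each scanline y in [0, height), sampled at y + 0.5, collect the sorted
// x positions where the closed polygon's edges cross that scanline.
std::vector<std::vector<float>> buildEdgeTable(const std::vector<Point>& poly, int height) {
    std::vector<std::vector<float>> table(static_cast<std::size_t>(height));
    for (std::size_t i = 0; i < poly.size(); ++i) {
        Point a = poly[i], b = poly[(i + 1) % poly.size()];
        if (a.y == b.y) continue;          // horizontal edges never cross a scanline
        if (a.y > b.y) std::swap(a, b);    // make the edge run top to bottom
        for (int y = 0; y < height; ++y) {
            const float sy = static_cast<float>(y) + 0.5f;
            if (sy >= a.y && sy < b.y) {
                const float t = (sy - a.y) / (b.y - a.y);
                table[static_cast<std::size_t>(y)].push_back(a.x + t * (b.x - a.x));
            }
        }
    }
    for (auto& row : table) std::sort(row.begin(), row.end());
    return table;
}
```

Each row then holds pairs of crossings that bound the filled spans on that scanline, which is the kind of compact per-line data that can be handed to the GPU instead of a rasterized bitmap.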


#5

It does have an if statement, though not huge. Here’s the entire fragment shader:

#ifdef GL_ES
#if defined(GL_FRAGMENT_PRECISION_HIGH) || defined(NANOVG_GL3)
 precision highp float;
#else
 precision mediump float;
#endif
#endif
#ifdef NANOVG_GL3
#ifdef USE_UNIFORMBUFFER
	layout(std140) uniform frag {
		mat3 scissorMat;
		mat3 paintMat;
		vec4 innerCol;
		vec4 outerCol;
		vec2 scissorExt;
		vec2 scissorScale;
		vec2 extent;
		float radius;
		float feather;
		float strokeMult;
		float strokeThr;
		int texType;
		int type;
	};
#else
	uniform vec4 frag[UNIFORMARRAY_SIZE];
#endif
	uniform sampler2D tex;
	in vec2 ftcoord;
	in vec2 fpos;
	out vec4 outColor;
#else
	uniform vec4 frag[UNIFORMARRAY_SIZE];
	uniform sampler2D tex;
	varying vec2 ftcoord;
	varying vec2 fpos;
#endif
#ifndef USE_UNIFORMBUFFER
	#define scissorMat mat3(frag[0].xyz, frag[1].xyz, frag[2].xyz)
	#define paintMat mat3(frag[3].xyz, frag[4].xyz, frag[5].xyz)
	#define innerCol frag[6]
	#define outerCol frag[7]
	#define scissorExt frag[8].xy
	#define scissorScale frag[8].zw
	#define extent frag[9].xy
	#define radius frag[9].z
	#define feather frag[9].w
	#define strokeMult frag[10].x
	#define strokeThr frag[10].y
	#define texType int(frag[10].z)
	#define type int(frag[10].w)
#endif

float sdroundrect(vec2 pt, vec2 ext, float rad) {
	vec2 ext2 = ext - vec2(rad,rad);
	vec2 d = abs(pt) - ext2;
	return min(max(d.x,d.y),0.0) + length(max(d,0.0)) - rad;
}

// Scissoring
float scissorMask(vec2 p) {
	vec2 sc = (abs((scissorMat * vec3(p,1.0)).xy) - scissorExt);
	sc = vec2(0.5,0.5) - sc * scissorScale;
	return clamp(sc.x,0.0,1.0) * clamp(sc.y,0.0,1.0);
}
#ifdef EDGE_AA
// Stroke - from [0..1] to clipped pyramid, where the slope is 1px.
float strokeMask() {
	return min(1.0, (1.0-abs(ftcoord.x*2.0-1.0))*strokeMult) * min(1.0, ftcoord.y);
}
#endif

void main(void) {
	vec4 result;
	float scissor = scissorMask(fpos);
#ifdef EDGE_AA
	float strokeAlpha = strokeMask();
#else
	float strokeAlpha = 1.0;
#endif
	if (type == 0) {			// Gradient
		// Calculate gradient color using box gradient
		vec2 pt = (paintMat * vec3(fpos,1.0)).xy;
		float d = clamp((sdroundrect(pt, extent, radius) + feather*0.5) / feather, 0.0, 1.0);
		vec4 color = mix(innerCol,outerCol,d);
		// Combine alpha
		color *= strokeAlpha * scissor;
		result = color;
	} else if (type == 1) {		// Image
		// Calculate color from texture
		vec2 pt = (paintMat * vec3(fpos,1.0)).xy / extent;
#ifdef NANOVG_GL3
		vec4 color = texture(tex, pt);
#else
		vec4 color = texture2D(tex, pt);
#endif
		if (texType == 1) color = vec4(color.xyz*color.w,color.w);
		if (texType == 2) color = vec4(color.x);
		// Apply color tint and alpha.
		color *= innerCol;
		// Combine alpha
		color *= strokeAlpha * scissor;
		result = color;
	} else if (type == 2) {		// Stencil fill
		result = vec4(1,1,1,1);
	} else if (type == 3) {		// Textured tris
#ifdef NANOVG_GL3
		vec4 color = texture(tex, ftcoord);
#else
		vec4 color = texture2D(tex, ftcoord);
#endif
		if (texType == 1) color = vec4(color.xyz*color.w,color.w);
		if (texType == 2) color = vec4(color.x);
		color *= scissor;
		result = color * innerCol;
	}
#ifdef NANOVG_GL3
	outColor = result;
#else
	gl_FragColor = result;
#endif
}
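
To see what the gradient branch (`type == 0`) computes, here is a small CPU port of `sdroundrect` and the mix factor in C++. The maths is copied from the shader above; the `Vec2` struct and the function names `sdRoundRect` and `gradientMix` are made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// CPU port of the shader's sdroundrect(): signed distance from pt to a rounded
// rectangle centred on the origin with half-size ext and corner radius rad.
// Negative inside, zero on the edge, positive outside.
float sdRoundRect(Vec2 pt, Vec2 ext, float rad) {
    const Vec2 ext2 = {ext.x - rad, ext.y - rad};
    const Vec2 d = {std::fabs(pt.x) - ext2.x, std::fabs(pt.y) - ext2.y};
    const float outsideX = std::max(d.x, 0.0f);
    const float outsideY = std::max(d.y, 0.0f);
    return std::min(std::max(d.x, d.y), 0.0f) + std::hypot(outsideX, outsideY) - rad;
}

// Mix factor used by the `type == 0` branch: 0 selects innerCol, 1 selects
// outerCol, and the transition is `feather` units wide, centred on the edge.
float gradientMix(Vec2 pt, Vec2 extent, float radius, float feather) {
    assert(feather > 0.0f);
    const float d = (sdRoundRect(pt, extent, radius) + feather * 0.5f) / feather;
    return std::min(std::max(d, 0.0f), 1.0f);
}
```

With a half-size of (10, 10), radius 2 and feather 4, the centre of the box gets a mix of 0 (pure `innerCol`), a point on the edge gets 0.5, and anything more than 2 units outside gets 1 (pure `outerCol`).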

#6

Ah ok! That’s good to hear and thanks for the clarification.


#7

So basically, yes it does just have a big if statement! Actually not a bad idea, it would probably have made my code a bit simpler to have done it that way, even if the performance would probably not have been much different.


#8

Instead of scan-line rendering or Loop-Blinn (which is patented IIRC), nanovg uses the stencil buffer to render complex fills:

static void glnvg__fill(GLNVGcontext* gl, GLNVGcall* call)
{
	GLNVGpath* paths = &gl->paths[call->pathOffset];
	int i, npaths = call->pathCount;

	// Draw shapes
	glEnable(GL_STENCIL_TEST);
	glnvg__stencilMask(gl, 0xff);
	glnvg__stencilFunc(gl, GL_ALWAYS, 0, 0xff);
	glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);

	// set bindpoint for solid loc
	glnvg__setUniforms(gl, call->uniformOffset, 0);
	glnvg__checkError(gl, "fill simple");

	glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_KEEP, GL_INCR_WRAP);
	glStencilOpSeparate(GL_BACK, GL_KEEP, GL_KEEP, GL_DECR_WRAP);
	glDisable(GL_CULL_FACE);
	for (i = 0; i < npaths; i++)
		glDrawArrays(GL_TRIANGLE_FAN, paths[i].fillOffset, paths[i].fillCount);
	glEnable(GL_CULL_FACE);

	// Draw anti-aliased pixels
	glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

	glnvg__setUniforms(gl, call->uniformOffset + gl->fragSize, call->image);
	glnvg__checkError(gl, "fill fill");

	if (gl->flags & NVG_ANTIALIAS) {
		glnvg__stencilFunc(gl, GL_EQUAL, 0x00, 0xff);
		glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
		// Draw fringes
		for (i = 0; i < npaths; i++)
			glDrawArrays(GL_TRIANGLE_STRIP, paths[i].strokeOffset, paths[i].strokeCount);
	}

	// Draw fill
	glnvg__stencilFunc(gl, GL_NOTEQUAL, 0x0, 0xff);
	glStencilOp(GL_ZERO, GL_ZERO, GL_ZERO);
	glDrawArrays(GL_TRIANGLES, call->triangleOffset, call->triangleCount);

	glDisable(GL_STENCIL_TEST);
}
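
The first pass above (triangle fan with `GL_INCR_WRAP` on front faces and `GL_DECR_WRAP` on back faces) can be emulated on the CPU for a single pixel. The C++ sketch below is only an illustration: the helper names are hypothetical, which winding counts as "front" is an arbitrary choice here, and the 8-bit wrap-around of a real stencil buffer is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt { float x, y; };

// Twice the signed area of triangle (a, b, c); the sign gives its winding.
float signedArea2(Pt a, Pt b, Pt c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// True if p lies strictly inside triangle (a, b, c), whatever its winding.
bool insideTriangle(Pt p, Pt a, Pt b, Pt c) {
    const float d0 = signedArea2(a, b, p);
    const float d1 = signedArea2(b, c, p);
    const float d2 = signedArea2(c, a, p);
    return (d0 > 0 && d1 > 0 && d2 > 0) || (d0 < 0 && d1 < 0 && d2 < 0);
}

// Emulates the stencil pass for one pixel: every fan triangle (v0, vi, vi+1)
// that covers the pixel adds +1 if it winds one way and -1 if it winds the other.
int stencilValue(const std::vector<Pt>& poly, Pt pixel) {
    int stencil = 0;
    for (std::size_t i = 1; i + 1 < poly.size(); ++i) {
        const Pt a = poly[0], b = poly[i], c = poly[i + 1];
        if (!insideTriangle(pixel, a, b, c)) continue;
        stencil += signedArea2(a, b, c) > 0 ? 1 : -1;
    }
    return stencil;
}

// Emulates the cover pass: colour is written only where the stencil is non-zero.
bool covered(const std::vector<Pt>& poly, Pt pixel) {
    return stencilValue(poly, pixel) != 0;
}
```

For a concave outline such as (0,0), (4,0), (4,4), (2,1), (0,4), a pixel inside the notch at (2, 1.5) is hit by one triangle of each winding, so its stencil value cancels to zero and the cover pass skips it, while (3, 2) ends up at +1 and is filled. This is why the fan does not need to be a proper triangulation of the shape.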

#9

I have been comparing the performance of the Max/MSP GUI to Audulus (Taylor’s software) to get a real-world sense of these two UI rendering techniques in practice. (I realize this isn’t a direct, literal benchmark; it’s more a general impression of the UI performance of two modular synthesizers, one using JUCE and one using nanovg.)

The hands-down winner is Audulus. In Max I often hit a point where the interactivity of a patch slows to intolerable levels with very few animated widgets, whereas in Audulus I have yet to see the UI rendering / interactivity slow down, even with every widget animating and giant patches generating complex full songs using low-level DSP elements.

The reason I bring this up is that it leads me to believe that nanovg’s performance is far better for clean, antialiased UI rendering, and I think studying their open-source GitHub repository https://github.com/memononen/nanovg and possibly adopting their code or techniques in JUCE is worth exploring.

I will see if i can provide a more literal and direct test with a quantifiable benchmark soon.


#10

To be fair, the UI of Audulus looks significantly simpler than Max’s. It has lots of clean and simple vector shapes while Max does a lot more with shadowing, gradients, etc. That stuff gets expensive fast.

The best test would be to just make a LowLevelGraphicsContext implementation using nanoVG and benchmark it against JUCE’s renderers.


#11

If you do benchmark anything, please use the latest version on develop - I optimised a few things a couple of days ago that will make some large areas render a lot faster.

But also bear in mind that IIRC I avoided using stencil buffers because they didn’t allow anti-aliasing (can’t remember the details exactly, I wrote this years ago, could be that modern GPUs do provide better stencil tools now), and quality was more important than speed in this case.


#12

…also, having spent a lot of time staring at graphics code, PLEASE don’t jump to any conclusions unless you’re doing a literal like-for-like comparison, drawing exactly the same shapes! Tiny differences in what you draw will make big differences, even within the same engine.


#13

AFAIK Max’s jgraphics API maps to Cairo, not JUCE’s 2D graphics API.


#14

@jules Of course, I was prefacing my statement fully realizing that my comparison was hardly anything to accurately judge by and was based around a general impression. And regarding NanoVG / stencil buffers, as I understand it NanoVG uses the stencil buffer for the antialiasing.

I suggest trying out Audulus. @tayholliday used JUCE for the Windows/Linux version, with NanoVG for the UI. It looks awesome on all platforms and is very responsive when zooming around the dynamic UI. It has been an editor’s pick in the App Store and won the Electronic Musician Magazine Editor’s Choice Award, 2017. It really is one of my favorite audio applications of all time: http://audulus.com/

@olilarkin you might dig it too ; ) you can patch all the way down to single sample feedback custom filters with the z-1 object. Looking at the API doc now for jgraphics https://cycling74.com/sdk/max-sdk-7.3.3/html/group__jgraphics.html
I guess I was working under the assumption they used JUCE for the graphics since Dave Z did the JUCE talk and case study on the JUCE page… and when they first did v5 I recall hearing about how the new UI was done in JUCE and how awesome that we could have antialiased rounded corners and curved patch cables


#15

@jules Apologies for my harsh wording in my prior post. I just re-read it and it comes off as rather rude and heavy-handedly critical. I should say that I love JUCE, and you have created an incredible tool. My desires are fueled by wanting JUCE to be the best it can be.

At the time of my prior post I had very little direct experience with using JUCE, and now I am learning it more actively and beginning to realize some projects. I also have studied GLSL quite a bit in the past year so I will now have a bit more insight to be able to potentially provide some useful information.

I am planning to do some tests to try and compare some basic draw calls.
I am pondering a good basis for a benchmark. I was thinking I could start by loading nanoVG into JUCE and creating a simple application that draws 1000 circles, lines, rectangles and paths, each for 10 seconds to start, and maybe this would generate a log output of the min, max and average FPS afterward? Is there a good way to get a fair FPS count in the JUCE or nanoVG API? Or maybe it would be better to run the GPU profiler in the OS X developer tools and log there?

I did this a ton when I was working on the omnimod hardware to home in on the OLED screen refresh rate using different methods for drawing my GUI elements, and especially when I was working on implementing DMA. One thing that was helpful there was using a macro to log the number of CPU cycles per main loop, and then I could comment out different functions and mark their contribution pretty easily. Maybe something like that could work? I could get the FPS / ms per frame by checking the delta time between draw calls. Then I could set up a preprocessor #if directive to switch between the JUCE and nanoVG based API draw calls. Or just do them one after the other and present the evaluation at the end of the test inside the window. Maybe I will grab a GPU benchmark application like PassMark and see what they do there.
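
A rough C++ sketch of the "delta time between draw calls" idea, using only the standard library. `FrameStats` and `FrameTimer` are hypothetical helpers, not part of JUCE or nanoVG; the assumption is that `tick()` gets called once per frame from whatever render callback is being measured.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Accumulates per-frame durations and reports min / max / average FPS.
class FrameStats {
public:
    // Record one frame's duration in milliseconds.
    void addFrame(double ms) {
        if (ms <= 0.0) return;  // ignore samples below the clock's resolution
        minMs = std::min(minMs, ms);
        maxMs = std::max(maxMs, ms);
        totalMs += ms;
        ++frames;
    }

    int frameCount() const { return frames; }
    double averageFps() const { return frames == 0 ? 0.0 : 1000.0 * frames / totalMs; }
    double minFps() const { return frames == 0 ? 0.0 : 1000.0 / maxMs; }  // slowest frame
    double maxFps() const { return frames == 0 ? 0.0 : 1000.0 / minMs; }  // fastest frame

private:
    int frames = 0;
    double totalMs = 0.0;
    double minMs = std::numeric_limits<double>::max();
    double maxMs = 0.0;
};

// Returns the milliseconds elapsed since the previous tick() (or construction).
class FrameTimer {
    using Clock = std::chrono::steady_clock;

public:
    double tick() {
        const auto now = Clock::now();
        const std::chrono::duration<double, std::milli> delta = now - last;
        last = now;
        return delta.count();
    }

private:
    Clock::time_point last = Clock::now();
};
```

Calling `stats.addFrame(timer.tick())` at the same point in every frame and printing the three FPS figures after the 10-second run would give the log described above. Since OpenGL commands are queued asynchronously, CPU-side deltas like this mostly reflect overall frame pacing, so a GPU profiler is still the better tool for attributing cost to individual draw calls.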

The only reason this is an issue for me is that I plan to make some dynamic GUIs with animated modulation indicators and complex graphs for visual feedback, and I very much want them to leverage the GPU in an optimal way, to allow the CPU to dig deep into DSP and synthesis. I already have some pretty solid plans for how I want all of this to look and work, and am now wrapping my head around the JUCE API.


#16

One useful reference I found that is built with JUCE is the HELM synthesizer.


It has dynamic GUI modulation indicators, and all of its code for the rapidly updating parts is done with simple GLSL shaders. I guess it makes sense from a CPU/GPU economics standpoint: if you don’t need 100 widgets animating all over the place at 60fps, you can build the GUI around a more static background and realize only the things that need to move fluidly in GLSL. That seems like a pretty good compromise.


#17

So far I have found a few things that look useful. The included demo with switchable rendering methods seems like a good place to start for a benchmark, and this YouTube video gave some insight into the current JUCE process: https://youtu.be/hvIcczswccI?t=29m