FR: JUCE Vulkan

I think the general assessment is that OpenVG is dead, since the consortium is not actively working on it.

Read about it : NV_path_rendering. It’s out of question too, since it’s locked to NVidia.

The most efficient is rare anyway, there is always a compromise between flexibility and performance.

I mean, it’s also possible to rasterize and cache a JUCE path in a Vulkan command buffer. In the end it’s just one draw call and some uniforms. But then it’s a static object, which excludes stuff like waveform rendering and dynamic shapes. Also it would be necessary to pull the batch / cache mechanism out of the context. The normal component Graphics rendering and paths wouldn’t profit from that.

Draw calls, texture uploads or quads per pixel/scanline aren’t the problem at this point. You can throw much more at the GPU. The bandwidth use compared to a modern AAA title is laughable. The problem is the intensive single threaded EdgeTable, scanline, clipping and path rasterization on the CPU.
If anything of it can be avoided, it would give a massive boost.

Measuring the GraphicsDemo, the main bottlenecks are paths and complex shapes (SVG). Anything that increases the complexity of clipping regions and the EdgeTable. Which are only necessary because we wan’t the AA quality.

About the specialized SDF shapes. I understand the purpose.

Yes, if we call something like g.fillEllipse(x, y, width, height); , then we could create a virtual “EllipseObject”, which ultimately selects the ellipse SDF shader. Most of the time the object to be drawn lies within the context clip region, so we can pass the simplified geometry data 1:1 without further processing.

BUT as soon as there is a clipping region involved, the ellipse object will intersect with rectangles or paths. For simple intersections like rect2rect or circle-rect we could spawn or split them into simple SDF objects. But in reality it’s required that we can clip any shape. So yes, in the optimal case the context should choose a specialized SDF equation (Rect, Rounded Rect, Ellipse, Star). Even if, it’s still necessary to have a fallback for general paths and polygon shapes.

If it was just the shapes without AA, no problem. There are enough triangulation methods out there.
The problem is really the combination. Any polygon shape + anti aliasing + clipping regions.


At the moment I fear it’s not possible and that specialized shaders are much more profitable than optimizing juce::Graphics.

The good news: It’s easy to specialize and cache with Vulkan, since it’s multi-threaded and no context switching like in OpenGL is involved.

They released 1.1 of the spec in 2020, so it seems they are …

Of course I read about it, that’s why I said :roll_eyes:

obviously this is not a solution you can use in a plugin context as you would limit your userbase.

Remember not everyone has the latest RTX card on their rig. We’re targeting audio folks, not gamers, they might be running an iGPU on a laptop.
Anyway it’s pretty meaningless to compare things without a proper benchmark on a real-world scenario.
And in any case there’s no one-size fits all solution, if you want complex clipping, blending etc you could use something like this highly optimized cpu renderer blend2d.
It seems overkill to use the cpu for everything though as most UI’s use simple shapes that can be rendered directly on the GPU. A mix and match approach would probably be best to balance performance vs flexibility.

Sorry, forgot the “I”. I read about it. I also remember reading that Googles Skia is using it. Not sure if this information is up to date though. If it was easy to implement, it could be a good “optional” implementation to boost things if the hardware is available. I somehow fear that at the end, using it “under the hood” in a JUCE graphics context will result in the same problems. Clipping regions and so on.

Oh, interesting. Although it’s 1.1 of OpenVG Lite, since OpenVG 1.1 was released in 2008. Wow, what happened in the meantime? I don’t think a released spec can be equated with “widely used solution”. Most comments I read gave the opinion that it’s essentially dead. From the looks there isn’t happening much with it. My hopes are not that high. I haven’t read much about it, but where would I start to get it into JUCE?

I don’t know. From audio folks I expect a workstation and a middle class GPU, or a laptop for sketching ideas. If a producer uses a laptop for his main prodution, he shouldn’t expect 60 fps and no fan movement. The same as a gamer can’t expect 4K graphics without necessary equipment. Developing for the lowest possible denominator will ultimately harm your product. Should we use aliased curves and fonts because some Apple customer uses a Macbook from 2010 for his production? No way : )

But that’s not the problem. Even if it’s an integrated graphics chip. It’s far better to use a specialized chip than wasting CPU cycles.

I don’t really expect a one-size fits all solution. But what I expect is a GPU solution. And if the user doesn’t have the required hardware, then it’s always possible to tune down the fidelity. PC Games do this for years. Turn off anti aliasing. Turn off texture sizes. Turn down the resolution. Turn off the amount of VRAM used. Then you can play Minecraft on a crappy laptop : D, which is still an order of magnitude higher GPU use than a simple 2D UI.

The benchmark in juce::Graphics case is simple.

  • Static Images
  • Transformed Images
  • Stroked Paths
  • Filled Paths

Any filled primitive that is clipped will result in a filled path. This is the reality for every JUCE application. The only difference from case to case is how intensive the parts are used.

From a modern C++ graphics API I expect at least the performance of a browser canvas API. Even a mobile phone renders quicker than the JUCE graphics API. They improved the implementation over the years (Android using SKIA / Vulkan), while juce::Graphics is idling, claiming: Everything fine here! You can’t expect more.

Anyway I implemented an OpenGL and Vulkan context in a real-world scenario. An audio plugin using images, font glyph rendering and waveform paths. It will always come down to the EdgeTable preprocessing of the CPU. Don’t know what else we should compare.
Seems obvious. Move the rasterization part to the GPU. They are built for it.

I found this entertaining thread from 2011.

Apparently @jules was facing exactly the same problem. It fundamentally comes down to
rendering anti-aliased polygons from a path.

He got quite pissed, people kept posting the same ideas that he already tested. A funny read : D
So from this experience I essentially take the following:

  1. Hardware implemented MSAA for vector graphics looks like crap.

  2. Optimizing or caching a whole stack of drawing operations is not really feasible, since it’s layered 2D graphics.

  3. OpenGL performance suffers from framebuffer switching.

  4. For small paths it was not clear if the triangulation costs more than the edge table calculation.

I don’t want to tear up old wounds, but he tried one idea that I haven’t considered before.
Anti-Aliasing via accumulation buffer. To quote the coarse idea:

He got it almost working, but performance was very bad. The problem was that binding / switching frame buffers was very expensive. It might be worth testing the same method in Vulkan.

Why? For once: Framebuffer switching is different.
Vulkan uses render passes and could utilize a sub or multi-pass solution and a mask shader with a more specialized format for this purpose.
And then there is instanced rendering. While the OpenGL version JUCE uses doesn’t allow instanced rendering and it would be necessary to render the same geometry with multiple draw calls, in Vulkan we can draw it instanced with different jitter uniforms or push constants to achieve anti aliased 2D geometry in one call.

I’m not sure how good the “jitter polygon into mono mask” method will look. But it’s worth trying. Seems a lot more general than specialized methods for each shape. And the results are tweakable, the AA quality depends on the amount of draw calls of the polygon.


Did anyone ever try something similar to anti-aliasing via accumulation buffer?
Also @jules, I know you are busy with SOUL, but if you have any input on this topic I would much appreciate it!

3 Likes

I think the fundamental way Juce renders things need to be changed - eventually !:worried:!

Can you brute force anti-alias by rendering at high resolution and then reducing it down to present it, or ‘spatial anti-aliasing’ as I think it’s known.

In the meantime it’s good to know you’ve actually got Vulkan working at all.
All I need is a area to render to and I’ll be set… :grin:
I guess you’re not going to share your triangle renderer?

4 Likes

To get nice aliasing you’d have to scale the image up 8x in each direction… to get 256 levels of grey between black and white. That would need far too much memory and be very slow even on the GPU. If a lower “oversample” factor is used, steps become visible on tilted lines.

When I looked into drawing 2d paths with OpenGL a few years back, I found an interesting technique used by MapBox:

https://blog.mapbox.com/drawing-antialiased-lines-with-opengl-8766f34192dc

Not sure whether that’s patent-free, but quality and performance are obviously great as MapBox was written for older mobile GPUs. They host a few webgl demos. It uses a combination of normal vectors added to the vertices and a pixel shader to do the antialiasing on the pixel shader.

This is SSAA and the simplest form of MSAA. While for games it gives acceptable results, for vector graphics 8x is not even close to the quality the current JUCE renderer offers. While doing tests it seemed at least 32x is necessary to give acceptable results.

Yes, using a bigger framebuffer is just the first thing that comes to mind, but for various reasons it’s a very bad choice. There are many other AA techniques. One of them is some form of accumulation buffer. The geometry gets shifted by “subpixels” and will be accumulated, whichis one way of doing MSAA. I tested this with instanced rendering in Vulkan.

The big advantage: The submitted geometry stays the same. No additional memory for the huge framebuffer.

The problem: It needs two passes. One for the accumulated AA mask, one for the colour/texture/gradient sampling and blending.

Here is the result: At 1280x720 with 150% windows scaling (1920x1080). It’s the path rendering example from the JUCE Graphics demo. With 32x AA comes close to JUCEs scanline renderer.

Here a view of the actual geometry from RenderDoc. You see the draw call vkCmdDrawIndexed(3519, 32) … uploaded are the vertices and drawn are 3519 indices. Times 32. While this seems much, it’s still in the acceptable range compared to “one quad per pixel”.

Since this is a two pass method, we also need to draw the mask and apply the actual “fill”. We could draw the mask with one quad, but this will draw a lot of transparent pixels and we want to reduce the fill rate, so we draw a simplified “outline” of the polygons. The outline needed 2211 indices. This time only drawn once.

Here is the intermediate result of the AA mask buffer zoomed in by 800%. Quality seems good enough.
By the way, for the MSAA, a non uniform pattern was chosen. Supersampling - Wikipedia
The pattern was generated viad poisson disk sampling.


Now while this sounds promising, there are unfortunately a lot of problems:

First, the raw process is this:

  1. Flatten the juce::Path with juce::PathFlatteningIterator. This way the poly lines can be passed to the clipper.

  2. ClipperLib uses the lines and applies the clip region of the current context. This converts the path to a PolyTree, necessary to resolve any self intersection and the winding order stuff. Essentially you get a tree of contours that are either outlines or holes.

  3. Mapbox Earcut GitHub - mapbox/earcut.hpp: Fast, header-only polygon triangulation
    (poly2tri was too slow and didn’t give acceptable results) is used to perform triangulation. The result is a list of vertices and the triangulation as an index buffer. This can be directly uploaded to the GPU.


Now the problems:

  1. Clipping per frame is slow! There are also some worst case scenarios (Rectangle List with many independent lines). Where the clipper got completed stuck, probably due to O(N²) complexity.

  2. Triangulation per frame is slow and even for small shapes, the overall amount of triangles is still relatively big.

  3. It’s necessary to have an accumulation framebuffer for each “object”. Which increased memory use quite a bit. Not as much as naive SSAA, but for many small objects, this will probably use up a lot of memory bandwidth.

  4. I can’t find a way to perform AA mask rendering and filling of the end result into one renderpass. This requires the intermediate mask framebuffers to stay in memory until they are merged at the last step.

I’m sure there are a few ways to optimise things here and there, but I’m not satisfied with the result.
Let’s not forget the initial goal. Reduce CPU preprocessing and utilize the GPU. Unfortunately all the clipping and triangulation are performed on the CPU too.

My assessment: For bigger objects and resolutions, this method could outperform the scanline OpenGL method. But for small glyphs and many objects (especially fonts), the performance will probably end up worse.


New Idea:

Reconsidering this, the bottleneck part of the OpenGL scanline method is not the amount of triangles. GPUs can easily handle this.

It’s the pre-processing and submission. There are 3 parts and I think it’s possible to move some of the work to the GPU.

  1. juce::Path gets flattened by juce::PathFlatteningIterator.
  2. Paths get converted into a juce::Edgetable, clipping and more is applied.
  3. juce::EdgeTable is iterated (via scanline) and vertices (pixel quads) are generated.

The first thing that came to my mind: The scanline is applied per Y span. Isn’t it possible to multithread this? Probably. Now if it can be multithreaded, can’t the GPU do this job?

Compute shaders to the rescue!
From what I get it should be possible to upload the juce::EdgeTable data structure directly into a Vulkan memory buffer. From there a compute shader can be dispatched that can run the scanline iterator in parallel and write the generated vertices directly to a device local vertex buffer. This will not only reduce the time spent on CPU, it also avoids the vertexbuffer uploads completely!

The same goes for the Path to EdgeTable conversion, although this process is not trivially converted into a parallel algorithm.

That was all for now.

Sorry, this is still too much work in progress. At this stage it’s problematic to share things due to various questions about licensing and stuff.

But what do you mean by changing the fundamental way? I thought about this and came to the conclusion that no matter what, you always end up at the same problem. And the more I read into it, the more I see that this is actually a rather open problem that has no clear answers. Even big projects like Skia struggle with this, but found a good method for their purpose.

If anyone has some insights about Skia btw, feel free to share!

1 Like

Indeed. That’s why I wrote 8x each direction… meaning 256x overall for perfect results - which makes this approach useless.

I put up the link to the MapBox blog because it seems to have all the techniques that would be needed for truly fast vector rendering. The fact it runs at 60fps on mobile with complex maps means they had to do everything they could to unload things from the CPU to the slowish GLES GPUs using vertex and pixel shaders and as few draw calls as possible. There is a c++ version of the rendering engine:

It renders fonts and seems to support gradients and is well tested. It might be a nice source for inspiration if you guys want to rewrite the JUCE renderer - too bad it’s based on OpenGL and not Metal/Vulcan.

GPUs like to render large areas at the same time. They don’t like halting in any way, and they really hate rendering in pixel strips!
Also, if you should aim to use as few ‘draw calls’ as possible, or whatever vulkan uses, the better. I was hoping that would change, but I don’t know vulkan at all so I can’t comment.
But I’d bet that idea of “here’s a bunch of polygons, go render it” will still be the best route to take. The parallel nature of GPUs is outstanding.

I’d wager that you could draw an entire plugin, with thousands of polygons in a few milliseconds.
It’s just a matter of trying it. My Barchimes plug-in rendered the entire interface at once, and that was on the first Mac Mini running smoothly inside Logic, over 10 years ago now.
BarChimes
I haven’t updated it to 64 bit yet, as it’s not my best seller, and I was stopped by Apple’s news about dropping OpenGL)

I see what you mean by rendering a large off screen buffer, it would take up a bit of memory, some people may not have, possibly.

Rendering a strip of quads to draw lines is a pretty standard thing to do for decades. If you use ‘smoothstep’ or equivalent. you can make the lines as smooth as you want, no matter the resolution. The quads just have to cover the area of the line. And if you’re using SDFs like someone already mentioned, it’ll be rendered using a shader rather than a pre built line texture, which gives you maximum control over the line and its geometry.

I mentioned sharing because I feel we’re all sitting here with bated breath waiting for Juce to offer a head start as an official develop Juce release. I don’t think anybody will judge your code - it’s brilliant you have the time to look into it to be honest.

4 Likes

My bad, I misread it as 8x AA.

You probably overlooked it due to the amount of text, but I posted the link in December 2020 together with another SDF technique. So I’m aware of mapbox-gl and read the article too. I wish it was that easy! Unfortunately the SDF + line segment techniques are not suited for the task.

To summarize it for all… We submit 2 vertices per line point. 4 per line segment. Extrude the segment in the vertex shader via uniform values to get a quad and then use signed distance fields in the fragment shader to “blend” the edges of each segment. So far so good.

There are a few things we have to consider here:
It’s for line segments only! Sure, there are other SDF shapes, but all of them represent only one geometric shape. To get “any polygon” shape, which juce::Path requires, we have to chain or merge the methods. For line segments this is easy, as shown in the article. But what about fills? We can’t submit an N sided polygon and just apply SDF magic, because then we have to consider each vertex for each pixel (because of winding order), which is very expensive! The question about clip regions is also not solved.

Now you could assess: But the maps use fills too!
I’m not sure 100% if this is correct, but looking at the maps it seems they achieve fills by just using a regular aliased fill and just overdraw an AA outline. This works for maps. Because map outlines do not intersect, and if, they are not transparent. So we can fake polygon fills with AA outline + regular aliased fill layer. This will not work for gradients, textures, transparency or blend modes. It’s still possible to achieve transparent fills with maps, but only if a “whole layer” uses opacity.
Which would require a separate renderpass/framebuffer. And due to the nature of the dynamic JUCE graphics API we can’t cache this like a map viewer does.

I really can’t see a performant and easy solution because the JUCE graphics API dictates dynamic self intersecting path shapes per frame with clip regions, opacity and fill.
Let’s not forget that even if the SDF rendering performance is superb - the setup is probably not. Maps are static and can be cached to a certain degree. So I think the usefulness of the SDF shapes in this case is a bit misleading. Because we can’t apply the same preprocessing and have to calculate it on the fly for each call.


Yeah, I don’t doubt it. If I had the task to achieve this, I would specialize and use a method that fits the requirements. In your Barchime for example. If we just put a transparent outline around the individual chimes graphics we don’t need AA. We can just draw and fill one quad for each element, which is insanely fast. 60 FPS on 4k guaranteed.

But doing the same just by using juce::Graphics::drawImage ? That’s another thing.
Of course we could also discuss and extended graphic API. For example something like this:

void paint(Graphics& g) override
{
  if(vulkan::isSupported(g))
  {
     vulkan::Graphics sg(g);     

     if(!batch)
     {
       VulkanContext& context = sg.getContext();
         
       batch.reset(new ShaderBatch(context));
       
       batch->setAntiAliasing(..);              
       batch->addDrawable(TextureDrawable(.."chime.png"));
     }

     sg.drawBatch(batch, transform);
  } 
  else
  {
    // Fallback if the context is a software renderer
  }
}

In this example we could preprocess the triangulation at the setup state of a batch and make anti aliasing optional. The juce::Graphics API stays as it is and can still be used like we’re used to. The new API on top is optional and users can build their own batch/cache/drawables objects. Also notice: Even if the Image here is added at the setup state, we can still supply an affine transformation and update the texture, because both will not change the vertex attributes but only the bound shader uniforms.

What do you guys think about such an extended API? I see a major disadvantage: It’s specialization and the existing juce::Components would not benefit.

I don’t know know much about the low level details of drawing APIs.

But what’s clear to me from this discussion is that the way forward would be to add a new drawing API that would fit Vulkan/Metal/etc in terms of the data passed in and how it’s stored so it wouldn’t require any flattening and be GPU-friendly.

I would imagine, broadly speaking, something like that added to Component:

virtual void draw(ImprovedGraphics& g)
{
    //default implementation, just calls paint and uses whatever mechanism exists now:
    paint (g.getClassicGraphicsObject());
}

Then over time, both built in and custom JUCE components can add support.

I’m just looking for an additional 3D/2D renderer to complement Juce’s.

So I can have all of Juce’s fine and lovely software rendering for most of the static stuff, buttons and dials. And rectangular sections with animated GPU rendering for things like fast moving waveforms and FFT visualisations - and other things we all use to attract customers! :slight_smile:

Juce could take the Skia SDK seriously as a replacement. It uses it’s own backends for the hardware rendering which has taken over most if not all of it’s software rendering. It does everything Juce needs, including a very accomplished SVG renderer. It’s used to render Android, Chrome and many others on all platforms as you may know. https://skia.org/

Any way, like I said, for now I just need an additional GPU renderer as an addition to Juce’s long established system.

3 Likes

I forgot to mention it here. For anyone interested, the Vulkan modules are now available as open source project on github. Check it out and post some feedback if you like.

7 Likes

I’m looking at this now, apologies for the wait.
It’s a great speed up for general Juce graphics for me. There’s a lot of work gone into this, @parawave.

I wish a Juce team member would look at this. - it’s amazing.

The only bug I can see is that Combox’s are not drawn properly:
ice_video_20211120-144123

As you can see it’s a standard look and feel 4, when the menu button is pressed the dropdown shows momentarily, but then disappears UNTIL the mouse is lowered over it. Very strange.

1 Like

Wish I could get this to work - I end up with mostly a black screen and occasional flickering of my actual UI. After a while it hangs, or if the window resizes it hangs. I checked my video card compatibility GTX 960, should be fine, so I’m not really sure what the issue is.

+100000 though to get the JUCE team to take a look at this!!

@nlaroche Hmm, that’s a shame.
Have you tried setting the PW_SHOW_VERBOSE_DEBUG_MESSAGES flag, by clicking on the module in the Producer?
If should report things like ‘[Vulkan] Created instance’, and ‘[Vulkan] Created device’ in your Output window along with any errors I presume.

Nice to see people finally testing it : D

The thing is, normally Vulkan is very verbose and specific with its error messages. It surprises me a little that there is just a black screen, or random glitches. I don’t think the reason lies within the specific graphic context implementation, since its logical flow and structure is almost identical to the JUCE OpenGL context. But there is definitely something missing. I have a few guesses, like unsupported swap chain image formats. Driver specific stuff.

Unfortunately I don’t have the time/capacity to look further into this. Can’t replicate it here, so where to begin? My recommendation: Someone with the problem has to use a debug tool like RenderDoc https://renderdoc.org/, start the application, and look for errors. Or at least try to find the time/location in code, when it happens.

2 Likes

I really want to get stuck into this. I don’t know where to start though. Is there any way you can show a simple example of what your Wrapper helps with? - even if you consider it a bad coding example - just something to start us off?
While reading/watching Vulkan tutorials, it seems you have helped with a lot of the verbage, so which bits have you done?

(p.s. I get the feeling it needs to be Mac worthy before anyone from Juce will take it seriously, even though it has a ton of votes, and rendering being the most sort after feature voted for in a poll.)

The implementation of the context and wrapper is not platform dependent. So in reality, to add Mac support, it only needs an implementation of
the native surface in vulkan-cpp-library/pw_Vulkan_osx.h at main · parawave/vulkan-cpp-library · GitHub

… Plus someone has to figure out how to actually set up XCode to build or include the MoltenVK library (which handles the conversion of Vulkan API calls to Metal).

I think the problem is not the interest, it’s just that there is no free developer on the JUCE team to dive into this. Or they already research an alternative idea for the future. Like a Skia or WebGL backend.

I will continue with the Vulkan module specific questions in the other topic.

2 Likes

I haven’t tried adding the flag yet, since there is no mac implementation I can’t realistically sink time into this as I wouldn’t be able to release it. I’d give it another shot if there was a future for this project, from what I gather development has sort of halted and the juce team hasn’t looked into this year either, which is a shame.