Using the GLU tessellator?
http://glprogramming.com/red/chapter11.html
Some info on AA.
Sadly, no… I already spent a long time writing a very cunning path-to-triangle algorithm, but the only way to reliably render it with AA ended up being slower than just creating a texture from the edge table, which is how it works now.
It'd probably be fast to use triangles if you were happy to just rely on the device's multisampling to perform the AA, but as far as I can tell the results with that are (at best) piss-poor, or at worst, not anti-aliased at all.
I also found that the quantity of triangle data that needed to be sent to the GPU for some paths was actually not much smaller than the texture data generated from an edge table. And generating an edge table is faster than triangulating. So basically, my conclusion was that unless you're drawing a big, simple, non-antialiased polygon, triangles just make things harder.
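To make the data-size point concrete, here's a back-of-envelope sketch (all the numbers are illustrative assumptions for a small path, not measurements from JUCE):

```python
# Compare an 8-bit coverage texture for a small path against the raw
# vertex data a triangulation of the same path might need.

def texture_bytes(w, h):
    # Single-channel mask, 8 bits per pixel.
    return w * h

def triangle_bytes(num_triangles, floats_per_vertex=2):
    # 3 vertices per triangle, 4 bytes per float, no index sharing.
    return num_triangles * 3 * floats_per_vertex * 4

# A 64x64 glyph-sized path vs. a 200-triangle tessellation of it:
tex = texture_bytes(64, 64)
tris = triangle_bytes(200)
```

With these (made-up) numbers the triangle data is already bigger than the mask texture, which matches the observation above.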
I'd be interested to hear from anyone who's had experience of profiling GL, and could work out where the bottlenecks really are… I suspect that it could all be improved greatly with just a few tweaks in the right places.
A few remarks from reading the Git repo (not tested here, so it might be worth checking):
So instead of:
for each path:
    scan-line raster the path to create an edge table
    render the edge table
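For reference, that first loop (the current scan-line/edge-table approach) can be sketched like this. It's a toy CPU illustration, not JUCE's actual EdgeTable code; the function names and the sub-scanline count are made up:

```python
# Scan-line rasterization with anti-aliasing: for each horizontal line,
# collect the x-positions where polygon edges cross it, fill between pairs
# (even-odd rule), and get AA by averaging several sub-scanlines per row.

def scanline_crossings(edges, y):
    """x-coordinates where polygon edges cross the horizontal line at y."""
    xs = []
    for (x1, y1), (x2, y2) in edges:
        # Half-open test so shared vertices aren't counted twice,
        # and horizontal edges (y1 == y2) are skipped entirely.
        if (y1 <= y < y2) or (y2 <= y < y1):
            t = (y - y1) / (y2 - y1)
            xs.append(x1 + t * (x2 - x1))
    return sorted(xs)

def coverage_row(edges, y, width, subsamples=4):
    """Fractional coverage (0..1) per pixel for one pixel row."""
    cov = [0.0] * width
    for s in range(subsamples):
        sy = y + (s + 0.5) / subsamples          # sub-scanline position
        xs = scanline_crossings(edges, sy)
        for left, right in zip(xs[::2], xs[1::2]):
            for x in range(max(0, int(left)), min(width, int(right) + 1)):
                overlap = min(right, x + 1) - max(left, x)
                if overlap > 0:
                    cov[x] += overlap / subsamples
    return cov

# A right triangle with vertices (0,0), (4,0), (0,4), edges as point pairs:
tri = [((0, 0), (4, 0)), ((4, 0), (0, 4)), ((0, 4), (0, 0))]
row = coverage_row(tri, 0, 5)  # coverage of the bottom pixel row
```

The row comes back as fractional coverage values, which is exactly the kind of data that can be uploaded as a greyscale texture.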
You should probably have:
If you have geometry shaders:
Send the path's curve characteristics in a buffer object, and tessellate in the GS.
-- if not, follow the steps below
for each path:
figure out the number of vertices along the path with a tessellation algorithm (you don't need a high resolution here; you don't need to actually tessellate yet)
in the vertex shader, tessellate and rasterize the vertices you've sent, using the same algorithm (Bézier?) as you're doing on the CPU. Export all the modified vertices.
The output of this step is like the EdgeTable array, but in a format the GPU can understand
-- this part is common to GS/VS
In the fragment shader, you'll do the final rasterization using the vertices coming from the previous shader.
I'm not sure it's worth transferring back the transformed VBO, but it's an idea to test, so you only do it once per path render.
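The "figure out the number of vertices" step above can be sketched on the CPU side. This is a hedged illustration: the control-polygon heuristic and the tolerance constant are my assumptions, not anything from this thread, and it's written as plain Python rather than shader code:

```python
import math

def bezier_segment_count(p0, p1, p2, p3, tolerance=0.25):
    """Estimate how many line segments a cubic Bézier needs, without
    actually tessellating it."""
    # Max distance of the control points from the chord bounds (up to a
    # constant) how far the curve deviates from a straight line.
    dx, dy = p3[0] - p0[0], p3[1] - p0[1]
    d1 = abs((p1[0] - p0[0]) * dy - (p1[1] - p0[1]) * dx)
    d2 = abs((p2[0] - p0[0]) * dy - (p2[1] - p0[1]) * dx)
    chord = math.hypot(dx, dy) or 1.0
    deviation = max(d1, d2) / chord
    # Each subdivision shrinks the deviation quadratically, so the segment
    # count grows roughly with sqrt(deviation / tolerance).
    return max(1, math.ceil(math.sqrt(deviation / tolerance)))

# Collinear control points: the "curve" is a line, one segment is enough.
n_line = bezier_segment_count((0, 0), (1, 0), (2, 0), (3, 0))
# A strongly curved arc needs many more segments.
n_curve = bezier_segment_count((0, 0), (0, 100), (100, 100), (100, 0))
```

The point is that this count is cheap to compute per curve, so the real tessellation work could in principle be deferred to the GPU.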
Using GL_BLEND allows you to actually do anti-aliasing with no overhead, and it's "pixel perfect", but you'll cut the achievable frame rate (not sure that's an issue).
I couldn't find where the triangulation happens in the EdgeTable code, so I can't say much more.
Thanks cyril, but I think you're underestimating how well I understand it, and how deeply I've already investigated this stuff. Most of what you say are things that I've already tried, and which failed to work as you'd expect.
You certainly have more knowledge about the current rendering than I do.
For antialiasing, I was thinking of using a Blend mode. You'll then ignore the anti-aliasing issue, letting the GL primitive blend with the current rendering buffer.
That way you'll draw in your fragment, and the color will be blended magically by the hardware.
You probably already know it, but this is very useful:
I wonder if the polygon stuff in OGL is enough for the current path code, but maybe it isn't worth it?
You might want to have a look here too:
http://code.google.com/p/skia/source/browse/#svn%2Ftrunk
Check the file SkConcaveToTriangles.cpp and the gpu folder
[quote]For antialiasing, I was thinking of using a Blend mode. You'll then ignore the anti-aliasing issue, letting the GL primitive blend with the current rendering buffer.
That way you'll draw in your fragment, and the color will be blended magically by the hardware.
[/quote]
Eh?? Blending != anti-aliasing. Are you confusing the terms "semi-transparency" and "anti-aliasing"?
And yes, I looked at the skia triangulation stuff, but it didn't handle re-entrant paths with implicit holes. So I spent a couple of days writing a cunning triangulation algorithm of my own that did work correctly, but it was a waste of time because even when you've got the triangles, there's no way to actually render them with AA. Honestly, it's all a total pain!
I'm no OGL expert, and you probably already tried this, but anyway…
It's kind of well established that you need to avoid texture swapping, pre-bake textures into atlases, etc., so you end up with just a very few draw batches, especially on iOS/Android.
So in the world of JUCE, and of general path rendering, maybe it would be possible to do all complex path rendering into one appropriately big static render texture (FBO), multisampled/supersampled by some really big factor, maybe 8?
No, like I said, I did it with framebuffers originally, and it was like hitting a brick wall in terms of performance. The fastest way I found to get new data into the pipeline was by creating a new texture, because that can be done without swapping the rendering target.
Sorry, it's late here… but that sounds just wrong. Creating new textures will cause texture swapping, and that's going to slow things down.
Having one static texture for rendering paths (clearing it before rendering a path) should avoid swapping.
The only knowledge I have about keeping things efficient with rendering (apart from doing less!) is to minimise switching of shader settings. i.e. if you're drawing 3000 quads, each textured (randomly) with one of 3 different textures, the quickest way to draw them would be to sort them into 3 sets and render them in 3 batches. Of course, this would also require a z buffer to ensure that they ultimately overlap properly, as they would not be getting drawn in back-to-front order.
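That sort-into-batches idea can be sketched like this (a toy Python illustration; the quad representation and the z-tagging scheme are made up for the example):

```python
# Group draw calls by texture so that 3000 quads using 3 textures collapse
# into 3 batches. Sorting destroys the original draw order, so each quad is
# tagged with a z value encoding that order; a z-buffered draw then makes
# overlaps come out correctly anyway.
from itertools import groupby

def build_batches(quads):
    """quads: list of (texture_id, vertices). Returns one batch per
    texture; each quad carries its original draw index as a z value."""
    with_z = [(tex, verts, i) for i, (tex, verts) in enumerate(quads)]
    with_z.sort(key=lambda q: q[0])  # stable sort: groups by texture
    return [(tex, list(group))
            for tex, group in groupby(with_z, key=lambda q: q[0])]

# Five quads drawn with three textures, in an arbitrary interleaved order:
quads = [(2, "a"), (0, "b"), (2, "c"), (1, "d"), (0, "e")]
batches = build_batches(quads)
```

Each batch would then become one draw call with the matching texture bound once.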
Of course, it's easier to think about stuff like that with games than with a general-purpose UI framework!
And I doubt that's of any use [and it has already been considered].
I thought so too, which is why my first attempt was to use a framebuffer. But like I said, in practice it's many times faster to create a new texture and upload it than to mess about swapping to a framebuffer and issuing draw commands.
And for a lot of the drawing that gets done, the sizes are fairly small, so to send a small mono texture to the GPU is often actually less data than passing it a massive list of triangles + commands to draw them.
Yeah, I wish that was the kind of optimisation I could do! I've got a few ideas left that might reduce the number of GL function calls and let the GPU work on larger batches of triangles, but I've been finding that whatever intuitions I have about what should be more efficient never seem to match reality!
No. What I was thinking was something like this:
Since the alpha + color will blend with the previous output, you'll get an anti-aliasing effect (in fact it's pseudo anti-aliasing, but I wonder whether it'll look as good as real AA).
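What coverage-as-alpha blending does per pixel can be sketched like so (a toy Python version of the fixed-function "source over" blend; not anyone's actual code):

```python
# Treat fractional pixel coverage as source alpha and let the standard
# "source over" blend equation soften the edge:
#   out = src * alpha + dst * (1 - alpha)

def source_over(dst, src, coverage):
    """dst, src: (r, g, b) components in 0..1; coverage: fraction of the
    pixel the primitive covers. Returns the blended colour."""
    a = coverage
    return tuple(s * a + d * (1.0 - a) for s, d in zip(src, dst))

# A half-covered edge pixel between a white background and a black shape
# blends to mid-grey:
edge = source_over((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), 0.5)
```

This is exactly what GL_BLEND with (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) computes in hardware; the hard part remains producing the per-pixel coverage value in the first place.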
Or you have this option too: http://visual-computing.intel-research.net/publications/papers/2009/mlaa/mlaa.pdf
Implementation here: mesa/mesa - The Mesa 3D Graphics Library (mirrored from https://gitlab.freedesktop.org/mesa/mesa) and here: http://visual-computing.intel-research.net/publications/papers/2009/mlaa/testMLAA.zip
Also, the FXAA technique might be useful: Khronos Forums - Khronos Standards community discussions (see the shader code to test, and also the 2 links in the final posts)
Or you could simply accept that AA is not absolutely required when OGL rendering is enabled.
Or, stupid question, but did you check WGL_SAMPLE_BUFFERS / GLX_SAMPLE_BUFFERS?
OBO's remark sounds correct to me too. In the VideoComponent I've written, I'm uploading YUV data to 3 different textures, and using a double-buffering technique (6 actual textures used). The buffer I've mapped is not the one currently in use in the pipeline, and on glEnd, I'm swapping the textures for the next render. This comes with absolutely no performance hit. But you're not using OGL for path rendering with this technique.
If you try to map a buffer that's being used, you're going to hit a pipeline stall, but I'm sure you've already thought about that.
[quote]1) enable blending so the fragments accumulate on both alpha & colors
2) rasterize your edgetable in your shader with no offset (0).
3) In your fragment shader/GS, compute the actual rasterization with a pixel grid that's denser than the actual output.
4) Then compute the actual alpha sum for each final destination pixel, and emit that color, multiplied by the alpha value resulting from the sum.
It's like doing multisampling in the fragment shader/GS.[/quote]
Ok, that sounds interesting. My knowledge of shaders is very, very sketchy, but I thought that the way it worked was that a fragment shader only has access to a single pixel, so it's not possible to calculate the sum of more than one pixel?
Yes, what you're doing sounds like the same as my current implementation. Uploading data to a new texture does seem pretty efficient.
You're right, but it's not an issue.
You need to set up an FB that's larger than the required area (for example, twice as large in both width and height).
In the FS, you'll do something like:
// uniform float width, height: size of the (2x supersampled) source texture
if (gl_FragCoord.x > width / 2.0 || gl_FragCoord.y > height / 2.0)
    discard; // we don't care about the result here
vec2 texel = vec2(1.0 / width, 1.0 / height); // one source texel, normalised
vec2 uv = gl_FragCoord.xy * 2.0 * texel; // corner shared by the 2x2 source block
vec2 topLeft = uv + vec2(-0.5, -0.5) * texel;
vec2 topRight = uv + vec2(0.5, -0.5) * texel;
vec2 bottomLeft = uv + vec2(-0.5, 0.5) * texel;
vec2 bottomRight = uv + vec2(0.5, 0.5) * texel;
gl_FragColor = 0.25 * (texture2D(tex, topLeft) + texture2D(tex, topRight)
             + texture2D(tex, bottomLeft) + texture2D(tex, bottomRight));
Gaaahhh… For the millionth time on this thread: writing to intermediate framebuffers is NOT AN OPTION! I already had some great code that drew nicely anti-aliased polygons into a normal-sized framebuffer without shaders or anything fancy. That wasn't the problem; it was the swapping between framebuffers that made it unusable.
What I need to speed it all up would be some trick that I could use to send a bunch of triangles directly to the screen, and have them drawn with anti-aliasing, straight into the target - NOT involving a f**king framebuffer!!
You can write to a FBO that's attached to the render buffer. So you don't have to swap them.
I don't know what you've tried. Did you see this: http://www.songho.ca/opengl/gl_fbo.html
Clearly their FBO demo code is actually faster than without FBO.
Anyway, in that case, what about MLAA or FXAA (see my previous post).
The former applies AA to the final image without requiring you to do any AA rendering at all; the latter does AA per primitive.
Unfortunately I don't have much spare time over the next few weeks, but I'd love to see what you came up with, and to try your experiments myself to see what the issue could be.
Yes, of course I understand how framebuffers work!!
The framebuffer stuff doesn't appear slow, but when you're drawing hundreds of paths per update, that means hundreds of framebuffer switches, and my app was spending almost all the CPU time sitting inside either glClear or the function that binds a new framebuffer, presumably flushing the pipeline before continuing. Other drivers may handle that situation better, but I'm using a top-end MacBook with a good Nvidia GPU, so if something doesn't work here, it's clearly not a workable design.
And yes, of course MLAA and FXAA were the first things I looked into, but they're crappy quality, and unavailable on a lot of drivers. It might be possible on high-end smartphones to just use MLAA, because the resolutions are so high that it doesn't matter, but I want something that works in general too.
That's the part I don't get. Why do you render to a different framebuffer, and not just one, using a blend function?
[quote]That's the part I don't get. Why do you render to a different framebuffer, and not just one, using a blend function?[/quote]
For what feels like the thousandth time, this is what I was doing:
Here is a very unorthodox idea that almost avoids swapping render targets: