LowLevelGraphicsSoftwareRenderer: SharedImage access


#1

I had to write a custom drawing function for an animated multipoint envelope to speed up drawing, because some of JUCE’s functions were too slow (others are very fast, for instance fillRect()). Compared with JUCE 1.46 I could really feel the difference with some of them: they took about 2x-4x more CPU to do the same thing.

Now I could have just generated an Image and drawn it to the screen with drawImageAt(), but that’s slow, especially at fullscreen. It’s much faster to access JUCE’s own framebuffer directly, so I hacked JUCE’s LowLevelGraphicsSoftwareRenderer to do exactly that. The performance gain is large: instead of about 60 fps I get 140 fps at fullscreen, so my animation is very fluid and uses less CPU.

It would be nice to be able to access JUCE’s internal framebuffer without custom hacking: you’d just get a pointer to a SharedImage plus int xOrigin and int yOrigin, and could start manipulating the image data directly.
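For illustration, here is roughly what such access amounts to: with a base pointer, a row stride and the xOrigin/yOrigin offsets, a pixel in component coordinates maps to one byte address in the shared image. This is a minimal sketch assuming a packed layout with `lineStride` and `pixelStride` in bytes; `FramebufferView` and its members are hypothetical names, not part of JUCE.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of a renderer's framebuffer (not a JUCE class).
struct FramebufferView
{
    std::uint8_t* data;    // first byte of the shared image
    int width, height;     // image size in pixels
    int lineStride;        // bytes from one row to the next
    int pixelStride;       // bytes per pixel (3 for packed RGB)
    int xOrigin, yOrigin;  // offset of the component inside the image

    // Address of pixel (x, y), given in component coordinates.
    std::uint8_t* pixelAt (int x, int y) const
    {
        const int ix = x + xOrigin;
        const int iy = y + yOrigin;
        assert (ix >= 0 && ix < width && iy >= 0 && iy < height);
        return data + iy * lineStride + ix * pixelStride;
    }
};
```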


#2

I can’t make the buffer accessible because you have to assume that it may not even be held in main memory (e.g. for CoreGraphics).

If one of the juce functions is causing you problems, why not optimise it rather than writing a separate hack?


#3

Well, in that case you could just return a null pointer?!


#4

There are also clipping considerations: you might be using a small image that only covers part of the graphics context and is protected by the clipping region. It’s just all wrong to give access to that kind of stuff! If there’s something you can’t do, it’d be better to fix the JUCE renderer to do it properly than to open it up.


#5

Yes, of course one has to take the clipping region into account so as not to overwrite any memory. This kind of option would be for “pro” users only, and I don’t think anybody would be crazy enough to touch it unless it was absolutely required.
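Respecting the clip region boils down to intersecting every write with the clip area before touching memory. A minimal sketch for a single vertical span, assuming the clip is one rectangle in image coordinates (a real clip region can be a list of rectangles, in which case you’d repeat this per rectangle); `ClipRect` and `clipVerticalSpan` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical clip rectangle in image coordinates: [x0, x1) x [y0, y1).
struct ClipRect { int x0, y0, x1, y1; };

// Clamps the vertical span at column x, rows [yTop, yBottom), to the clip.
// Returns false if nothing is left to draw; otherwise narrows yTop/yBottom.
bool clipVerticalSpan (const ClipRect& clip, int x, int& yTop, int& yBottom)
{
    assert (clip.x0 <= clip.x1 && clip.y0 <= clip.y1);

    if (x < clip.x0 || x >= clip.x1)
        return false;                      // column lies outside the clip

    yTop    = std::max (yTop, clip.y0);
    yBottom = std::min (yBottom, clip.y1);
    return yTop < yBottom;
}
```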

As for the functions you could speed up: I wrote a faster drawVerticalLine (about 2x faster than JUCE’s) and a very specialised function that doesn’t exist in JUCE (and probably never will) to draw a curve made of N points (xN, yN), where xN is an integer running linearly from 0 to width-1 and yN is a float between 0 and height. Drawing the curve with Paths was too slow. With drawVerticalLine() it didn’t look good. With drawLine() it looked good but was slow again: 17 fps versus 150 fps now, and the result looks just as good.
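One simple way to draw that kind of curve directly into pixel memory is to fill, for each column x, the vertical span between yN and the next point’s y, so neighbouring samples always touch. This is a sketch of a generic technique, not zamrate’s (unpublished) code; it assumes a 32-bit pixel buffer with a stride in pixels, is not antialiased, and `drawColumnCurve` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Draws a curve given one y value per column (ys[x] for x = 0..width-1)
// into a 32-bit pixel buffer. Each column is filled from its own y to the
// next column's y, so steep segments stay connected without gaps.
// Not antialiased; rows are clamped to the image rather than clipped.
void drawColumnCurve (std::uint32_t* pixels, int width, int height,
                      int stridePixels, const std::vector<float>& ys,
                      std::uint32_t colour)
{
    assert (height > 0 && (int) ys.size() >= width);

    for (int x = 0; x < width; ++x)
    {
        const float yA = ys[x];
        const float yB = (x + 1 < width) ? ys[x + 1] : yA;

        int top    = (int) std::lround (std::min (yA, yB));
        int bottom = (int) std::lround (std::max (yA, yB));
        top    = std::max (0, std::min (top, height - 1));
        bottom = std::max (0, std::min (bottom, height - 1));

        for (int y = top; y <= bottom; ++y)
            pixels[y * stridePixels + x] = colour;
    }
}
```

Each column costs one short vertical fill, which is why this kind of approach can be much cheaper than stroking a Path.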

Meanwhile, I’ll happily continue to use my hacked JUCE.


#6

zamrate,
I’m very very interested in your hacked drawVerticalLine. Would you mind sharing it?
Thanks


#7

Well here is a simple solution that should satisfy your needs for access to the Image, and also keep Jules happy about not exposing the internals:

Consider the existing code for handling a paint message

void Win32ComponentPeer::handlePaintMessage()
{
    //...
    LowLevelGraphicsSoftwareRenderer context (offscreenImage, -x, -y, contextClip);
    //...
}

If Jules added a hook, the creation of the LowLevelGraphicsSoftwareRenderer could be moved into a virtual function instead of being a hard-coded stack variable. This would be a clean way to provide access to the image: when you want to use your own drawing code, you create the Image yourself (with the appropriate parameters passed in from handlePaintMessage) and then create the low-level renderer yourself (which can be your own subclass).
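The proposed hook might look something like this. It’s a self-contained sketch with stand-in classes (`PeerSketch`, `Renderer`, `MyRenderer`, `MyPeer` and `createRenderer` are all hypothetical), not JUCE’s actual ComponentPeer code: the base peer asks a virtual factory for its renderer, and an application overrides the factory to supply its own.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Minimal stand-ins; every name here is hypothetical, not JUCE API.
struct Image { int width = 0, height = 0; };
struct Rect  { int x = 0, y = 0, w = 0, h = 0; };

struct Renderer
{
    virtual ~Renderer() = default;
    virtual std::string name() const { return "default software renderer"; }
};

struct PeerSketch
{
    virtual ~PeerSketch() = default;

    // The proposed hook: a virtual factory replaces the hard-coded
    // stack variable in handlePaintMessage().
    virtual std::unique_ptr<Renderer> createRenderer (Image& /*target*/, int /*xOffset*/,
                                                      int /*yOffset*/, const Rect& /*clip*/)
    {
        return std::make_unique<Renderer>();
    }

    std::string handlePaintMessage (Image& offscreen, const Rect& clip)
    {
        auto context = createRenderer (offscreen, -clip.x, -clip.y, clip);
        assert (context != nullptr);
        // ... paint the component hierarchy through *context ...
        return context->name();
    }
};

// An application that wants its own pixel access overrides the factory.
struct MyRenderer : Renderer
{
    std::string name() const override { return "custom renderer"; }
};

struct MyPeer : PeerSketch
{
    std::unique_ptr<Renderer> createRenderer (Image&, int, int, const Rect&) override
    {
        return std::make_unique<MyRenderer>();
    }
};
```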

True, on the Mac this will bypass CoreGraphics drawing, but that’s what the original poster wants anyway.

And in my case, I can create N low level renderers, where N = number of processors, and repaint my component peers in horizontal strips concurrently (after Jules makes the glyph cache thread-safe).
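The strip idea could be sketched as below, assuming each strip’s painting is independent and touches no shared, non-thread-safe state (which, as noted, the glyph cache was not at the time). `paintInStrips` is a hypothetical helper built on std::thread; in practice each `paintStrip` call would create its own low-level renderer clipped to its strip.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Paints rows [0, height) in horizontal strips, one thread per strip.
// paintStrip(yStart, yEnd) must only touch its own rows and must not use
// shared non-thread-safe state (e.g. a glyph cache).
void paintInStrips (int height, unsigned numThreads,
                    const std::function<void (int, int)>& paintStrip)
{
    assert (height >= 0);
    numThreads = std::max (1u, numThreads);
    const int stripHeight = (height + (int) numThreads - 1) / (int) numThreads;

    std::vector<std::thread> workers;
    for (int y = 0; y < height; y += stripHeight)
        workers.emplace_back (paintStrip, y, std::min (y + stripHeight, height));

    for (auto& t : workers)
        t.join();   // wait until every strip is painted
}
```

Called with `std::thread::hardware_concurrency()` as `numThreads`, this gives one strip per core; the `std::max` guards against that function returning 0.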


#8

[quote]I’m very very interested in your hacked drawVerticalLine. Would you mind sharing it?
Thanks[/quote]
Impossible, I’m hired by a company to do this and they surely do not pay me to share their code.


#9

Ah, too bad. Thanks anyway!


#10

Not simple at all. Simpler is to dynamic_cast Graphics::getInternalContext() to LowLevelGraphicsSoftwareRenderer and, if the result isn’t 0, use a special function that I added to LowLevelGraphicsSoftwareRenderer to get the internal image plus xOrigin/yOrigin. If the dynamic cast fails, you know it’s not the software renderer.
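The dynamic_cast check can be sketched with stand-in classes. `LowLevelContext`, `SoftwareContext`, `HardwareContext` and `getFramebuffer()` are hypothetical (the accessor mirrors the special function described above, which isn’t part of JUCE); the point is only that a failed pointer cast yields null, which tells you you’re not on the software renderer.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the renderer classes; all names here are hypothetical.
struct LowLevelContext { virtual ~LowLevelContext() = default; };

struct SoftwareContext : LowLevelContext
{
    std::uint8_t* pixels = nullptr;
    int xOrigin = 0, yOrigin = 0;

    // Mirrors the hand-added accessor: image memory plus origin.
    std::uint8_t* getFramebuffer (int& xOut, int& yOut) const
    {
        xOut = xOrigin;
        yOut = yOrigin;
        return pixels;
    }
};

struct HardwareContext : LowLevelContext {};

// Returns the framebuffer if 'context' is the software renderer, else nullptr.
std::uint8_t* tryGetFramebuffer (LowLevelContext& context, int& xOrigin, int& yOrigin)
{
    if (auto* software = dynamic_cast<SoftwareContext*> (&context))
        return software->getFramebuffer (xOrigin, yOrigin);

    return nullptr; // not the software renderer: fall back to Graphics calls
}
```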


#11

Simple in the sense that Jules might actually go for it, since it’s clean and could allow for more than just “getting at the image.”


#12

[quote=“masshacker”]zamrate,
I’m very very interested in your hacked drawVerticalLine. Would you mind sharing it?
Thanks[/quote]

A general form for fast pixel processing operations looks like this:

void processRgb (
  int rows, int cols,
  uint8* dest, int destRowBytes,
  uint8 const* src, int srcRowBytes)
{
  // turn the row strides into the padding to skip after each row
  srcRowBytes -= 3 * cols;
  destRowBytes -= 3 * cols;

  while (rows--)
  {
    for (int col = cols; col; --col)
    {
      // inner loop: one RGB pixel (3 bytes) per iteration
      *dest++ = fr (*src++); *dest++ = fg (*src++); *dest++ = fb (*src++);
    }
    src += srcRowBytes;
    dest += destRowBytes;
  }
}

fr(), fg(), and fb() are the operations to apply to the red, green, and blue components. The innermost statements would of course be rewritten to do whatever you need.

For drawing solid vertical lines into RGB images there are two cases, depending on whether the pixel starts at an even or an odd address: a 2-byte copy followed by a 1-byte copy, and the reverse (1 byte followed by 2 bytes):

case 1:

  uint16 rg; // red and green portions of the color to copy
  uint8 b; // blue portion of the color to copy

  // inner loop
  *((uint16*)dest) = rg;
  dest += 2;
  *dest++ = b;

case 2:

  uint8 r; // red portion of the color to copy
  uint16 gb; // green and blue portions of the color to copy

  // inner loop
  *dest++ = r;
  *((uint16*)dest) = gb;
  dest += 2;

Choose the version that makes the 16-bit (uint16) stores WORD aligned (2 bytes); otherwise you may get slowdowns from misaligned accesses, or even faults on CPUs that don’t allow them.
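Putting the two cases together, a solid vertical line could be filled roughly like this. A minimal sketch, assuming packed 3-byte pixels stored r, g, b in memory and an even row stride so the address parity is the same on every row; `fillVerticalRgb` is a hypothetical name. It uses std::memcpy instead of the `(uint16*)` cast to stay clear of strict-aliasing problems.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fills a vertical run of 'count' pixels in a packed 24-bit image whose
// bytes are stored r, g, b in memory. The 2-byte pair always lands on an
// even address: r,g then b when dest is even (case 1), r then g,b when odd
// (case 2). Compilers normally turn a 2-byte memcpy into one 16-bit store.
void fillVerticalRgb (std::uint8_t* dest, int rowBytes, int count,
                      std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    assert (rowBytes % 2 == 0 && count >= 0);   // parity must not change per row
    const std::uint8_t rg[2] = { r, g };
    const std::uint8_t gb[2] = { g, b };

    if ((reinterpret_cast<std::uintptr_t> (dest) & 1u) == 0)
    {
        for (int i = 0; i < count; ++i)         // case 1: 2 bytes, then 1
        {
            std::uint8_t* p = dest + i * rowBytes;
            std::memcpy (p, rg, 2);
            p[2] = b;
        }
    }
    else
    {
        for (int i = 0; i < count; ++i)         // case 2: 1 byte, then 2
        {
            std::uint8_t* p = dest + i * rowBytes;
            p[0] = r;
            std::memcpy (p + 1, gb, 2);
        }
    }
}
```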


#13

Actually, I think that a single fast function for setting lots of pixels to different colours would be enough for many cases. I could personally live with it. But it had better be damn fast.
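Such a function might take a whole batch of (x, y, colour) writes in one call, so per-call overhead such as virtual dispatch and clip-region lookup is paid once per batch rather than once per pixel. A hypothetical sketch, assuming a 32-bit pixel buffer; `PixelWrite` and `setPixels` are invented names, and out-of-range pixels are simply skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One pixel write request: position plus 32-bit colour (hypothetical type).
struct PixelWrite { int x, y; std::uint32_t colour; };

// Writes many individually coloured pixels into a 32-bit buffer in one
// call, skipping any that fall outside the image instead of faulting.
void setPixels (std::uint32_t* pixels, int width, int height, int stridePixels,
                const PixelWrite* writes, std::size_t count)
{
    assert (stridePixels >= width);

    for (std::size_t i = 0; i < count; ++i)
    {
        const PixelWrite& w = writes[i];
        if (w.x >= 0 && w.x < width && w.y >= 0 && w.y < height)
            pixels[w.y * stridePixels + w.x] = w.colour;
    }
}
```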


#14

There is a good argument for adding a method/object that grabs an area of pixels to read and/or write, in the same way the Image::BitmapData class works. That could be made portable, because it’d either return a direct pointer to memory for the software renderer, or could allocate some temporary memory in the case of hardware-accelerated contexts.
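A scoped object along those lines could be sketched like this. `ScopedPixelAccess` is hypothetical (not JUCE API), loosely modelled on the Image::BitmapData idea: the software path hands out a pointer into existing memory, and the other path reads the pixels into a temporary buffer and writes them back when the object is destroyed. The read/write callbacks stand in for whatever a hardware-accelerated context would actually do.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Scoped read/write access to a rectangle of 32-bit pixels (hypothetical).
class ScopedPixelAccess
{
public:
    using ReadFn  = std::function<void (std::vector<std::uint32_t>&)>;
    using WriteFn = std::function<void (const std::vector<std::uint32_t>&)>;

    // Direct case: the software renderer hands out its own memory.
    ScopedPixelAccess (std::uint32_t* direct, int w, int h, int stride)
        : data (direct), width (w), height (h), stridePixels (stride) {}

    // Indirect case: copy pixels out now, write them back on destruction.
    ScopedPixelAccess (int w, int h, const ReadFn& read, WriteFn write)
        : temp ((std::size_t) (w * h)), width (w), height (h), stridePixels (w),
          writeBack (std::move (write))
    {
        assert (w >= 0 && h >= 0);
        read (temp);
        data = temp.data();
    }

    ~ScopedPixelAccess()
    {
        if (writeBack)
            writeBack (temp);   // only the indirect case has a write-back
    }

    ScopedPixelAccess (const ScopedPixelAccess&) = delete;
    ScopedPixelAccess& operator= (const ScopedPixelAccess&) = delete;

    std::uint32_t* data = nullptr;
    std::vector<std::uint32_t> temp;
    int width, height, stridePixels;

private:
    WriteFn writeBack;
};
```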


#15

Two things come to mind:

  1. It would also be useful to have a way to know whether the software renderer is being invoked, because doing bit-fiddling on a hardware-accelerated context instead of using the Graphics members could actually be slower.

  2. It is still useful to add a layer of indirection to the component peer implementations so that the developer can provide their own function for creating the graphics context used to repaint the window (or even more control over the redrawing process, to allow updating of multiple rectangles concurrently).


#16

TheVinn’s code looks fast, and it would be easy to optimise further if it were a template (so fr, fg, and fb could be inlined).
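A templated version along those lines could look like this: the per-channel operations are passed as function objects (for example lambdas), so the compiler can see and inline them in the inner loop. A sketch using std::uint8_t in place of JUCE’s uint8 typedef; the assert just checks that each stride covers at least one row of pixels.

```cpp
#include <cassert>
#include <cstdint>

// processRgb as a template: fr, fg and fb are function objects applied to
// the red, green and blue bytes, and can be inlined by the compiler.
template <typename FR, typename FG, typename FB>
void processRgb (int rows, int cols,
                 std::uint8_t* dest, int destRowBytes,
                 const std::uint8_t* src, int srcRowBytes,
                 FR fr, FG fg, FB fb)
{
    assert (srcRowBytes >= 3 * cols && destRowBytes >= 3 * cols);
    srcRowBytes -= 3 * cols;    // padding to skip after each row
    destRowBytes -= 3 * cols;

    while (rows--)
    {
        for (int col = cols; col; --col)
        {
            *dest++ = fr (*src++);
            *dest++ = fg (*src++);
            *dest++ = fb (*src++);
        }
        src += srcRowBytes;
        dest += destRowBytes;
    }
}
```

For example, passing `[] (std::uint8_t v) { return std::uint8_t (v / 2); }` for all three operations halves the brightness of every pixel.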

Another good candidate for the as-yet unborn JPAN (Juce Programming Archive Network).

[quote]Impossible, I’m hired by a company to do this and they surely do not pay me to share their code.[/quote]

My boss lets me contribute all the code I like, because the comments I get back improve the quality immeasurably and that’s worth more to him than some theoretical lost value because competitors might use little snippets of our code!

My boss is such a good guy, I’m going to get him a cup of coffee right now… or perhaps I’ve already had a bit too much coffee… :wink:


#17

Well, it was a sort of pseudocode. You substitute your own pixel operation for fr, fg, and fb.


#18

This is EXACTLY what I was asking for last year with the vertical blends! LOL…


#19

Well I think if Jules adds that, the problem is solved.