Massive performance decrease in Software Rendering

TBH I feel a bit overwhelmed when it comes to changing this code. I can always try to help you, but my help will be pretty limited I suppose.

Just a hint: changing the following code like this makes a big difference when you’re using pixelStride=4 for both ARGB and RGB images:

forcedinline void copyRow (DestPixelType* dest, SrcPixelType const* src, int width) const noexcept
{
    if (srcData.pixelStride == 3 && destData.pixelStride == 3)
    {
        memcpy (dest, src, sizeof (PixelRGB) * (size_t) width);
    }
    else
    {
        do
        {
            dest->blend (*src);
            incDestPixelPointer (dest);
            src = addBytesToPointer (src, srcData.pixelStride);
        } while (--width > 0);
    }
}

to this:

forcedinline void copyRow (DestPixelType* dest, SrcPixelType const* src, int width) const noexcept
{
    if (srcData.pixelStride == destData.pixelStride)
    {
        memcpy (dest, src, srcData.pixelStride * (size_t) width);
    }
    else
    {
        do
        {
            dest->blend (*src);
            incDestPixelPointer (dest);
            src = addBytesToPointer (src, srcData.pixelStride);
        } while (--width > 0);
    }
}
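To make the idea concrete outside of JUCE, here is a minimal standalone sketch of the same fast path. RowView and copyOpaqueRow are hypothetical names invented for this illustration, not JUCE types. Note that a memcpy replaces the destination rather than blending into it, so the sketch assumes fully opaque source pixels and identical byte layouts whenever the strides match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one row of bitmap data; not a JUCE type.
struct RowView
{
    std::uint8_t* data;
    int pixelStride;   // bytes per pixel, assumed to be 3 or 4
};

// Copies `width` opaque pixels (three colour bytes each) from src to dest.
// Assumption: when the strides are equal, both rows share the same layout.
inline void copyOpaqueRow (RowView dest, RowView src, int width)
{
    assert (width >= 0);

    if (dest.pixelStride == src.pixelStride)
    {
        // Fast path: the whole row is moved with a single memcpy.
        std::memcpy (dest.data, src.data,
                     (std::size_t) src.pixelStride * (std::size_t) width);
        return;
    }

    // Slow path: strides differ, so copy the colour bytes pixel by pixel.
    for (int i = 0; i < width; ++i)
    {
        std::uint8_t* d = dest.data + (std::size_t) i * (std::size_t) dest.pixelStride;
        const std::uint8_t* s = src.data + (std::size_t) i * (std::size_t) src.pixelStride;

        d[0] = s[0];
        d[1] = s[1];
        d[2] = s[2];

        if (dest.pixelStride == 4)
            d[3] = 0xff;   // treat the padding byte as opaque alpha
    }
}
```

With two stride-4 rows the call collapses to one memcpy; with a stride-3 source and a stride-4 destination it falls back to the per-pixel loop.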

I also think that the design of JUCE’s software renderer, being based on templates, could be pushed a bit further, as AGG does. For instance, pixelStride could be a template parameter, and so could the indexes of R, G, B, A. This would allow more aggressive use of templates and lead to much more optimized code. Using functions such as addBytesToPointer() is not a good idea, since too many operations are involved just to advance the pointer by a fixed amount.

Another problem I found when PixelRGB is used with pixelStride==4:

forcedinline void replaceLine (PixelRGB* dest, const PixelARGB& colour, int width) const noexcept
{
    if (data.pixelStride == sizeof (*dest))
    {
        if (areRGBComponentsEqual) // if all the component values are the same, we can cheat..
        {
            memset (dest, colour.getRed(), (size_t) width * data.pixelStride); // <--- previously * 3, which is wrong
        }
        else

Actually, EdgeTableFillers::SolidColour has many ==3 checks to test whether the PixelType is RGB or not. If one were to use an RGB PixelType with pixelStride=4, these checks wouldn’t work anymore. That in turn is proof that JUCE actually uses, and must use, fixed PixelType sizes right now.

pixelStride cannot be a template parameter since it is not known at compile time. It would have to be in a switch or if statement at a high level.
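A rough sketch of what such a high-level switch could look like, written as standalone C++ rather than against the real renderer. The names fillRowFixed, fillRowGeneric and fillRow are made up for this example. The runtime stride is tested once per row, and the inner loop then runs with the stride as a compile-time constant.

```cpp
#include <cstdint>

// Hypothetical inner loop: Stride is known at compile time, so the pointer
// increment is a constant and the compiler is free to unroll the loop.
template <int Stride>
void fillRowFixed (std::uint8_t* dest, std::uint8_t value, int width)
{
    for (int i = 0; i < width; ++i)
    {
        dest[0] = value;
        dest[1] = value;
        dest[2] = value;
        dest += Stride;
    }
}

// Hypothetical fallback for any other stride (assumed to be at least 3).
inline void fillRowGeneric (std::uint8_t* dest, int stride, std::uint8_t value, int width)
{
    for (int i = 0; i < width; ++i)
    {
        dest[0] = value;
        dest[1] = value;
        dest[2] = value;
        dest += stride;
    }
}

// The runtime value is examined once, outside the per-pixel loop.
inline void fillRow (std::uint8_t* dest, int pixelStride, std::uint8_t value, int width)
{
    switch (pixelStride)
    {
        case 3:  fillRowFixed<3> (dest, value, width); break;
        case 4:  fillRowFixed<4> (dest, value, width); break;
        default: fillRowGeneric (dest, pixelStride, value, width); break;
    }
}
```

The cost of the switch is paid per row (or per fill operation), not per pixel, which is why the test has to sit at a high level.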

I’ve changed the Image::RGB format to be 32-bit too, and changed some stuff here and there to adapt. I got from 60fps to about 250fps. I’ll try to create a new RGB24 type, so we have both RGB24 and RGB32, if possible. It actually runs faster now than it ever did before :) If it’s clean enough, I’ll send it over to you for review.

I’ve nearly finished (cleanly) modifying the entire JUCE source code to support both Image::RGB24 and Image::RGB32. Please give me 12 more hours, and I’ll upload the new version somewhere so you and maybe Jules can review it. I must say in advance that Xcode’s compiler seems to do a better job, because the FPS difference wasn’t as striking as on Windows. Nevertheless, drawing RGB images onto ARGB is now about 2x faster on Mac. Overall, the JUCE rendering demo also performs faster, but the increase in performance varies depending on what is rendered.

Thanks guys, this is all interesting and much appreciated, but glancing quickly through the posts I can see that the signal-to-noise ratio of useable ideas is pretty bad so far…

If you can boil this down to a TL;DR that actually makes sense and isn’t going to break anything, maybe start a new thread to explain it clearly, and I’ll take a look.

I hope your changes are limited to RenderingHelpers.h, it should not be necessary to change juce_Image.cpp

Here’s the modified code. I tried to do it as cleanly as possible. Basically there’s a new RGB32 Image type, and I adapted the JUCE code to it.
See for yourself what changed. There are precompiled binaries (original JUCE, latest tip vs. modified JUCE) of the JUCE demo for both OS X and Windows (release builds), so you can check the performance differences yourself. Here the gains vary from 400% down to 0%. RGB images in particular are drawn much faster, and tiled images too. For the rest, you normally see a small performance advantage of 10-30%.

How is this different from ImageRGB with pixelStride==4 ?

Funny question, I thought your very idea was to offer optimized support for both RGB24 and RGB32? The old Image::RGB format always used pixelStride 3; you couldn’t specify pixelStride=4, and if you changed the Image class code to use pixelStride=4, you wouldn’t simply/magically enhance performance, because the PixelFormatRGB class’ functions are specially written for a source pixelStride of 3 and not 4. For instance, the function getARGB() is typically faster in the RGB32 class (and could be even faster if one made sure that the alpha is always 255 in the image data).

Obviously, RGB32 is much better for use with ARGB, as they are both 32-bit. You can use integers instead of bytes. If the alpha in the RGB32 image were 0xff in all cases, then one could also simply use memcpy to copy over to ARGB. Remember that before, this was done via byte copy operations, one for R, one for G, one for B. Also check juce_PixelFormats.h, where you now have PixelRGB24 (that’s the old PixelRGB) and PixelRGB32. The differences between PixelRGB24 and PixelRGB32 are big enough to enhance the performance already.
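To illustrate the integers-instead-of-bytes point, here is a small sketch with two made-up structs, Rgb24 and Rgb32. These are not the actual PixelRGB24/PixelRGB32 classes from the modified code; the byte order (B, G, R) and the 0xAARRGGBB packing are assumptions chosen for the example.

```cpp
#include <cstdint>

// Hypothetical 3-byte pixel: the packed value has to be assembled from
// three separate byte loads plus shifts.
struct Rgb24
{
    std::uint8_t b, g, r;

    std::uint32_t getARGB() const
    {
        return 0xff000000u
             | ((std::uint32_t) r << 16)
             | ((std::uint32_t) g << 8)
             |  (std::uint32_t) b;
    }
};

// Hypothetical 4-byte pixel stored as one 32-bit word (0x??RRGGBB):
// a single load, with the unused top byte forced to opaque alpha.
struct Rgb32
{
    std::uint32_t value;

    std::uint32_t getARGB() const
    {
        return value | 0xff000000u;
    }
};
```

If the image data were guaranteed to hold 0xff in the top byte already, even the OR in Rgb32::getARGB() could be dropped, which is the further speed-up mentioned above.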

I suggest you try doing some tests yourself (for instance, just changing the pixelStride of Image::RGB), and measure the performance vs. my code changes.

Yes but not by introducing new image types!

This is simply not true and I tried to warn you about making these kinds of changes (going outside juce_RenderingHelpers.h).

If you look at the ImageType class, you will see that it is entirely possible for an Image to be created with any arbitrary value for pixelStride. In fact, the Win32 native code does this, have a look at the WindowsBitmapImage class in juce_win32_Windowing.cpp.

Well, it’s just one single more enum in the types. What’s so bad about that? It’s good to be able to make a distinction between RGB24 and RGB32.

IMHO you cannot use one single PixelType for both RGB24 and RGB32 at the same time; that’s the big problem. It’s mostly because of the getARGB() function, which can be simplified when using 4 bytes for RGB instead of 3 (and that is also a difference between the two pixel types). Now there might be some way around this, but I don’t know of any. Hence I introduced the new PixelType PixelRGB32 in the code and handled it in the renderer. Another change I introduced is, for instance, using RGB32 rather than ARGB images as the backbuffer on OS X, which is faster.

It breaks any code which does a switch on the image type.
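A sketch of the kind of breakage meant here, using a made-up enum rather than JUCE’s real one. PixelFormat, its enumerators and bytesPerPixel are all hypothetical names for this example. Code written before the new value existed keeps compiling, but silently falls into its default branch.

```cpp
// Hypothetical format enum; RGB32 stands for the value appended later.
enum class PixelFormat
{
    Unknown,
    RGB,
    ARGB,
    SingleChannel,
    RGB32   // newly added at the end, existing values keep their positions
};

// Hypothetical client code written before RGB32 existed.
inline int bytesPerPixel (PixelFormat format)
{
    switch (format)
    {
        case PixelFormat::RGB:           return 3;
        case PixelFormat::ARGB:          return 4;
        case PixelFormat::SingleChannel: return 1;
        default:                         return 0;   // RGB32 lands here unnoticed
    }
}
```

Appending the enumerator keeps the numeric values of the old ones intact, but every such switch still has to be found and updated by hand.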

Yes, and the way that you do that is by checking pixelStride against 3 or 4.

It’s not the big problem that you think it is. I get the feeling that you were somewhat intimidated by all the templates and this was the only alternative you could come up with for fixing the performance issues but trust me, it could have all been done in juce_RenderingHelpers.h.

Jules said that this was not possible…Jules?

Not at all?! The original enums have the same values, there’s just one added at the end. What I mean is that Image::RGB is still there in the same position, but just called Image::RGB24 now (and Image::RGB for compatibility).
If you don’t want to add this enum, you’ll have to tell the pixelstride for RGB at construction time (or how do you see this?).

I challenge anyone to come up with a solution that offers both RGB24 and RGB32 with same performance as my code, without any additional PixelType. I’d be the first to be happy about it.

You do understand that in the current JUCE tip, you can produce Image objects with any pixelStride you want, using this constructor:

Image (PixelFormat format, int imageWidth, int imageHeight, bool clearImage, const ImageType& type);

and that JUCE currently allocates RGB images with both pixelStride==3 and pixelStride==4 (juce_win32_Windowing.cpp line 216)?

Are you not reading my posts?

I’ve said it a few times now: what needs to be done is to add a test for pixelStride at a higher level and then dispatch to the appropriate hand-rolled routines. You’re doing this based on your new image type, but it could have been done by just testing pixelStride, as in this code snippet I posted before:

template <class Iterator, class DestPixelType>
void renderSolidFill (Iterator& iter, const Image::BitmapData& destData, const PixelARGB& fillColour, const bool replaceContents, DestPixelType*) {
    if (replaceContents)
    {
        if (destData.pixelStride==3) {
          EdgeTableFillers::SolidColour3 <DestPixelType, true> r (destData, fillColour);
          iter.iterate (r);
        }
        else if (destData.pixelStride==4) {
          EdgeTableFillers::SolidColour4 <DestPixelType, true> r (destData, fillColour);
          iter.iterate (r);
        }
        else {
          EdgeTableFillers::SolidColour <DestPixelType, true> r (destData, fillColour);
          iter.iterate (r);
        }
    }
//...

You could have saved yourself a lot of wasted effort if you had listened…

Sure, this is possible too, but kind of far-fetched for a simple operation.

Not a big fan of the solution you’re proposing. I think you’ll end up with far less clean code. Anyway, I’m looking forward to seeing what Jules will come up with. Please feel free to test the performance against my proposed code.

Without getting into tech, just from a very outside view: zamrate mentions, and proves, that there is a massive slowdown in software rendering, and both zamrate and TheVinn spend tons of their time arguing and posting examples, while Jules asks them to create a better post, basically to do his work, because he is too busy to take a close look on his own.

But Jules is the one selling the framework, so he should address it, either to fix it or to prove that there is no real problem. Jules, if you cannot because of priorities, and if this is too low a priority, then what I read from this is that there is a risk for us in using JUCE.

Why not hire additional developers and charge more for JUCE (or charge for major updates)? I remember the problem with keyboard input when JUCE is used for audio plugins, which has been unfixed for so long, so it really seems like a lack of development for a great framework.

Don’t get me wrong, we are super-happy with JUCE. But it seems that some important issues stay unfixed for very long, which makes me wonder whether too few resources are being put into JUCE.

Jakob, if I hired developers to instantly implement everything that anyone requests, my R&D budget would be bigger than IBM’s.

Zamrate’s the only person who’s ever mentioned the software rendering performance, which is why this isn’t a massive priority for me. He’s also got a very specific use-case, and any improvements that I made based on the issues thread would probably be imperceptible to every other user.

try
{
Jules.getInstance().addItemsToWorkQueue(WorkItems items);
}
catch (JulesOverloadedException e)
{
Forum.getAnyThread().post(BitchingAroundMessage m);
throw(e);
}

Pretty much what I read from the last couple of posts.