Fastest way to render large number of ARGB images


#1

Hi everyone.


I'm starting work on a project in Juce that will involve a lot of 2D graphics (a special-purpose game engine). I need to be able to render many (e.g. 10,000) small (e.g. 64x64) ARGB images (originally PNG, if that matters) to screen per frame at 60 FPS. The images may be flipped vertically or horizontally, but they will not be scaled or clipped (by anything other than the window). If it turns out to be faster, I can have it make pre-flipped copies of the appropriate images; memory is not much of a concern at this point.
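
For the pre-flipping, I'm picturing a small helper along these lines (makeFlippedCopy is just my own name for it; it draws the source through a horizontal-flip transform):

Image makeFlippedCopy (const Image& source)
{
    // Render the source into a fresh ARGB image, mirrored about the vertical axis.
    Image flipped (Image::ARGB, source.getWidth(), source.getHeight(), true);
    Graphics g (flipped);
    g.drawImageTransformed (source,
                            AffineTransform::scale (-1.0f, 1.0f)
                                            .translated ((float) source.getWidth(), 0.0f));
    return flipped;
}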


I'm running Ubuntu 14.04 64-bit with an NVIDIA GTX 660 and the proprietary drivers, in case that matters, but I want the result to be portable to anyone with a modern GPU on Windows and Mac too.


I made a little stress-test program with some code adapted from the 2D background behind the 3D teapot in the demo: a component that is also an OpenGLRenderer, which creates a LowLevelGraphicsContext, wraps it in a Graphics object, and then does the actual rendering with g.drawImageAt(...). This runs fine at 1,000 images per frame but drops to 6 FPS at 10,000 images per frame, and it looks like the work is being done by the CPU.
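
Roughly, my render callback looks like this (assuming an `images` Array<Image> member and a `rand` Random member; this mirrors the demo's 2D-background code):

void MainContentComponent::renderOpenGL()
{
    // Wrap the GL context in a JUCE Graphics, as the demo's 2D background does.
    ScopedPointer<LowLevelGraphicsContext> glRenderer (
        createOpenGLGraphicsContext (openGLContext, getWidth(), getHeight()));
    Graphics g (*glRenderer);

    // Draw each image at a random position.
    for (int i = 0; i < 10000; ++i)
        g.drawImageAt (images.getReference (i % images.size()),
                       rand.nextInt (getWidth()), rand.nextInt (getHeight()));
}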


I'm assuming that drawing the images directly to the Graphics in Component::paint() won't be any faster than this. But is there any way to use the full power of my GPU to draw these images? Maybe putting each image on two triangles and rendering them as 3D? Custom vertex shaders? I don't have any previous experience programming OpenGL directly, so I'm not sure what's available.


#2

Using the GL renderer will be by far the fastest way to render that kind of thing.

The teapot demo is probably not the best place to test it, though; instead, use one of the normal demo pages, but put it into GL rendering mode by pressing '2' or '3'. The Graphics class isn't tuned for this kind of task, so 10,000 might be pushing it a bit! I'd be interested to know where the bottlenecks are if you profile it.
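
In your own app, that mode switch just amounts to attaching an OpenGLContext to a component, something like:

// Member: OpenGLContext openGLContext;
// In the component's constructor - renders the component and its children via GL:
openGLContext.attachTo (*this);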


#3

Hmm… when I run the demo, open the OpenGL demo page, switch to the OpenGL 2D demo page, switch back to OpenGL 3D, and then try to switch back to OpenGL 2D, the demo hangs and I have to kill the process. (Maybe I should post this somewhere else…) Other than that, switching between normal and OpenGL rendering doesn't seem to make any difference to performance on the demo pages.

How about OpenGLContext::copyTexture()? Would that make the GPU the one actually copying pixels to the framebuffer? I was having trouble getting it to work at first, but I think I was calling it from the paint thread instead of the OpenGLRenderer callback thread, so I'll try it again. Edit: tried it again on the correct thread; it's still not working:

void MainContentComponent::newOpenGLContextCreated()  // declared 'override' in the class
{
    //...
    File f2 ("/valid/path/to/image.png");
    if (! f2.exists())
    {
        DBG ("File2 load failed!");
        return;
    }

    Image im2 (ImageFileFormat::loadFrom (f2));
    texture = new OpenGLTexture();   // 'texture' is a ScopedPointer<OpenGLTexture> member
    texture->loadImage (im2);        // upload to a GL texture (we're on the GL thread here)
    //...
}

void MainContentComponent::renderOpenGL()  // declared 'override' in the class
{
    //...
    texture->bind();
    DBG (String ((int) texture->getTextureID()));  // prints "1"

    Rectangle<int> targetClipArea (0, 0, 64, 64);
    Rectangle<int> anchorPosAndTextureSize (0, 0, 64, 64);

    for (int i = 0; i < 10; ++i)
    {
        targetClipArea.setPosition (rand.nextInt (getWidth()), rand.nextInt (getHeight()));

        // contextWidth/contextHeight are members holding the context's size in pixels.
        // Segfault on this call; if I comment it out, the program runs
        // (but of course nothing is drawn).
        openGLContext.copyTexture (targetClipArea, anchorPosAndTextureSize,
                                   contextWidth, contextHeight, false);
    }

    texture->unbind();
    //...
}

This might be a dumb question, but I can run big AAA games on this GPU with far more than 10,000 triangles per frame, and a lot more than one untransformed texture per poly (bump/normal mapping, dynamic lighting, custom shaders, etc.). As a matter of fact, 10,000 triangles per frame is somewhere around the performance of the N64! So how do they do it? Or better, how do you set things up in Juce so that the GPU is actually doing this kind of work?

If this is really a question about OpenGL rather than Juce, I apologize, and I can start learning from that side; I'm just not sure where to start with Juce. Looking at the code for the 3D parts of the teapot demo, there was a good deal of code dealing directly with OpenGL, which surprised me a little, since I thought the idea was to use the Juce API for everything.
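
From what I've read so far, I'm guessing the answer is to batch everything: put each sprite on two triangles, build one big vertex buffer per frame, and draw the whole batch with a single call through a simple shader. A rough sketch of what I have in mind (the Vertex layout, the appendSpriteQuad helper, and the quadProgram/vertexBuffer/numSprites members are all my own invention, and I'm ignoring batching by texture):

struct Vertex { float x, y, u, v; };  // clip-space position + texture coords

void MainContentComponent::newOpenGLContextCreated()
{
    // ... texture loading as before ...

    quadProgram = new OpenGLShaderProgram (openGLContext);  // ScopedPointer member
    quadProgram->addVertexShader (
        "attribute vec2 position;\n"
        "attribute vec2 texCoord;\n"
        "varying vec2 vTexCoord;\n"
        "void main() { vTexCoord = texCoord;\n"
        "              gl_Position = vec4 (position, 0.0, 1.0); }");
    quadProgram->addFragmentShader (
        "varying vec2 vTexCoord;\n"
        "uniform sampler2D spriteTexture;\n"
        "void main() { gl_FragColor = texture2D (spriteTexture, vTexCoord); }");
    quadProgram->link();

    openGLContext.extensions.glGenBuffers (1, &vertexBuffer);  // GLuint member
}

void MainContentComponent::renderOpenGL()
{
    OpenGLHelpers::clear (Colours::black);
    glEnable (GL_BLEND);                                 // ARGB sprites need blending
    glBlendFunc (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    // Build two triangles (six vertices) per sprite, positions in clip space.
    std::vector<Vertex> vertices;
    vertices.reserve ((size_t) numSprites * 6);
    for (int i = 0; i < numSprites; ++i)
        appendSpriteQuad (vertices, i);  // hypothetical helper filling in x/y/u/v

    quadProgram->use();
    OpenGLShaderProgram::Uniform (*quadProgram, "spriteTexture").set (0);  // unit 0
    texture->bind();

    // Upload the whole batch and draw it in one call.
    openGLContext.extensions.glBindBuffer (GL_ARRAY_BUFFER, vertexBuffer);
    openGLContext.extensions.glBufferData (GL_ARRAY_BUFFER,
                                           (GLsizeiptr) (vertices.size() * sizeof (Vertex)),
                                           vertices.data(), GL_STREAM_DRAW);

    OpenGLShaderProgram::Attribute position (*quadProgram, "position");
    OpenGLShaderProgram::Attribute texCoord (*quadProgram, "texCoord");
    openGLContext.extensions.glVertexAttribPointer (position.attributeID, 2, GL_FLOAT,
                                                    GL_FALSE, sizeof (Vertex), nullptr);
    openGLContext.extensions.glVertexAttribPointer (texCoord.attributeID, 2, GL_FLOAT,
                                                    GL_FALSE, sizeof (Vertex),
                                                    (void*) (2 * sizeof (float)));
    openGLContext.extensions.glEnableVertexAttribArray (position.attributeID);
    openGLContext.extensions.glEnableVertexAttribArray (texCoord.attributeID);

    glDrawArrays (GL_TRIANGLES, 0, (GLsizei) vertices.size());

    openGLContext.extensions.glDisableVertexAttribArray (position.attributeID);
    openGLContext.extensions.glDisableVertexAttribArray (texCoord.attributeID);
}

Does that look like the right direction? Presumably one would sort the sprites by texture, or pack them all into an atlas, to keep it to one draw call.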