Optimizing OpenGL Renderer for iOS

Hi All,

This conversation started in another thread about Component Animator, but since this ended up being a sort of general iOS optimization question I wanted to call it out on its own topic. I'm hoping someone with more iOS/GL experience will be able to recommend some improvements here.

Using the OpenGL Renderer in Juce on iOS is really really slow. A good way to see this is to:

- Open the Juce Demo on an iOS device (you must run it on a physical device to really see how slow it is -- preferably one with a Retina display)
- Go to the Tabs & Widgets demo page. From the Look-and-Feel menu, choose Use OpenGL Renderer
- Switch to the Box2D demo. 

What you will see is animation happening at something like 5 frames per second. Ouch!

I am by no means an OpenGL expert, but in Instruments I saw that glBufferSubData seemed to be a bottleneck. So I did some poking around on game forums and tried a few things. In juce_OpenGLGraphicsContext.cpp, in draw(), I replaced glBufferSubData with glBufferData, and that alone improved my frame rate by leaps and bounds (I'm not sure what usage hint to give it here; I tried GL_DYNAMIC_DRAW, GL_STATIC_DRAW and GL_STREAM_DRAW, and it didn't seem to make much difference):

context.extensions.glBufferData (GL_ARRAY_BUFFER, (GLsizeiptr) ((size_t) numVertices * sizeof (VertexInfo)), vertexData, GL_DYNAMIC_DRAW);

Still not what I would call smooth animation, but much better: component animations have improved to the point where I would now ship my app with them, which is a big step, even if it's still not perfect. That's all well and good, but Jules notes that it's a bit counterintuitive that this change would speed things up.

Does anyone have any suggestions for a) why this helps and b) what other things could be slowing down OpenGL rendering so much on iOS? 



I'll add a couple of other (maybe) notable things from Instruments here. In 10 seconds of animating components, I get about 10,000 - 15,000 of the following warnings. I'd be interested to know from any of you who are more knowledgeable whether these are actually something to investigate, or whether Instruments is just whining:

GPU Wait on Buffer:

Your application attempted to update a buffer that is currently being used for rendering. This causes the CPU to wait for the GPU to finish rendering. One way to fix this problem is to use a double- or ring-buffered approach so that your application does not update a buffer that is used as part of a drawing call. Another possibility is to reorder operations in a way that allows the GPU to finish using the buffer before it is updated.
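
For anyone unfamiliar with the ring-buffered approach the warning suggests: instead of re-uploading into the same VBO every frame (which forces the CPU to wait while the GPU is still reading it), you cycle through a small pool of buffer objects so the one being written is never the one being drawn. This is just a sketch of the cycling logic, not JUCE's actual code; the names are made up, and in real use the ids would come from glGenBuffers:

```cpp
#include <array>
#include <cstddef>

// Cycle through N vertex buffer objects so the buffer being filled this
// frame is never one the GPU may still be reading from a previous frame.
template <std::size_t N>
struct VboRing
{
    std::array<unsigned, N> bufferIds {}; // would be filled by glGenBuffers
    std::size_t current = 0;

    // Call once per frame before uploading vertex data. Bind and fill the
    // returned id; the GPU can keep reading the other N-1 buffers unstalled.
    unsigned next()
    {
        current = (current + 1) % N;
        return bufferIds[current];
    }
};
```

With two buffers you get classic double buffering; three is often recommended so the CPU never catches up to a buffer the GPU hasn't released yet.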

Logical Buffer Store:

OpenGL ES performed a logical buffer store operation. This is typically caused by switching framebuffers or not discarding buffers at the end of the rendering loop. For best performance keep logical buffer store operations to a minimum and discard the depth buffer at the end of each frame. See the EXT_discard_framebuffer extension for more details.
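
The fix the warning is hinting at looks something like the following: after the last draw call of the frame, before presenting, tell the driver that the depth (and stencil) contents can be thrown away so it never writes them back to memory. On a real iOS context the call is glDiscardFramebufferEXT from EXT_discard_framebuffer; here it's stubbed (note the trailing underscores) purely so the call pattern compiles standalone, so treat this as a sketch rather than drop-in code:

```cpp
#include <vector>

using GLenum = unsigned int;

// Token values from the GL ES headers, renamed to avoid clashing with them:
constexpr GLenum GL_FRAMEBUFFER_        = 0x8D40;
constexpr GLenum GL_DEPTH_ATTACHMENT_   = 0x8D00;
constexpr GLenum GL_STENCIL_ATTACHMENT_ = 0x8D20;

static std::vector<GLenum> discarded; // records what the stub was asked to discard

// Stand-in for glDiscardFramebufferEXT, which marks the listed attachments
// as not needing to be stored back to memory at the end of the frame.
void glDiscardFramebufferEXT_ (GLenum target, int count, const GLenum* attachments)
{
    (void) target;
    for (int i = 0; i < count; ++i)
        discarded.push_back (attachments[i]);
}

void endOfFrame()
{
    // After the last draw call, before presenting: depth and stencil results
    // aren't needed next frame, so let the GPU skip the logical buffer store.
    const GLenum attachments[] = { GL_DEPTH_ATTACHMENT_, GL_STENCIL_ATTACHMENT_ };
    glDiscardFramebufferEXT_ (GL_FRAMEBUFFER_, 2, attachments);
}
```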

Logical Buffer Load:

Your application caused OpenGL ES to perform a framebuffer load operation, where the framebuffer must be loaded by the GPU before rendering. This is typically caused by failing to perform a fullscreen clear operation at the beginning of each frame. If possible, you should do so to improve performance.

Redundant Call:

This command was redundant: glVertexAttribPointer(1u, 2, GL_SHORT, 0u, 8, nullptr)

A GL function call that sets a piece of GL state to its current value has been detected. Minimize the number of these redundant state calls, since these calls are performing unnecessary work.
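
The usual way to silence these is a small state cache: remember the last arguments passed for each vertex attribute and skip the GL call when nothing changed. Here's a minimal sketch of that idea (the struct and function names are mine, not JUCE's):

```cpp
// Cached arguments of the last glVertexAttribPointer call for one attribute.
struct AttribPointerState
{
    int size = 0;
    unsigned type = 0;
    bool normalized = false;
    int stride = 0;
    const void* pointer = nullptr;
};

// Returns true if the state actually changed, i.e. the real
// glVertexAttribPointer call needs to be made; false means the call
// would be redundant and can be skipped.
inline bool updateIfChanged (AttribPointerState& cached, int size, unsigned type,
                             bool normalized, int stride, const void* pointer)
{
    if (cached.size == size && cached.type == type
         && cached.normalized == normalized
         && cached.stride == stride && cached.pointer == pointer)
        return false; // redundant: skip the GL call

    cached = { size, type, normalized, stride, pointer };
    return true; // changed: issue glVertexAttribPointer here
}
```

Whether 10,000+ redundant calls in 10 seconds actually costs measurable time is another question; on a tight render loop it can, since each one still crosses into the driver.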

I believe this issue has to do with not orphaning the buffer. Changing to glBufferData will work, but you can also do this:

        context.extensions.glBufferData (GL_ARRAY_BUFFER, (GLsizeiptr) sizeof (vertexData), NULL, GL_DYNAMIC_DRAW);
        context.extensions.glBufferSubData (GL_ARRAY_BUFFER, 0, (GLsizeiptr) ((size_t) numVertices * sizeof (VertexInfo)), vertexData);

Hitting the full buffer with NULL tells the driver that you don't need to hold onto the old memory, so it can hand you a fresh allocation immediately instead of making the CPU wait until the GPU has finished reading the old data. The frame rate drop is exactly that stall between the CPU and GPU, and buffer orphaning gets rid of it.

Here’s a paper on the topic: