ProTools/AAX OpenGL nvoglv64.dll Crashes

The following change stops JuceDemo from crashing:

Add a "disable" method to OpenGLDemoClasses::Attributes:

void disable (OpenGLContext& openGLContext)
{
    // Undo what enable() turned on: disable every vertex attribute array this
    // shape uses, so none are left enabled after the draw call.
    if (position != nullptr)
        openGLContext.extensions.glDisableVertexAttribArray (position->attributeID);
    if (normal != nullptr)
        openGLContext.extensions.glDisableVertexAttribArray (normal->attributeID);
    if (sourceColour != nullptr)
        openGLContext.extensions.glDisableVertexAttribArray (sourceColour->attributeID);
    if (texureCoordIn != nullptr)
        openGLContext.extensions.glDisableVertexAttribArray (texureCoordIn->attributeID);
}

and call it right after the glDrawElements call in OpenGLDemoClasses::Shape::draw:

attributes.enable (openGLContext);
glDrawElements (GL_TRIANGLES, vertexBuffer.numIndices, GL_UNSIGNED_INT, 0);
attributes.disable (openGLContext);
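A scope guard makes the enable/disable pairing hard to forget, even if the draw code later grows an early return. The sketch below is a hypothetical illustration, not JUCE or demo code: FakeGLState, FakeAttributes, ScopedAttributes and drawAndCheckEnabled are made-up names, and a std::set stands in for the driver's enabled-array state so that it compiles without OpenGL.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the driver's vertex-attribute-array state.
// In real code the insert/erase would be glEnableVertexAttribArray /
// glDisableVertexAttribArray calls.
struct FakeGLState
{
    std::set<unsigned> enabledArrays;
};

// Hypothetical attribute set with the same enable/disable shape as
// OpenGLDemoClasses::Attributes.
struct FakeAttributes
{
    unsigned ids[2] = { 0, 1 };

    void enable (FakeGLState& gl) const  { for (auto id : ids) gl.enabledArrays.insert (id); }
    void disable (FakeGLState& gl) const { for (auto id : ids) gl.enabledArrays.erase (id); }
};

// RAII guard: enables the attributes on construction and always disables
// them on destruction, however the enclosing scope is left.
template <typename Attributes, typename Context>
class ScopedAttributes
{
public:
    ScopedAttributes (Attributes& a, Context& c) : attributes (a), context (c)
    {
        attributes.enable (context);
    }

    ~ScopedAttributes()
    {
        attributes.disable (context);
    }

    ScopedAttributes (const ScopedAttributes&) = delete;
    ScopedAttributes& operator= (const ScopedAttributes&) = delete;

private:
    Attributes& attributes;
    Context& context;
};

// Models Shape::draw: the guard's lifetime brackets the draw call.
bool drawAndCheckEnabled (FakeAttributes& attrs, FakeGLState& gl)
{
    ScopedAttributes<FakeAttributes, FakeGLState> guard (attrs, gl);
    // glDrawElements (...) would go here; both arrays are enabled at this point.
    return gl.enabledArrays.size() == 2;
}
```

After drawAndCheckEnabled returns, the guard has already disabled both arrays, which is the state the explicit disable() call above restores.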

I'm not much of an OpenGL expert myself, but I hope this means something to someone. It may solve JuceDemo's problem, but fixing JuceDemo won't fix the plugin crashes in Pro Tools, which is what I really wanted to solve.

Cheers, Yair

Ah, excellent catch, thanks! Yes, I should have been cleaning up those attributes; I've added code to do so now!

I wonder if this (what the JuceDemo problem turned out to be) gives us a hint about the original problem, the AAX crashes.

It seems that the Nvidia OpenGL driver crashes in some cases of "lacking proper cleanup", and perhaps this causes the AAX crashes too.

7ser23: maybe a similar fix to the one in JuceDemo can fix the problem in your program (https://github.com/julianstorer/JUCE/commit/b7ebb273d930c5079efe9a4eaa23c623eb230c88). If it does, please let us know and perhaps it will provide another clue for the AAX crashes too.

plugin_wielder: can you please build a plugin with "#define JUCE_USE_OPENGL_SHADERS 0" and see if that might have solved the issue?

I'd do it myself, but I don't have an Nvidia setup. For the JuceDemo issue (which I hoped would solve the AAX problem) I went to a friend's house to debug, but I'd prefer not to install Pro Tools on a friend's computer just to debug it.

I guess maybe I'll just send users a build to try..

We sent a test build with JUCE_USE_OPENGL_SHADERS=0 to an unsatisfied customer who has an Nvidia setup and kindly agreed to help, and he reported that it solved the problem.

Getting JUCE to compile in this mode requires two small fixes: one is at https://github.com/julianstorer/JUCE/commit/875cb972178f1e747a1a78478bdf3419a068ff2a#commitcomment-4987437 and the other is fixing clearOpenGLGlyphCache() (for the purpose of the test I just disabled it rather than making it compile).

I'm not 100% satisfied with this as a "solution". Ideally I'd find the offending code in the normal code path and fix it, but we don't have an Nvidia setup here to do that with, and for us the problem is already "solved enough" just by using JUCE_USE_OPENGL_SHADERS=0. I hope this post provides more clues to the source of the problem. I suspect that, as in the case of the JUCE demo, it has to do with incorrect cleanup of shader-related resources.

I'd really appreciate it if someone who does have an Nvidia setup to develop and debug with could continue this effort. For now, I think we'll just make builds with JUCE_USE_OPENGL_SHADERS=0 over here, and the problem will be gone for us.

I think that Windows+Nvidia is a very important platform, and it's only by bad luck that all of our machines over here have AMD or Intel GPUs. Apple has also used Nvidia GPUs in many machines, and we just happened to buy ours at the "wrong" times.

Cheers! Yair

Very frustrating! Since starting with GL I've been really surprised how many driver bugs there seem to be on Windows, when I'd have expected such heavily-used drivers to be more solid than they are.

FWIW my main machine has an Nvidia card, and I've never had a problem with it. However, on my Win8 machine there was one version of the MS driver which crashed in a completely innocent bit of code (I couldn't see a workaround or a reason for it happening), whereas an Intel driver on the same machine worked perfectly, as did newer MS drivers.

I'd really appreciate it if you could check whether adding an innocent OpenGL view to the JUCE demo plugin makes it crash on your machine (when closing the plugin window in PT).

Btw, in our plugins we have Components drawn with OpenGL, with setComponentPaintingEnabled(false) and setContinuousRepainting(true) set on the OpenGLContext.

Cheers

It's a Mac, I'm afraid, so not really comparable.

I stopped watching this thread as my problem seemed to disappear. So I presumed some other code I wrote was causing an issue down the line somewhere. Sorry to have missed your questions yairadix and thanks for trying to find the problem.

For what it's worth, I was using both my laptop with an Nvidia NVS5400M and a desktop with an Nvidia Quadro 600. I do have multi-screen setups and a mix of graphics processors going on.

For now my bugs seem to have disappeared, so I put it down to some bad code (perhaps memory leaks) causing a problem. I set my graphics cards to disable some advanced features and selected the option to optimise for single-monitor performance; I'm not sure this did anything. On my laptop I have updated the driver to 331.82. I can confirm that the crashes happen when closing windows/plug-ins.

Actually, now I think about it, I had horrible crashes when closing the plug-in in the JUCE demo host on Friday after updating JUCE to 3.0.1, which is why I updated the graphics driver. It actually makes the screen go pretty nuts (some areas go black and some flash) when I first open the GUI window, across all screens, not just the plug-in window area. In this case the crash happens when I close the plug-in, not the window. I'll try what yairadix proposed in this thread and report back.

I haven't tried PT11 yet (PT10, mysteriously, still works). PT11 no longer works on my main laptop because of some forced Windows updates, and since Avid doesn't officially support Windows Enterprise and network log-on, I can't do anything about it, so I now have yet another computer for plug-in development.

Thanks for keeping up with this, Jules. This whole GL stuff is worrying since I can't predict the exact hardware a user may have. It's all too tempting to rip out the GL stuff, but we really need it for some display elements.

Since I only have Nvidia cards, I don't know if this is directly related or a JUCE-host-only issue. As above, I was getting crashes with my plug-in, but not with the JUCE demo plug-in, when closing the plug-in. The fault occurs on:
FreeLibrary ((HMODULE) handle);

inside void DynamicLibrary::close() in juce_win32_Threads.cpp. Commenting it out stops the crash, but obviously the code is there for a reason; going by the name, I'm guessing it frees up resources. I haven't dug into that code, just tested whether commenting it out 'cured' it. When I debug the plug-in rather than the host, the crash shows up in the Nvidia driver.

As a reminder - in the JUCE Demo the similar crashes were due to missing calls to glDisableVertexAttribArray.

I reviewed the calls to glDisableVertexAttribArray in juce_opengl, and noticed this:

The glDisableVertexAttribArray calls are made via (juce::OpenGLRendering::ShaderPrograms::) ShaderBase::unbindAttributes, which is being called at StateHelpers::CurrentShader::clearShader, which is being called in GLState::flush:

void flush()
{
    shaderQuadQueue.flush();
    currentShader.clearShader (shaderQuadQueue);
    JUCE_CHECK_OPENGL_ERROR
}

We unbind the attributes of the current shader when flushing, so we probably also need to unbind the attributes of the previous shader when changing shaders. Otherwise they may never be unbound if we use several shaders before we flush.

So, do we unbind the attributes of the previous shader when setting a new one? Not in CurrentShader::setShader or GLState::setShader, and it doesn't seem that we do it in their callers either.
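The change being suggested can be modelled as a tracker that disables the outgoing shader's attribute arrays whenever the shader changes, not only on flush. This is a hypothetical sketch, not JUCE's StateHelpers code: FakeShader and CurrentShaderTracker are invented names, and a std::set of enabled slots stands in for the glEnableVertexAttribArray / glDisableVertexAttribArray calls.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: a "shader" is just the list of attribute slots it enables.
struct FakeShader
{
    std::vector<unsigned> attributeIDs;
};

// Hypothetical tracker in the spirit of StateHelpers::CurrentShader. It remembers
// the active shader and disables that shader's attribute arrays before switching.
class CurrentShaderTracker
{
public:
    void setShader (const FakeShader* newShader)
    {
        if (newShader == active)
            return;

        unbindActive();      // the step this post suggests is missing on a switch
        active = newShader;

        if (active != nullptr)
            for (auto id : active->attributeIDs)
                enabledArrays.insert (id);   // stands in for glEnableVertexAttribArray
    }

    void clearShader()       // analogous to what GLState::flush() already does
    {
        unbindActive();
        active = nullptr;
    }

    std::set<unsigned> enabledArrays;

private:
    void unbindActive()
    {
        if (active != nullptr)
            for (auto id : active->attributeIDs)
                enabledArrays.erase (id);    // stands in for glDisableVertexAttribArray
    }

    const FakeShader* active = nullptr;
};
```

With this shape, switching from a shader using slots {0, 1, 2} to one using {0, 3} leaves exactly {0, 3} enabled, instead of leaking slots 1 and 2 until the next flush.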

It seems like a bug. Jules - I'd be happy if you could take a look!

Though on the other hand I'm not quite sure that our plugins, which use OpenGLContext::setComponentPaintingEnabled(false), would trigger these code paths anyhow, so perhaps it's not *the* bug I'm looking for..

Cheers, Yair

Very interesting - thanks, I'll take a look...

Interesting, thanks! Although it didn't seem to ever be causing problems for my drivers, I have tightened that up now, as there could be other drivers out there which had a problem with it.

Thanks for the quick fix! We've sent a version built with this fix to a user with the Nvidia setup to test, and we'll update when we know whether it solves the PT crashing issue.

By chance I have stumbled into another seemingly very relevant bug when debugging an unrelated issue with AAX.

I was debugging on a Mac (10.9.1), not with an Nvidia GPU but with Intel HD Graphics 4000, and the JUCE_CHECK_OPENGL_ERROR at the end of juce::OpenGLContext::copyTexture fired when closing the plugin. That is exactly the point where the very same plugin crashes on Windows with Nvidia.

I'm suspecting this line:

extensions.glVertexAttribPointer (index, 2, GL_SHORT, GL_FALSE, 4, vertices);

It looks like the last argument to glVertexAttribPointer isn't supposed to actually be a pointer but rather an offset into a buffer/struct, which is what I understand from both the documentation and the code examples I could find.
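For reference, the OpenGL specification gives that parameter two meanings: when a buffer object is bound to GL_ARRAY_BUFFER it is a byte offset into that buffer, and when none is bound (client-side arrays, compatibility contexts only) it is a pointer to client memory. The sketch below shows the vertex layout implied by the quoted call (two GL_SHORTs per vertex, 4-byte stride) and the usual cast for passing an offset; Vertex and bufferOffset are illustrative names, and the GL calls appear only in comments so it compiles without OpenGL headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Vertex layout matching the quoted call: two 16-bit signed components (x, y),
// i.e. GL_SHORT x 2, so the stride is 4 bytes.
struct Vertex
{
    std::int16_t x, y;
};

static_assert (sizeof (Vertex) == 4, "matches the stride of 4 in the quoted call");

// With a buffer bound to GL_ARRAY_BUFFER, the last glVertexAttribPointer argument
// is read as a byte offset, but its C type is still const void*, so cast it.
inline const void* bufferOffset (std::size_t byteOffset)
{
    return reinterpret_cast<const void*> (static_cast<std::uintptr_t> (byteOffset));
}

// Client-side array (no buffer bound): the argument is a real pointer, e.g.
//     glVertexAttribPointer (index, 2, GL_SHORT, GL_FALSE, sizeof (Vertex), vertices);
// Same data uploaded into a bound buffer: the argument is an offset, e.g.
//     glVertexAttribPointer (index, 2, GL_SHORT, GL_FALSE, sizeof (Vertex),
//                            bufferOffset (offsetof (Vertex, x)));
```

Both forms describe the same layout, which is why the old-style pointer call can be valid as long as no buffer is bound at that moment.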

Jules, could you take a look? I'm oddly optimistic that this is actually the problem causing the crashes in PT on Windows; of course I may be wrong.

Btw still no info from the user to whom we sent a build with the previous fix..

Cheers! Yair

I see your point, though if that had been wrong, nothing should ever have worked at all.

Still, it was certainly using the old style of calling that function, and I've tweaked it now so that it uses a modern vertex buffer - would be keen to know if it helps!

Yeah, you're right, it doesn't make sense that that would be the problem. Why would it only happen when closing the window? Indeed, it behaves the same after your change.

On the other hand, I found that this code path only runs when "renderComponents" is enabled, and that reminded me that I forgot to call setComponentPaintingEnabled(false), as I don't need it in my view.

With setComponentPaintingEnabled(false) I don't get the error, and that's fine for me for now, and hopefully also helpful for finding the bug..

Thanks for the quick response! Yair

So we purchased a new laptop with a GeForce GPU so that we can reproduce, investigate and fix this problem..

At first the problem didn't reproduce. Then I found that because it was a laptop with GeForce + Intel integrated graphics, I needed to go to the Nvidia Control Panel and set it to use the Nvidia GPU, both in the global settings and in the per-application settings for Pro Tools. I believe this step is unnecessary for PCs that have only an Nvidia GPU.

Then the problem did reproduce, but sadly it doesn't crash consistently: I only get a crash after trying about 20 times.
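Because the crash only shows up roughly once in 20 attempts, a handful of clean runs says little about whether a change fixed it. A rough rule, assuming each attempt crashes independently with probability p: n clean runs in a row happen with probability (1 - p)^n, so to push that down to 5% or less you need n >= ln(0.05) / ln(1 - p), which is about 59 runs for p = 1/20. runsNeededForConfidence is a hypothetical helper for that arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Smallest n such that n clean runs in a row would have probability at most
// (1 - confidence) if the bug were still present with per-run crash rate
// crashProbability. Solves (1 - p)^n <= 1 - confidence for n.
int runsNeededForConfidence (double crashProbability, double confidence)
{
    assert (crashProbability > 0.0 && crashProbability < 1.0);
    assert (confidence > 0.0 && confidence < 1.0);

    return static_cast<int> (std::ceil (std::log (1.0 - confidence)
                                        / std::log (1.0 - crashProbability)));
}
```

For a 1-in-20 crash rate this gives 59 runs at 95% confidence and 90 runs at 99%.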

The crash happened in nvoglv64 code while the OpenGL rendering thread was inside a call to wglMakeCurrent, and the main thread was waiting for the rendering thread to finish, in my view's destructor, which calls the context's detach method.

I couldn't find anything fishy, but one thing that came to mind is that this specific call to wglMakeCurrent in the OpenGL rendering thread is probably not even necessary. Pretty much all this thread does is render to my OpenGL component using this context, so it probably never changes contexts. So if an unnecessary call seems to cause the crash, what happens if we avoid it? In OpenGLContext::NativeContext::makeActive I added a check for whether the context is already active, skipping the call in that case:

bool makeActive() const noexcept
{
    // Skip the driver call if this context is already current on this thread.
    if (isActive())
        return true;

    return wglMakeCurrent (dc, renderContext) != FALSE;
}

With this workaround (fix??) I couldn't produce the crashes anymore.
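The workaround amounts to "don't ask the driver to make a context current if it already is". Below is a self-contained model of that logic, assuming isActive() compares the calling thread's current context (what wglGetCurrentContext would report) with this one. FakeContextHandle, fakeMakeCurrent and NativeContextSketch are hypothetical stand-ins for the Win32 handle types and calls, and the counter only exists to show how many driver calls are made.

```cpp
#include <cassert>

// Hypothetical stand-ins so the sketch compiles without Windows: a context
// handle and a per-thread "current context", like the state behind
// wglGetCurrentContext / wglMakeCurrent.
using FakeContextHandle = int;
thread_local FakeContextHandle currentContext = 0;
int driverMakeCurrentCalls = 0;

bool fakeMakeCurrent (FakeContextHandle ctx)   // stands in for wglMakeCurrent
{
    ++driverMakeCurrentCalls;
    currentContext = ctx;
    return true;
}

struct NativeContextSketch
{
    FakeContextHandle renderContext;

    bool isActive() const noexcept { return currentContext == renderContext; }

    // Same shape as the workaround: only call into the "driver" when this
    // context isn't already current on the calling thread.
    bool makeActive() const noexcept
    {
        if (isActive())
            return true;

        return fakeMakeCurrent (renderContext);
    }
};
```

A render loop that calls makeActive() once per frame on the same thread then reaches the driver only on the first frame.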

For now I'll send this version to some unsatisfied customers to see if it resolves the problem for them too. Will update on that.

Another possibly interesting finding:

While trying to reproduce this problem on the new laptop, before I succeeded in forcing it to use its Nvidia driver+GPU through the Nvidia Control Panel as described above, I tried disabling the driver for the Intel integrated graphics in the Device Manager, assuming that without the Intel driver the machine would use the Nvidia one. (Btw, this doesn't seem like a really good state, as the Nvidia Control Panel refuses to open, saying that this computer does not use the Nvidia GPU.)

In this state, when closing the plugin window, instead of crashing it hangs for 10 seconds: stopThread for the rendering thread hits its 10-second timeout and reaches the place in the code with the comment "very bad karma if this point is reached". Meanwhile the rendering thread is stuck in a loop in clearGLError(), called from CachedImage::RenderFrame.
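One possible reading of the hang, not verified here, is that the error query never reports "no error" in this state, so a drain loop of the form while (glGetError() != GL_NO_ERROR) {} never exits and the thread never gets back to checking whether it should stop. A bounded drain would at least let it give up. drainErrorsBounded and the stub error sources in the usage are hypothetical, and the two constants simply mirror the values of GL_NO_ERROR and GL_INVALID_OPERATION.

```cpp
#include <cassert>
#include <functional>

// Values mirroring GL_NO_ERROR and GL_INVALID_OPERATION, so no GL headers are needed.
constexpr unsigned noError = 0;
constexpr unsigned invalidOperation = 0x0502;

// Drains queued errors like an unbounded glGetError() loop would, but gives up
// after maxIterations. Returns true if an empty queue was reached, false if the
// error source kept reporting errors the whole time.
bool drainErrorsBounded (const std::function<unsigned()>& getError, int maxIterations)
{
    assert (maxIterations > 0);

    for (int i = 0; i < maxIterations; ++i)
        if (getError() == noError)
            return true;

    return false;
}
```

With a source that reports three errors and then none, the drain succeeds; with a source that always reports an error, it returns false after maxIterations calls instead of spinning forever.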

Dunno if this is useful data; I'm not even sure how healthy this state, with the Intel driver disabled, is.

Any thoughts?

Cheers! Yair

Thanks for the detailed report and digging!

I've certainly no objection at all to avoiding the unnecessary wglMakeCurrent call; I'll add that right away. Even if it's not directly responsible for the driver crash, it's still better to avoid it, as who knows what might happen inside there.

Good news - our customer tried the version we sent him and the problem is indeed solved for him too!

Cheers! Yair