OpenGLContext::attachment is nullptr


#1

I have a bit of an involved issue, so I will try to be as short and concise as possible.

What I am doing:

Making an automated test tool for our own rendering SDK. This involves the following steps:

1. Retrieve a set of test cases from an internal server.

2. Spawn one or more custom juce::DocumentWindow instances. The rendering component is a custom juce::OpenGLAppComponent. For each window, test cases are run (i.e. rendered) for some time, while test results are built and returned to the server.

3. When all cases are done, delete all windows on the Juce Message thread.

4. Go to 1 if more test cases exist.

This means that I am creating openGL contexts and deleting them over and over.

What doesn't work:

Well, everything seems to be working fine, except that when this process is repeated for some time (around 10-15 times, sometimes sooner), I run into issues with the OpenGL context. A jassert fires in OpenGLContext::getAssociatedObject():


ReferenceCountedObject* OpenGLContext::getAssociatedObject (const char* name) const
{
    jassert (name != nullptr);

    CachedImage* const c = getCachedImage();

    // This method must only be called from an openGL rendering callback.
    jassert (c != nullptr && nativeContext != nullptr);  // <---------------- Assertion error here: c is nullptr
    jassert (getCurrentContext() != nullptr);

    const int index = c->associatedObjectNames.indexOf (name);
    return index >= 0 ? c->associatedObjects.getUnchecked (index) : nullptr;
}

The CachedImage c is returned as a nullptr from getCachedImage(), hence the jassert fails. The nativeContext is not a nullptr. Diving further into the code, I find that the private member variable "ScopedPointer<Attachment> attachment" in the OpenGLContext class is a nullptr. As far as I am able to deduce, this is only set to nullptr in OpenGLContext::detach(). I cannot see how this could be the case, however. I have ensured that render() is not called after the test cases are done, so deleting the windows should be straightforward. All of this happens on an "OpenGL Rendering" thread.

What could the error be? In what places should I look for more details?

Here is the call stack:

mQATool.exe!juce::OpenGLContext::getAssociatedObject(const char * name) Line 789
mQATool.exe!`juce::OpenGLContext::copyTexture'::`7'::OverlayShaderProgram::select(juce::OpenGLContext & context) Line 851
mQATool.exe!juce::OpenGLContext::copyTexture(const juce::Rectangle<int> & targetClipArea, const juce::Rectangle<int> & anchorPosAndTextureSize, int contextWidth, int contextHeight, bool flippedVertically) Line 929
mQATool.exe!juce::OpenGLContext::CachedImage::drawComponentBuffer() Line 279
mQATool.exe!juce::OpenGLContext::CachedImage::renderFrame() Line 191
mQATool.exe!juce::OpenGLContext::CachedImage::run() Line 359
mQATool.exe!juce::Thread::threadEntryPoint() Line 106
mQATool.exe!juce::juce_threadEntryPoint(void * userData) Line 114
mQATool.exe!juce::threadEntryProc(void * userData) Line 103
[External Code]    
[Frames below may be incorrect and/or missing, no symbols loaded for msvcr120d.dll]

#2

Here is some more info after digging further into the details:

1. When the assert "jassert (c != nullptr && nativeContext != nullptr);" fails due to c being a nullptr, it is always during the deletion of a DocumentWindow instance. The deletion is done by iterating a "std::map<int, ScopedPointer<QAWindow>>" containing pointers to the various DocumentWindows and doing "it->second = nullptr;" and clearing the map after the iteration.

2. When deleting the windows, I always end up in the jassert "jassert (attachment == nullptr); // Running on an old graphics card!" in "juce_OpenGLContext.cpp" because "areShadersAvailable()" returns false. This is also because the "ScopedPointer<Attachment> attachment" is a nullptr.

3. The error occurs seemingly at random. Sometimes I can open, render and delete my DocumentWindows only 3-4 times before the assert fails, sometimes around 20-25 times. Maybe there is some threading involved, I am not sure.

4. I was not running the most updated version of the code (my code was fetched in April), so I updated to the latest, but the issue is the same. There were only minor changes to the Juce OpenGL code, so no surprise there.


#3

This seems like some locking mechanism is failing. The assert is clearly happening on the render thread, and the render thread shouldn't even enter the render methods if a major delete operation is happening on the message thread.

Your last comments are confusing me a bit: 

You say that the assertion always happens during the deletion of the DocumentWindow. Do you mean that the main thread's call stack is inside the delete code of the DocumentWindow? Can you send me a stack trace of that thread as well?

In your second comment you speak of another assertion: jassert (attachment == nullptr); When and where is this happening? I think it is probably unwise to continue to run your app after you have hit an assertion in the OpenGL render stuff, so which one are you hitting first?


#4

This seems like some locking mechanism is failing. The assert is clearly happening on the render thread, and the render thread shouldn't even enter the render methods if a major delete operation is happening on the message thread.

Yes, the assert is happening on the OpenGL render thread. Below is the code where I delete the DocumentWindows, in case I am doing something wrong. The "mQAWindows" member variable is of datatype "std::map<int, ScopedPointer<QAWindow>>", where "QAWindow" is the custom "DocumentWindow". This map just keeps pointers to the windows so that they may be deleted properly.

void QAHandler::destroyAllQAWindows()
{
    const ScopedLock sl(mLock);
    std::function<void(void)> destroyAllQAWindowsLambda = [this]() { this->destroyAllQAWindowsOnJuceMessageThread(); };
    juce::MessageManager::callAsync(destroyAllQAWindowsLambda);
}

// Must be called on the Juce message thread.
void QAHandler::destroyAllQAWindowsOnJuceMessageThread()
{
    const ScopedLock sl(mLock);
    for (std::map<int, ScopedPointer<QAWindow>>::iterator it = mQAWindows.begin(); it != mQAWindows.end(); ++it)
    {
        it->second = nullptr;
    }
    mQAWindows.clear();
    
    startNextQARun(); // Starts the next test case, if any, thereby creating new DocumentWindows.
}

And just to show the whole picture. The windows are created like this on the message thread:

for (int i = 0; i < qaRun.QAComposites.size(); i++)
{
    juce::OptionalScopedPointer<QAWindow> window = juce::OptionalScopedPointer<QAWindow>(new QAWindow(1000, 800, this, qaRun, i), false);
}

At OpenGL context creation, the window calls back to add a reference. It is done like this because our SDK creates a unique ID (compositeId) for each OpenGL context to know where to render to. Our rendering composite is created in the window just after the OpenGL context is created (in OpenGLAppComponent::initialise()).

void QAHandler::onCreateComposite(const int& compositeId, const std::string& compositeName, juce::OptionalScopedPointer<QAWindow> window)
{
    const ScopedLock sl(mLock);
    mCompositeIdFinishedMap[compositeId] = false;
    mQAWindows[compositeId] = window; // <--------------- Here is the reference to the window kept until deletion.
    mCompositeIdNameMap[compositeId] = compositeName;
}

You say that the assertion always happens during the deletion of the DocumentWindow. Do you mean that the main thread's call stack is inside the delete code of the DocumentWindow? Can you send me a stack trace of that thread as well?

Yes, the Juce Message Thread calls the destructor of the DocumentWindow (customly named QAWindow in my code). The DocumentWindow further has a "ScopedPointer<QAView>" member, where the QAView class inherits from "OpenGLAppComponent" to do the actual rendering.

Here are the destructors and the corresponding stack trace when deleting the DocumentWindow on the Juce Message Thread. The destructor of the OpenGLAppComponent/QAView is called from the destructor of the DocumentWindow/QAWindow.

QAView::~QAView()
{
    shutdownOpenGL();
}

QAWindow::~QAWindow()
{
    mQAView = nullptr;
}

mQATool.exe!QA::QAView::~QAView() Line 22
[External Code]    
mQATool.exe!juce::ContainerDeletePolicy<QA::QAView>::destroy(QA::QAView * object) Line 48
mQATool.exe!juce::ScopedPointer<QA::QAView>::operator=(QA::QAView * const newObjectToTakePossessionOf) Line 141
mQATool.exe!QA::QAWindow::~QAWindow() Line 24
[External Code]    
mQATool.exe!juce::ContainerDeletePolicy<QA::QAWindow>::destroy(QA::QAWindow * object) Line 48
mQATool.exe!juce::ScopedPointer<QA::QAWindow>::operator=(QA::QAWindow * const newObjectToTakePossessionOf) Line 141
mQATool.exe!QA::QAHandler::destroyAllQAWindowsOnJuceMessageThread() Line 347
mQATool.exe!QA::QAHandler::destroyAllQAWindows::__l3::<lambda>() Line 336
[External Code]    
mQATool.exe!juce::AsyncFunction::messageCallback() Line 141
mQATool.exe!juce::WindowsMessageHelpers::dispatchMessageFromLParam(long lParam) Line 49
mQATool.exe!juce::MessageManager::dispatchNextMessageOnSystemQueue(bool returnIfNoPendingMessages) Line 110   
mQATool.exe!juce::MessageManager::runDispatchLoopUntil(int millisecondsToRunFor) Line 99
mQATool.exe!juce::MessageManager::runDispatchLoop() Line 87
mQATool.exe!juce::JUCEApplicationBase::main() Line 244
mQATool.exe!WinMain(HINSTANCE__ * __formal, HINSTANCE__ * __formal, char * __formal, int __formal) Line 58
mQATool.exe!main(int argc, char * * argv, char * * envp) Line 71
[External Code]
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]

In your second comment you speak of another assertion: jassert (attachment == nullptr); When and where is this happening? I think it is probably unwise to continue to run your app after you have hit an assertion in the OpenGL render stuff, so which one are you hitting first?

This is an assert at the very bottom of "juce_OpenGLContext.cpp", in the function "void OpenGLContext::copyTexture". I am not really sure why I end up in a function copying a texture, as the textures I am rendering are handled solely by our SDK. Maybe Juce uses something in addition for the rendering (?). Anyway, the code just asserts that the attachment is a nullptr, which it is, and continues running.

I copied the main parts of the function in below. It is "areShadersAvailable()" that returns false, thus I end up in the assert where it says "Running on an old graphics card!", which is clearly not the case (I am running a GeForce GTX 670). The reason why "areShadersAvailable()" returns false is because "getCachedImage()" returns a nullptr inside that function. (As a sidenote I am setting a flag "minGLVersion = 2" in our SDK to enable shaders, but that should not matter. The rendering is in itself ok.)

To be as clear as possible:

a) The assert for old graphics cards is sometimes hit when deleting the window (I said always previously, but that is actually not the case). This assert does not fail and the window is deleted thereafter.
b) When the deletion fails, i.e. when "getCachedImage()" returns a nullptr in "OpenGLContext::getAssociatedObject()", this assertion fails before the old graphics card assertion is hit, so it is here the problem seemingly lies. (I am pretty sure that it hits before the other assert, but that assert isn't always hit, so I am not 100 % certain.)

void OpenGLContext::copyTexture (const Rectangle<int>& targetClipArea,
                                 const Rectangle<int>& anchorPosAndTextureSize,
                                 const int contextWidth, const int contextHeight,
                                 bool flippedVertically)
{
    // <snip>
    if (areShadersAvailable())
    {
        // <snip>
    }
    else
    {
        jassert(attachment == nullptr); // Running on an old graphics card!
    }
    JUCE_CHECK_OPENGL_ERROR
}

Additional issue (probably related):

I also randomly ran into another issue which is probably related. When starting a new test run, during the creation of the DocumentWindows, I hit an assert in "juce_OpenGL_win32.h", see below. This assert just checks if the context is active, which it was not at this point. I also copied in the stack trace for this. I have only hit this assert once.

bool setSwapInterval (int numFramesPerSwap)
{
    jassert (isActive()); // this can only be called when the context is active..
    return wglSwapIntervalEXT != nullptr && wglSwapIntervalEXT (numFramesPerSwap) != FALSE;
}

Stack trace for this assert:

mQATool.exe!juce::OpenGLContext::NativeContext::setSwapInterval(int numFramesPerSwap) Line 97
mQATool.exe!juce::OpenGLContext::CachedImage::initialiseOnThread() Line 392
mQATool.exe!juce::OpenGLContext::CachedImage::run() Line 345
mQATool.exe!juce::Thread::threadEntryPoint() Line 106
mQATool.exe!juce::juce_threadEntryPoint(void * userData) Line 114
mQATool.exe!juce::threadEntryProc(void * userData) Line 103
[External Code]    
[Frames below may be incorrect and/or missing, no symbols loaded for msvcr120d.dll]    


#5

Just a short comment before you start using lots of time on this issue: I want to check if there is a race condition involved. If detach() is called from the main thread, setting the attachment variable to a nullptr while the OpenGL Rendering thread is still trying to render, the assertion may falsely kick in. I am currently checking it, but unfortunately (?) the assertion fail hasn't kicked in for almost an hour of running.


#6

I just realized that you are using a ScopedPointer in std::map. This is generally not recommended and will definitely have unexpected results if you are not using C++11. Try deriving your class from ReferenceCountedObject and then use QAWindow::Ptr as the values of your std::map. Or use OwnedArray if possible.
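To illustrate the ownership point, here is a minimal sketch that uses std::unique_ptr as a standard-library stand-in for the JUCE options suggested above (it is not ReferenceCountedObject or OwnedArray, and it requires C++11). FakeWindow and createAndDestroyWindows are hypothetical names invented for this example. The idea is only that a move-only owning pointer in the map deletes each window exactly once.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for QAWindow: it only counts destructor calls.
struct FakeWindow
{
    explicit FakeWindow (int& counter) : destroyed (counter) {}
    ~FakeWindow() { ++destroyed; }

    int& destroyed;
};

// Creates `count` windows keyed by id, clears the map, and returns how
// many destructors ran. With single-owner pointers this equals `count`.
int createAndDestroyWindows (int count)
{
    int destroyed = 0;
    std::map<int, std::unique_ptr<FakeWindow>> windows;

    for (int id = 0; id < count; ++id)
        windows[id].reset (new FakeWindow (destroyed));

    windows.clear(); // each unique_ptr deletes its window exactly once
    return destroyed;
}
```

Calling createAndDestroyWindows (3) should report three destructor calls, one per window.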


#7

Thanks Fabian, I will refactor a bit to use QAWindow::Ptr in my std::map. I'll update when I have tried this.

In the meantime: I see that OpenGLContext::detach() is called several times when deleting the QAWindows, usually 3 times for each window I have open. When I create the windows, OpenGLContext::detach() is only called once, which makes sense. As far as I am able to tell, all of these calls are done on the Juce Message Thread. Is this the expected behavior? When hitting the "old graphics card" assertion I am also on the Juce message thread.

(Where/how can I set some of my text to a monospaced font btw?)


#8

This could very well be related to you using ScopedPointer in the std::map. I wouldn't be surprised if the destructor of your window were called multiple times on the same object.


#9

Ok, I believe the issue is as follows:

  1. I make an async call from the OpenGL Rendering Thread to the Juce Message Thread to delete all windows.
  2. The Juce Message Thread picks this up at some point and starts deleting the windows.
  3. If we are unlucky, i.e. when my program crashes, the Juce Message Thread has called detach() while the OpenGL Rendering Thread is in a place where it has already verified that the Attachment is available, which it suddenly is not, hence it crashes.

So what I believe I am experiencing is:

 A. If the Juce Message Thread pokes in and calls detach() in a proper place, everything works as expected. This is what usually happens.
 B. If the Juce Message Thread pokes in and calls detach() somewhere between OpenGLContext::renderFrame() and the areShadersAvailable() check in OpenGLContext::copyTexture(), we will end up in the else statement there, and at the assert saying "Running on an old graphics card". This is weird, but the window is properly deleted. This happens sometimes.
 C. If the Juce Message Thread pokes in and calls detach() after the areShadersAvailable() check on the OpenGLRendering Thread, it will fail when OpenGLContext::getAssociatedObject calls getCachedImage() and gets a nullptr. This happens once in a while.

This means that the call to OpenGLContext::detach() must be done in a thread safe way. I am not sure what is the best way to cope with this, be it locks, an extra state variable or whatever.

Does this make sense? Is it possible to make a call to the OpenGLContext to make it finish/pause rendering before deleting them from the Juce Message Thread?
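The "locks" option mentioned above can be sketched in plain C++11, independent of JUCE. FakeContext, FakeAttachment and renderUntilDetached are hypothetical names for this illustration, and the sketch says nothing about how JUCE itself synchronises detach(). It only shows that if the null check and the use of the attachment happen under the same lock that detach() takes, detach() can no longer slip in between them.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-ins, not JUCE classes.
struct FakeAttachment { int framesDrawn = 0; };

class FakeContext
{
public:
    FakeContext() : attachment (new FakeAttachment()) {}

    // "Render thread" side: the check and the use share one lock,
    // so detach() cannot run between them.
    bool renderFrame()
    {
        std::lock_guard<std::mutex> lock (mutex);

        if (attachment == nullptr)
            return false; // detached: skip the frame instead of crashing

        ++attachment->framesDrawn;
        return true;
    }

    // "Message thread" side: releases the attachment under the same lock.
    void detach()
    {
        std::lock_guard<std::mutex> lock (mutex);
        attachment.reset();
    }

private:
    std::mutex mutex;
    std::unique_ptr<FakeAttachment> attachment;
};

// Renders on a worker thread while the caller detaches. Returns true if
// the render loop left through the null check.
bool renderUntilDetached()
{
    FakeContext context;
    bool leftViaNullCheck = false;

    std::thread renderThread ([&]
    {
        while (context.renderFrame())
            std::this_thread::yield();

        leftViaNullCheck = true;
    });

    context.detach();
    renderThread.join(); // join makes the worker's write visible here
    return leftViaNullCheck;
}
```

The cost of this design is that every frame takes the lock, and detach() blocks until the current frame has finished.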

 

Just some more details I put in, to follow up the previous discussion and to fill in the blanks:

This could very well be related to you using ScopedPointer in the std::map. I wouldn't be surprised if the destructor of your window were called multiple times on the same object.

The destructor of the QAWindow and the QAView (= OpenGLAppComponent) are only called once per window. When the QAView destructor is called, detach() is called in the three following ways, in the following order.

1. Through OpenGLAppComponent::shutdownOpenGL(), which calls detach() directly:

mQATool.exe!juce::OpenGLContext::detach() Line 689
mQATool.exe!juce::OpenGLAppComponent::shutdownOpenGL() Line 47
mQATool.exe!QA::QAView::~QAView() Line 27

2. Through the OpenGLAppComponent destructor, which calls shutdownOpenGL as above:

mQATool.exe!juce::OpenGLContext::detach() Line 689
mQATool.exe!juce::OpenGLAppComponent::shutdownOpenGL() Line 47
mQATool.exe!juce::OpenGLAppComponent::~OpenGLAppComponent() Line 42
mQATool.exe!QA::QAView::~QAView() Line 27

3. Through the OpenGLContext destructor, which is called straight after the OpenGLAppComponent destructor:

mQATool.exe!juce::OpenGLContext::detach() Line 689
mQATool.exe!juce::OpenGLContext::~OpenGLContext() Line 617
mQATool.exe!juce::OpenGLAppComponent::~OpenGLAppComponent() Line 42
mQATool.exe!QA::QAView::~QAView() Line 27

The destructor of QAView does nothing but call shutdownOpenGL():

QAView::~QAView()
{
    shutdownOpenGL();
}

I checked the default OpenGL Application made from the Introjucer and the behavior is the same there, so I guess this part can be ruled out.


#10

The call to OpenGL detach is already done in a thread-safe way in JUCE. As you say, the relevant call is not the detach itself but setting the cachedImage to a null pointer. This is done in juce_OpenGLContext.cpp:595, but just before that line it will call stop, which will ensure that the OpenGL render thread has stopped successfully. Therefore, the cachedImage should never suddenly be null if the OpenGL thread is currently rendering.

Obviously, we might have a more subtle bug in that part of the code, but before we investigate this any further, can you confirm that you have refactored your code to not use ScopedPointer in the std::map anymore? This will cause all sorts of undefined side-effects, of which the assertion might well be one.
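The ordering described in this post (stop the render thread first, only then null the cached image) can be sketched in standalone C++11 as follows. This is not JUCE code: FakeRenderer, FakeCachedImage and stopThenReleaseIsSafe are hypothetical names, and the sketch only illustrates why that ordering should keep the render loop from ever observing a null image.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for the cached image.
struct FakeCachedImage { int frames = 0; };

class FakeRenderer
{
public:
    FakeRenderer()
        : image (new FakeCachedImage()), shouldExit (false), nullSeen (false)
    {
        // All members are initialised before the thread starts.
        thread = std::thread ([this] { run(); });
    }

    ~FakeRenderer() { stop(); }

    // Signal the thread, wait for it to finish, and only then release the
    // image. Safe to call more than once.
    void stop()
    {
        shouldExit = true;

        if (thread.joinable())
            thread.join();

        image.reset();
    }

    bool sawNullImage() const { return nullSeen; }

private:
    void run()
    {
        while (! shouldExit)
        {
            if (image == nullptr)
                nullSeen = true; // would be the failing assertion
            else
                ++image->frames;
        }
    }

    std::unique_ptr<FakeCachedImage> image;
    std::atomic<bool> shouldExit;
    std::atomic<bool> nullSeen;
    std::thread thread;
};

// Creates and stops a renderer `rounds` times. Returns false if any render
// loop ever saw a null image.
bool stopThenReleaseIsSafe (int rounds)
{
    for (int i = 0; i < rounds; ++i)
    {
        FakeRenderer renderer;
        renderer.stop();

        if (renderer.sawNullImage())
            return false;
    }

    return true;
}
```

If the image were reset before the join, the loop could observe a null pointer, which matches the failure pattern reported earlier in the thread.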


#11

Yes, the code is refactored to use std::map<int, QAWindow::Ptr> as you described above, where Ptr is a typedef of juce::ReferenceCountedObjectPtr<QAWindow>. I also checked that I do not have any other maps with such pointers. Actually, I will probably refactor that map away, but that's another story.

I set up a new debugging session to re-confirm that the application still crashes (it still does, after some time). If you are going to set up a test case for this, the main logic of the program is:

1. Create one or more windows (DocumentWindow) with an OpenGLAppComponent for rendering.
2a) Render stuff in each window.
2b) Render something else.
2c) Render something else.
...
3. When everything is finished rendering, i.e. all windows have rendered what they were supposed to, call asynchronously back to the Juce Message Thread and delete all existing windows (currently referenced in the std::map) from there.
4. Return to 1 if there are more cases to render.

I guess it is pretty far from the normal use case to spam windows and OpenGL contexts like this, but maybe there is a subtle bug in there somewhere.


#12

Just a comment before the weekend. I did some refactoring so that the pointers (QAWindow::Ptr) to the QAWindows are no longer in an std::map, but in an std::vector upon creation. These pointers are then deleted from the vector at some point in the future. I have created and deleted windows for almost 4 hours now and it hasn't crashed yet. I will set it up to run during the weekend to see if the problem is (seems to be) gone, and let you know.


#13

I ran the application during the weekend and it ran fine from Friday 4 pm until Saturday 6 am before it crashed, with the same issue:
1. The Juce Message Thread deletes the QAWindow::Ptr from the std::vector and calls detach().
2. Straight after this, the OpenGLContext gets a nullptr when calling getCachedImage() in getAssociatedObject() and the app crashes.

I am tempted to just return a nullptr in case getCachedImage() returns a nullptr, but I don't really know the consequences of doing this. It looks like returning a nullptr is a possibility in the flow of the code:

return index >= 0 ? c->associatedObjects.getUnchecked (index) : nullptr;

As mentioned, this is not a common use case, so the normal user will not run into this bug (if there is a bug) very often. I also noticed that I probably have a memory leak somewhere in my code, as the memory consumption of the application had risen from ~60-70 MB on Friday to ~180 MB on Saturday morning. I don't know if this is related, but I will investigate this as well.
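For reference, the null guard considered above could look roughly like the following standalone sketch. FakeCachedImage and findAssociatedObject are hypothetical names, and plain int pointers stand in for the associated objects. This is not a patch to JUCE, and a guard like this would only hide the symptom: the underlying race would remain, and every caller would have to cope with a nullptr result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the cached image's name/object lists.
struct FakeCachedImage
{
    std::vector<std::string> names;
    std::vector<int*> objects; // placeholder for the associated objects
};

// Returns the object registered under `name`, or nullptr if the cached
// image is missing, the name is null, or the name is not registered.
int* findAssociatedObject (const FakeCachedImage* c, const char* name)
{
    if (c == nullptr || name == nullptr)
        return nullptr; // guard instead of dereferencing a null image

    for (std::size_t i = 0; i < c->names.size(); ++i)
        if (c->names[i] == name)
            return c->objects[i];

    return nullptr;
}
```

A missing image and an unregistered name both yield nullptr here, so the caller cannot tell the two cases apart without extra state.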