Memory corruption in ListenerList class

Hello,

There seems to be a memory corruption bug in the ListenerList class.

In ListenerList::callCheckedExcluding() there are these lines of code:

Iterator it{};
it.end = localListeners->size();
iterators->push_back (&it); // "it" is on the stack :(

This is taking the address of the stack variable Iterator “it” and storing it in the “iterators” list. As soon as the function exits, the pointer to “it” is no longer valid so “iterators” is left with a dangling pointer in the list.

Eventually, a call to ListenerList::clear() will stomp some other variable on the stack when it accesses the invalid “it” pointer:

for (auto* it : *iterators)
    it->end = 0; // "it" points to an invalid location on the stack :(

I have been seeing random memory corruption crashes in my app due to this bug.

Thanks!

You missed the following lines, which are important:

Iterator it{};
it.end = localListeners->size();

iterators->push_back (&it);

const ScopeGuard scope { [i = iterators, &it]
{
    i->erase (std::remove (i->begin(), i->end(), &it), i->end());
} };

The ScopeGuard removes references to ‘it’ from ‘iterators’ when the current function returns, so it looks to me like iterators shouldn’t ever contain any pointers to expired stack variables.
I also think this is unlikely to be the cause of the crashes you’re seeing because we haven’t had any reports of similar crashes. If something as fundamental as ListenerList were regularly causing crashes, we probably would have had several reports.

I recommend testing your project with Address Sanitizer and Thread Sanitizer. These will give you more descriptive error messages if there are address- or thread-related errors in the program.

If you are confident that this is a bug, please provide a minimal example that demonstrates the problem, e.g. with Address Sanitizer, and we can investigate.

Oops, you’re right!

I believe I’ve tracked down why the ListenerList is being corrupted.

In this case, I’m calling URL::readEntireTextStream() and seeing random crashes / memory corruption with JUCE 7.0.11 on iOS.

It seems that the issue is a race condition between URLConnectionState::~URLConnectionState() and URLConnectionState::DelegateClass::didCompleteWithError().

What appears to be happening is that the callback didCompleteWithError() hasn’t finished running on the OS callback thread before ~URLConnectionState() starts running on the user thread.

void didComplete (NSError* error)
    {
        const ScopedLock sl (dataLock);

	// ...extra lines removed…
        
        signalThreadShouldExit(); // <- Instruction pointer is here at time of crash (on OS callback thread)
    }
~URLConnectionState() override
    {
        signalThreadShouldExit(); // <- Instruction pointer is here at time of crash (on pool thread)

    // ...extra lines removed...
    }

Perhaps the fix is for ~URLConnectionState() to take the ‘dataLock’ before calling signalThreadShouldExit()?

As a side note, the reason why ListenersList popped up as a crash site is because Thread::signalThreadShouldExit() notifies listeners.

void Thread::signalThreadShouldExit()
{
    shouldExit = true;
    listeners.call ([] (Listener& l) { l.exitSignalSent(); });
}

My guess is that listeners might remove themselves during notification which modifies the list? That would explain why the list gets corrupted when the function becomes reentrant.

It seems that the issue is deeper than I realized.

~URLConnectionState() doesn’t wait for the task to complete, because [task cancel] isn’t a blocking call, and neither is [session finishTasksAndInvalidate].

According to the address sanitizer, the delegate callback from the OS can happen after ~URLConnectionState() has returned and the memory has been freed.

Perhaps this could be handled by having the callback signal a semaphore to let the destructor know it has completed?

This seems quite similar to this:

1 Like

@masshacker, yes that does look like the same bug. It’s a data race so it can be quite hard to reproduce.

Here’s what Address Sanitizer has to say about the crash site:

AddressSanitizer: heap-use-after-free on address 0x000123a3d048 at pc 0x00010510b2b0 bp 0x00016b291e10 sp 0x00016b291e08
READ of size 1 at 0x000123a3d048 thread T1
    #0 0x10510b2ac in juce::URLConnectionState::didComplete(NSError*)+0x29c (/private/var/containers/Bundle/Application/FDC30A7A-DDAE-4531-9405-6FF374BCAAFD/JUCETest.app/JUCETest:arm64+0x1005172ac)
    #1 0x1051016a8 in juce::URLConnectionState::DelegateClass::didCompleteWithError(objc_object*, objc_selector*, NSURLConnection*, NSURLSessionTask*, NSError*)+0x90 (/private/var/containers/Bundle/Application/FDC30A7A-DDAE-4531-9405-6FF374BCAAFD/JUCETest.app/JUCETest:arm64+0x10050d6a8)

(...rest of stack trace removed...)

It says the problem is heap-use-after-free in juce::URLConnectionState::didComplete().

The line of code that it crashes on is accessing the member variable URLConnectionStateBase::isBeingDeleted

void didComplete (NSError* error)
{
    // ...code removed for clarity...

    if (isBeingDeleted)
        return;

    // ...code removed for clarity...
}

Now let’s look at who freed the URLConnectionStateBase object:

0x000123a3d048 is located 712 bytes inside of 720-byte region [0x000123a3cd80,0x000123a3d050)
freed by thread T8 here:
    #0 0x107ea622c in wrap__ZdlPv+0x74 (/private/var/containers/Bundle/Application/FDC30A7A-DDAE-4531-9405-6FF374BCAAFD/JUCETest.app/Frameworks/libclang_rt.asan_ios_dynamic.dylib:arm64e+0x6222c)
    #1 0x1050fbf3c in juce::URLConnectionState::~URLConnectionState()+0x64 (/private/var/containers/Bundle/Application/FDC30A7A-DDAE-4531-9405-6FF374BCAAFD/JUCETest.app/JUCETest:arm64+0x100507f3c)

(...rest of stack trace removed...)

It says that the allocation was freed by URLConnectionState::~URLConnectionState(), which is the place you would expect.

One of the trickiest aspects of understanding this bug is the realization that [session finishTasksAndInvalidate] isn’t a blocking call. :slight_smile:

Here’s what the docs say:

“-finishTasksAndInvalidate returns immediately and existing tasks will be allowed to run to completion. New tasks may not be created. The session will continue to make delegate callbacks until URLSession:didBecomeInvalidWithError: has been issued.”

So it seems clear that it’s a data race between the delegate callback and the destructor.

I agree with you.

In addition, I think it also (can) happen that the delegate method is called after is released. Because the documentation says about finishTasksAndInvalidate:

This method returns immediately without waiting for tasks to finish. Once a session is invalidated, new tasks cannot be created in the session, but existing tasks continue until completion. After the last task finishes and the session makes the last delegate call related to those tasks, the session calls the URLSession:didBecomeInvalidWithError: method on its delegate, then breaks references to the delegate and callback objects. After invalidation, session objects cannot be reused.

I tried a way to “remove” the delegate from the session without success.

Reading the declaration of sessionWithConfiguration it explicitly says:

/*
 * Customization of NSURLSession occurs during creation of a new session.
 * If you only need to use the convenience routines with custom
 * configuration options it is not necessary to specify a delegate.
 * If you do specify a delegate, the delegate will be retained until after
 * the delegate has been sent the URLSession:didBecomeInvalidWithError: message.
 */

I tried several ways to change the order of these operations without success. The only way it seems working is changing the behavior of the destructor. Keeping the object URLConnectionState alive and waiting for the didBecomeInvalidWithError from the session, but I’m not sure this is the best approach.

This is a quick and dirty example: juce_Network_mac.mm (39.6 KB)

@matteo, nice work! That’s very similar to how I fixed it locally. The only real difference is that I used a waitable event so that it doesn’t have to do polling in the destructor.

The fact that the docs say “…the delegate will be retained…” seems to make it clear that there is no way to know that it has finished other than tracking it manually.

Here is the fix I’ve been testing locally.

In URLConnectionState::didComplete():

void didComplete (NSError* error)
{
    // Signal to ~URLConnectionState() that the object can be destructed
    ScopeGuard SignalCallbackCompleted{ [&]() { CallbackCompletionEvent.signal(); } };

    // ...
}

In URLConnectionState::~URLConnectionState():

~URLConnectionState() override
{
    // Wait for task to complete before destructing
    CallbackCompletionEvent.wait();

    // ...
}

In URLConnectionStateBase:

class URLConnectionStateBase : public Thread
{
    // ...

    // Ensures that OS callbacks are complete before destructing
    WaitableEvent CallbackCompletionEvent;

    // ...
}
1 Like