Memory corruption in ListenerList class

Hello,

There seems to be a memory corruption bug in the ListenerList class.

In ListenerList::callCheckedExcluding() there are these lines of code:

Iterator it{};
it.end = localListeners->size();
iterators->push_back (&it); // "it" is on the stack :(

This is taking the address of the stack variable Iterator “it” and storing it in the “iterators” list. As soon as the function exits, the pointer to “it” is no longer valid so “iterators” is left with a dangling pointer in the list.

Eventually, a call to ListenerList::clear() will stomp some other variable on the stack when it accesses the invalid “it” pointer:

for (auto* it : *iterators)
    it->end = 0; // "it" points to an invalid location on the stack :(

I have been seeing random memory corruption crashes in my app due to this bug.

Thanks!

You missed the following lines, which are important:

Iterator it{};
it.end = localListeners->size();

iterators->push_back (&it);

const ScopeGuard scope { [i = iterators, &it]
{
    i->erase (std::remove (i->begin(), i->end(), &it), i->end());
} };

The ScopeGuard removes references to ‘it’ from ‘iterators’ when the current function returns, so it looks to me like iterators shouldn’t ever contain any pointers to expired stack variables.
I also think this is unlikely to be the cause of the crashes you’re seeing because we haven’t had any reports of similar crashes. If something as fundamental as ListenerList were regularly causing crashes, we probably would have had several reports.

I recommend testing your project with Address Sanitizer and Thread Sanitizer. These will give you more descriptive error messages if there are address- or thread-related errors in the program.

If you are confident that this is a bug, please provide a minimal example that demonstrates the problem, e.g. with Address Sanitizer, and we can investigate.

Oops, you’re right!

I believe I’ve tracked down why the ListenerList is being corrupted.

In this case, I’m calling URL::readEntireTextStream() and seeing random crashes / memory corruption with JUCE 7.0.11 on iOS.

It seems that the issue is a race condition between URLConnectionState::~URLConnectionState() and URLConnectionState::DelegateClass::didCompleteWithError().

What appears to be happening is that the callback didCompleteWithError() hasn’t finished running on the OS callback thread before ~URLConnectionState() starts running on the user thread.

void didComplete (NSError* error)
    {
        const ScopedLock sl (dataLock);

	// ...extra lines removed…
        
        signalThreadShouldExit(); // <- Instruction pointer is here at time of crash (on OS callback thread)
    }
~URLConnectionState() override
    {
        signalThreadShouldExit(); // <- Instruction pointer is here at time of crash (on pool thread)

    // ...extra lines removed...
    }

Perhaps the fix is for ~URLConnectionState() to take the ‘dataLock’ before calling signalThreadShouldExit()?

As a side note, the reason why ListenersList popped up as a crash site is because Thread::signalThreadShouldExit() notifies listeners.

void Thread::signalThreadShouldExit()
{
    shouldExit = true;
    listeners.call ([] (Listener& l) { l.exitSignalSent(); });
}

My guess is that listeners might remove themselves during notification which modifies the list? That would explain why the list gets corrupted when the function becomes reentrant.

It seems that the issue is deeper than I realized.

~URLConnectionState() doesn’t wait for the task to complete, because [task cancel] isn’t a blocking call, and neither is [session finishTasksAndInvalidate].

According to the address sanitizer, the delegate callback from the OS can happen after ~URLConnectionState() has returned and the memory has been freed.

Perhaps this could be handled by having the callback signal a semaphore to let the destructor know it has completed?

1 Like

This seems quite similar to this:

1 Like

@masshacker, yes that does look like the same bug. It’s a data race so it can be quite hard to reproduce.

Here’s what Address Sanitizer has to say about the crash site:

AddressSanitizer: heap-use-after-free on address 0x000123a3d048 at pc 0x00010510b2b0 bp 0x00016b291e10 sp 0x00016b291e08
READ of size 1 at 0x000123a3d048 thread T1
    #0 0x10510b2ac in juce::URLConnectionState::didComplete(NSError*)+0x29c (/private/var/containers/Bundle/Application/FDC30A7A-DDAE-4531-9405-6FF374BCAAFD/JUCETest.app/JUCETest:arm64+0x1005172ac)
    #1 0x1051016a8 in juce::URLConnectionState::DelegateClass::didCompleteWithError(objc_object*, objc_selector*, NSURLConnection*, NSURLSessionTask*, NSError*)+0x90 (/private/var/containers/Bundle/Application/FDC30A7A-DDAE-4531-9405-6FF374BCAAFD/JUCETest.app/JUCETest:arm64+0x10050d6a8)

(...rest of stack trace removed...)

It says the problem is heap-use-after-free in juce::URLConnectionState::didComplete().

The line of code that it crashes on is accessing the member variable URLConnectionStateBase::isBeingDeleted

void didComplete (NSError* error)
{
    // ...code removed for clarity...

    if (isBeingDeleted)
        return;

    // ...code removed for clarity...
}

Now let’s look at who freed the URLConnectionStateBase object:

0x000123a3d048 is located 712 bytes inside of 720-byte region [0x000123a3cd80,0x000123a3d050)
freed by thread T8 here:
    #0 0x107ea622c in wrap__ZdlPv+0x74 (/private/var/containers/Bundle/Application/FDC30A7A-DDAE-4531-9405-6FF374BCAAFD/JUCETest.app/Frameworks/libclang_rt.asan_ios_dynamic.dylib:arm64e+0x6222c)
    #1 0x1050fbf3c in juce::URLConnectionState::~URLConnectionState()+0x64 (/private/var/containers/Bundle/Application/FDC30A7A-DDAE-4531-9405-6FF374BCAAFD/JUCETest.app/JUCETest:arm64+0x100507f3c)

(...rest of stack trace removed...)

It says that the allocation was freed by URLConnectionState::~URLConnectionState(), which is the place you would expect.

One of the trickiest aspects of understanding this bug is the realization that [session finishTasksAndInvalidate] isn’t a blocking call. :slight_smile:

Here’s what the docs say:

“-finishTasksAndInvalidate returns immediately and existing tasks will be allowed to run to completion. New tasks may not be created. The session will continue to make delegate callbacks until URLSession:didBecomeInvalidWithError: has been issued.”

So it seems clear that it’s a data race between the delegate callback and the destructor.

I agree with you.

In addition, I think it also (can) happen that the delegate method is called after is released. Because the documentation says about finishTasksAndInvalidate:

This method returns immediately without waiting for tasks to finish. Once a session is invalidated, new tasks cannot be created in the session, but existing tasks continue until completion. After the last task finishes and the session makes the last delegate call related to those tasks, the session calls the URLSession:didBecomeInvalidWithError: method on its delegate, then breaks references to the delegate and callback objects. After invalidation, session objects cannot be reused.

I tried a way to “remove” the delegate from the session without success.

Reading the declaration of sessionWithConfiguration it explicitly says:

/*
 * Customization of NSURLSession occurs during creation of a new session.
 * If you only need to use the convenience routines with custom
 * configuration options it is not necessary to specify a delegate.
 * If you do specify a delegate, the delegate will be retained until after
 * the delegate has been sent the URLSession:didBecomeInvalidWithError: message.
 */

I tried several ways to change the order of these operations without success. The only way it seems working is changing the behavior of the destructor. Keeping the object URLConnectionState alive and waiting for the didBecomeInvalidWithError from the session, but I’m not sure this is the best approach.

This is a quick and dirty example: juce_Network_mac.mm (39.6 KB)

@matteo, nice work! That’s very similar to how I fixed it locally. The only real difference is that I used a waitable event so that it doesn’t have to do polling in the destructor.

The fact that the docs say “…the delegate will be retained…” seems to make it clear that there is no way to know that it has finished other than tracking it manually.

Here is the fix I’ve been testing locally.

In URLConnectionState::didComplete():

void didComplete (NSError* error)
{
    // Signal to ~URLConnectionState() that the object can be destructed
    ScopeGuard SignalCallbackCompleted{ [&]() { CallbackCompletionEvent.signal(); } };

    // ...
}

In URLConnectionState::~URLConnectionState():

~URLConnectionState() override
{
    // Wait for task to complete before destructing
    CallbackCompletionEvent.wait();

    // ...
}

In URLConnectionStateBase:

class URLConnectionStateBase : public Thread
{
    // ...

    // Ensures that OS callbacks are complete before destructing
    WaitableEvent CallbackCompletionEvent;

    // ...
}
1 Like

Dear @reuk, is there any chance this can be fixed in the next version? This isn’t very pleasant because with the new VT3 manifest the plugin is just loaded and closed and if it fails it breaks the entire build system…

There is another discussion started in the 2020 Crash in URLConnectionState::run() while app is shutting down.

PluginProcessor.cpp (6.6 KB)
vst3_manifest_tester.sh.txt (588 Bytes)

Thanks for reporting this issue, and for your patience. We’ve been very busy with the JUCE 8 release, so we’re only just getting round to looking at some longer-standing bug reports.

We’ve made a series of updates to the URLConnectionState class, which should improve stability on macOS.

The first change removes code that will never be called on any of JUCE’s supported platforms:

Secondly, we’ve improved thread safety in the NSURLSession-based downloader:

Finally, we’ve refactored to allow multiple connections to re-use the same NSURLSession instance:

The new threading model is a bit simpler, so there are fewer places for threading bugs to lurk. I’ve tested this out a bit, but I haven’t necessarily uncovered all of the potential issues, so I’d recommend trying it out yourself and letting us know if you still encounter problems.

Thank you @reuk. I tested the new code and I confirm you the crash is fixed. Anyway, it seems the completion/destructor of the URLConnectionState is not correctly managed.

If you try this simple example (taken from the example NetworkingDemo.h)

int statusCode = 0;
auto vURL = juce::URL("https://www.google.com");
auto vpStream = vURL.createInputStream(juce::URL::InputStreamOptions(juce::URL::ParameterHandling::inAddress)
    .withStatusCode(&statusCode));
if (vpStream)
{
    std::cout << "STREAM EXISTS: " << statusCode << " " << vpStream->isExhausted() << std::endl;
}

In the output, you will note

STREAM EXISTS: 200 0

Error Domain=NSURLErrorDomain Code=-999 "cancelled" UserInfo={NSErrorFailingURLStringKey=https://consent.google.com/ml?continue=https://www.google.com/&gl=IT&m=0&pc=shp&uxe=none&cm=2&hl=it&src=1, NSErrorFailingURLKey=https://consent.google.com/ml?continue=https://www.google.com/&gl=IT&m=0&pc=shp&uxe=none&cm=2&hl=it&src=1, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <E3C45ACD-1E53-4589-BFCA-FA006E343E7C>.<1>"
), _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <E3C45ACD-1E53-4589-BFCA-FA006E343E7C>.<1>, NSLocalizedDescription=cancelled}

vpStream goes out of scope before receiving the didComplete callback and the task is canceled by the ~URLConnectionState().

Reading the comment of the juce::URL::createInputStream() I found:

Note that this method will block until the first byte of data has been received or an
error has occurred.

So I tried to wait until the isExhausted() value is true, but it is always false.

while (!vpStream->isExhausted())
{
     juce::Time::waitForMillisecondCounter(10);
}

For this reason, I suppose the createInputStream() should be blocking and wait for the completion callback and delete the task only if is still pending in the destructor or if is not ensured the stream is ended the isExhausted() is not correctly updated.

I just tried this out myself, and as far as I can tell it works as intended.

This is correct. createInputStream only blocks until the first byte of data is received, but the rest of the data may take longer to arrive. When the stream goes out of scope, it cancels any ongoing request so that destruction isn’t blocked. Otherwise, a large request could prevent an app from shutting down for a long time, especially on a slow connection.

isExhausted returns true once there is no further data to read from the stream. That is, if you call read(), readByte(), or readEntireStreamAsString() on the stream, isExhausted will return true once there is no further data to read. If you don’t read from the stream, but the stream still contains data that hasn’t been read, then isExhausted will never return true.

In the following snippet, isExhausted() returns true after reading the entire stream, as expected.

const juce::URL::InputStreamOptions opt (juce::URL::ParameterHandling::inAddress);

if (auto stream = juce::URL ("https://www.google.com").createInputStream (opt))
{
    stream->readEntireStreamAsString();
    jassert (stream->isExhausted());
}

Ok, I’ll try! I’m a little confused about how the readEntireStreamAsString, but I’ll investigate.
Essentially you should loop until stream->isExhausted() becomes true to be sure you have read all the stream…

I ran into a bug recently where the ListenerList wasn’t working in Release… I changed the start and end initializers and set them to zero

Rail