Destroying playing Edit sometimes causes process to hang via PlayHead::syncPositions member

I’ve been noticing this behavior for a bit between TE projects, but I finally tried to understand it a bit more. I have a demo project that can faithfully reproduce the hung lock that I’m seeing.

Stuck accessing PlayHead::syncPositions
Sometimes when destroying a previously loaded Edit, a hang around access to PlayHead::syncPositions via PlayHead::getSyncPositions() occurs. And the first and only thing I’ve discovered so far that seems to remove the chance for the process to sometimes hang during Edit destruciton, is to not call getTransport().play() at all on the Edit before destroying it (which of course is not practical, but lends a clue to the issue perhaps).

If the transport has invoked play() at all, regardless of whether or not the transport is currently playing, there seems to be a chance the hang will occur when the Edit is destroyed. Once it has occurred, there is no recovering, and the app now requires a force quit.

The callstack has shown both what looks like a general position sync of the playhead happening during the hang via audio device callback, and something happening with the retrospective buffers can hang as well - either way, it’s always a call to PlayHead::getSyncPositions() that infinitely locks. On my system it looked like a spin lock that never recovered.

I’m on Windows 10. This is a somewhat recent version of the develop branch of TE (although I’ve had it on all develop commits I’ve tried so far).

Is there anything extra I need to be doing in terms of properly shutting down/cleaning up an Edit or related Transport chores before deleting an Edit? The only thing I could find on the forum was this post, but it seemed to indicate there really wasn’t much involved in deleting it, and by design all of the necessary magic occurs in the destructor I am guessing?

By the looks of it, there are things still in place (like an audio device callback) continuing to attempt some edit playhead position synchronization that ideally would be disconnected and cleaned up before continuing to destroy the Edit. I don’t have a great grasp yet, but so far it seems to be atomic access behavior related to continued access through callbacks combined with what access occurs during destruction of an Edit.

This means that periodically when releasing a previously opened and played edit, the whole process will hang and require a force quit. And although this doesn’t happen every time, and there are moments of normal use, it happens with enough frequency to make consistent app hangs just part of the reality of use, which is super sad.

Any advice here would be greatly appreciated!

Update: This definitely seems due to the RetrospectiveRecordBuffer calling syncToEdit()

Here’s what CPU usage looks like when this happens:

Here in the atomics is specifically where it’s stuck:

I created a PIP that will setup a minimal demo App that demonstrates this behavior for me on Windows 10, untested on other platforms so far.

Here is the PIP:
RetrospectiveRecordBufferSyncHang.h (5.3 KB)

And here is the .tracktionedit I was using as a super simple example to test loading/unloading/reloading with (rename from .txt to .tracktionedit, had to use .txt to make forum attachments happy):
TestEdit.txt (2.3 KB)

Set to C++17 before trying to build.

Essentially, just compile the app and run it, there is a text entry field where you should paste an absolute path to a .tracktionedit file that it will continuously load, play, and delete, and repeat, on a timer. Once you’ve pasted the path to the .tracktion edit, press the “Start Reloading” button. You probably won’t need to stop, because the point is to get it to hang, but the button is there if you need to.

There is a small text label readout that updates to the latest reload count since the start button was pressed. This is also an obvious indicator it has hung, because it will stop updating.

The reload hang can happen as soon as maybe 4 reloads, and I rarely get above 30-40 reloads before it happens for 100%.

If I comment out the line that specifically calls play on the Edit’s transport after it is loaded, the issue disappears (but of course I want to actually play Edits).

// Comment out the line below and the issue never happens

I’m not at my computer at the moment but before you delete the edit, if you call edit->transportControl. freePlaybackContext() does the problem go away?

I thought there was an assertion in the edit destructor to catch this but I may be wrong…

Unfortunately, it does not. While researching the problem and possible solutions I ran across freePlaybackContext() and was hopeful it’s what was needed, but it doesn’t have the desired effect.

I also tried to get access to the underlying WaveInputDevice RetrospectiveRecordBuffer by using getRetrospectiveRecordBuffer(), which I can do, but the compiler complains of it being an incomplete type, so I can’t call removeEditSync() manually this way. Regardless, it appears RetrospectiveRecordBuffer::removeEditSync() is being called in the destructor of WaveInputDeviceInstance

What I can tell you is that putting a break point in the destructor of WaveInputDeviceInstance where it is removing the sync of the RetrospectiveRecordBuffer shows that it does not get called before the spin lock hangs, so whatever the timing of the issue is, it occurs before ~WaveInputDeviceInstance().

Update: I found a workaround, see below…

Also, I was looking at the call stack and this bit of code is part of the issue:

void WaveInputDevice::consumeNextAudioBlock (const float** allChannels, int numChannels, int numSamples, double streamTime)
    if (enabled)
        bool isFirst = true;
        const juce::ScopedLock sl (instanceLock);

        // do this first, as writing to file trashes the buffer
        for (auto i : instances)
            i->acceptInputBuffer (allChannels, numChannels, numSamples, streamTime,
                                  isFirst ? &levelMeasurer : nullptr,
                                  (! retrospectiveRecordLock) ? retrospectiveBuffer.get() : nullptr, isFirst);
            isFirst = false;

When acceptInputBuffer() is called, retrospectiveRecordLock is false, and so the pointer to retrospectiveBuffer is passed in.

Inside of acceptInputBuffer() we see this bit of code:

if (retrospectiveBuffer != nullptr)
    if (addToRetrospective)
        retrospectiveBuffer->updateSizeIfNeeded (inputBuffer.getNumChannels(),
        retrospectiveBuffer->processBuffer (streamTime, inputBuffer, numSamples);

    retrospectiveBuffer->syncToEdit (edit, context, streamTime, numSamples);

The syncToEdit() call on the last line of the above snippet is what is on the call stack when access to RetrospectiveRecordBuffer::getSyncPositions() is called and hangs.

I’m not exactly sure yet what retrospectiveRecordLock does, but it seems like if it got set to true during the cascade of events from the Edit destructor, then WaveInputDevice::consumeNextAudioBlock() would pass nullptr instead for the retrospective buffer pointer, and the call to RetrospectiveRecordBuffer::getSyncPositions() would be avoided.

I’m not sure if that’s viable, or what other externalities exist that could be affected, but from a high-level view, it looks like it would avoid the offending call.

So I found a potential work around that seems to solve the issue based on the findings of my last post. What I found is a method that lets me manually set the retrospectiveRecordLock on te::InputDevice, which is the static method InputDevice::setRetrospectiveLock().

What I do is now is invoke this and set the lock to true before I destroy the edit:

if (edit != nullptr)
    te::InputDevice::setRetrospectiveLock(engine, edit->getTransport().getCurrentPlaybackContext()->getAllInputs(), true);

And then after loading the new edit, I set the lock back to false:

edit = std::move(te::loadEditFromFile(engine, editFile));
te::InputDevice::setRetrospectiveLock(engine, edit->getTransport().getCurrentPlaybackContext()->getAllInputs(), false);

The above approach seems to solve the issue from my initial tests!

I’m not sure if restoring the lock to false after loading a new edit (and presumably setting up a new context) is needed, but figured I would do it as a precaution.

Hmm, well… when I applied this workaround to my actual app, and not just the demo PIP above, it does indeed seem to solve the RetrospectiveRecordBuffer issue, however, there is an additional call to PlayHead::getSyncPositions that still causes a hang. It is from EditPlaybackContext::globalStreamTimeToEditTime() which in turn calls PlayHead::referenceSamplePositionToTimelinePosition() which then calls PlayHead::getSyncPositions() where the hang can still occur.

By looking at the code:

double EditPlaybackContext::globalStreamTimeToEditTime (double globalStreamTime) const
    if (! nodePlaybackContext)
        return 0.0;
    const auto sampleRate = getSampleRate();
    const auto globalSamplePos = tracktion_graph::timeToSample (globalStreamTime, sampleRate);
    const auto timelinePosition = nodePlaybackContext->playHead.referenceSamplePositionToTimelinePosition (globalSamplePos);
    return tracktion_graph::sampleToTime (timelinePosition, sampleRate);

It looks like if nodePlaybackContext is set to nullptr then this bit should just return, and never hit the offending line

const auto timelinePosition = nodePlaybackContext->playHead.referenceSamplePositionToTimelinePosition (globalSamplePos);

Since nodePlaybackContext is just a std::unique_ptr it looks like it is only deleted when the entire EditPlaybackContext goes out of scope, yet the above method still can get called from an audio device callback call tree before that happens by the looks of it.

Since there wasn’t any method I could find that actually deletes nodePlaybackContext, I changed ~EditPlaybackContext() to look like this:

    nodePlaybackContext.reset(); // <- I added this line

    edit.engine.getDeviceManager().removeContext (this);

    midiDispatcher.setMidiDeviceList (juce::OwnedArray<MidiOutputDeviceInstance>());

So I manually reset the nodePlaybackContext pointer at the very beginning of the destructor. Now, I have no idea what other side effects this may cause, but I just wanted to try for testing. Regardless, doing this, and adding a call to TransportControl::freePlaybackContext() after calling InputDevice::setRetrospectiveLock() (and setting the lock to true), but before deleting the actual edit seems to mitigate this second hang, which was unrelated to the RetrospectiveRecordBuffer.

I know you said you tried calling freePlaybackContext(), but the things you described afterwards don’t sound like what would happen if you did.

That function should delete any WaveInputDevice instances which use the edit. So if you’re seeing it going into WaveInputDevice::consumeNextAudioBlock() for your deleted edit later, then either you didn’t call freePlaybackContext(), or you accidentally re-attached a playback context afterwards? Or maybe you were calling freePlaybackContext on a different thread and have a race or something?

When Dave said he thought there was an assertion to catch this, he probably means the one here:

    jassert (Selectable::isSelectableValid (&edit));

which checks that input devices instances get deleted before the edits that they’re using. But I guess that if you leak the input device then it won’t get a chance to check this.

Looking for bugs in retrospective recording is a red-herring - as long as you make sure the WaveInputDevice has been correctly deleted, then none of the retrospective record code will be running any more, so wouldn’t bother spending time on it.

Try breakpointing the Edit destructor, and the constructors and destructors for WaveInputDevice, and just watch the order in which they get created and torn down - that should give you some clues.

Thanks for chiming in! I agree with you that the retrospective record buffer is a red herring and that it won’t be an issue if WaveInputDevices instances get cleaned up properly. After investigating it, that became clear in the process.

The question remains though, why is WaveInputDevice sometimes not being cleaned up properly where an audio device callback is invoking methods that ultimately try to synchronize the playhead after an Edit’s destructor is called?

The PIP example above that can reproduce the issue is very simple. It is just a JUCE application, that uses the Tracktion Engine module.

MainComponent defines the Engine member, and a std::unique_ptr<te::Edit> to hold the loaded/reloaded Edit. There are some extra conveniences but, that’s pretty much the core of it.

te::Engine engine{ ProjectInfo::projectName, nullptr, nullptr };
std::unique_ptr<te::Edit> edit = nullptr;

Originally, I created the test with only a button in the UI and nothing more. The button click handler would call a load/reload method that did this:

if (edit != nullptr) edit.reset(); // delete the current edit if there is one
auto editFile = File("path to edit file hardcoded here");
edit = te::loadEditFromFile(engine, editFile);

And that’s it! If you clicked the button many times, eventually you will run into the issue where the WaveInputDevice instance is not cleaned up before the audio device callback eventually invokes a method that tries to access PlayHead::getSyncPositions(). Adding freePlaybackContext() before deleting the Edit does not fix the issue. And I thought maybe it was just because if I spam the reload method too fast there are issues, but that’s not the case, it’s really just a random timing thing.

I’m not sure how I can make the example much more simple than that? The PIP I posted has some added extras to make it more convenient to test, but it started as simple as what I just outlined above.

As far as I can tell, this is the simplest thing you can do for setting up a JUCE app that unloads/loads an Edit on the message thread.

At first, I was seeing this in a more complicated app where it was hard to tell if there was some other issue I had introduced unknowingly. But it was disruptive to the use of the app, so I decided to investigate and see how simple of a project could I create that would reproduce the issue, which is what I’ve just outlined above.

I would be inclined to agree with you that it was something else or something I’ve introduced if I couldn’t reproduce it with such a simple example, but unfortunately that doesn’t seem to be the case here.

When I get a chance I will test on macOS and see if it reproduces there as well, or just seems to be a Windows thing.

Edits rely on some message-thread callbacks to do their work. If you’re creating and deleting Edit objects without letting the message thread run in between then maybe some kind of initialisation messages that set up the audio devices could get delivered after the edit has already been deleted? Dave might have more of a clue about whether that’s a realistic theory.

1 Like

Ok, maybe I will set something up that defers loading the new edit until the message thread has some time to run after deleting the current edit, and see if allowing a bit of a time buffer of message thread execution avoids the issue. Thanks for the suggestion.

If that does turn out to be the cause, it’s still probably something we should fix in the engine itself so that you’re not required to know about it or work around in client code. Keep us posted with any more clues!

That would be one indication, the one I was referring to was this in the Edit destructor though:

    jassert (! getTransport().isPlayContextActive());

A couple of lines above that there’s this:

    if (transportControl != nullptr)

Which should clear all the input device instances for that Edit.

Like @jules said, there’s a chance you’re re-attaching the Edit to the DeviceManager (i.e. creating a new EditPlaybackContext) during the Edit’s destruction?

I just tried your example on macOS and stopped it after 500 reloads with no hang. So I’m not sure if this is Windows specific, I can’t see why it would be though. As far as I could tell, there was no re-attaching to the device manager during destruction.

I did find one possible race condition though which might result in some kind of corruption.

    releaseDeviceList();                                  //<<< Add this line
    edit.engine.getDeviceManager().removeContext (this);

Can you add the above releaseDeviceList line to the EditPlaybackContext destructor to see if that helps?