APVTS Updates & Thread/Realtime Safety

I’ve got a few questions on this topic so I numbered them. Very curious how others have dealt with these issues.

  1. In my freelance work I’ve more than once inherited projects where AudioProcessorValueTreeState::Listener::parameterChanged was overridden to call GUI-related code (e.g. Label::setText). This isn’t safe, since parameterChanged is often called on the audio thread for automation (which can be verified with pluginval). Perhaps we can make it more obvious that this isn’t safe? Maybe put something in the docs to dissuade this usage?
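
     The safe version of that pattern only stores the value in the callback and defers all GUI work to the message thread. A rough sketch in plain C++ (a std::string and a plain method stand in for juce::Label and juce::Timer here; those stand-ins are my own, not the JUCE API):

     ```cpp
     #include <atomic>
     #include <string>

     struct SafeParameterListener
     {
         std::atomic<float> lastValue { 0.0f };
         std::string labelText;   // stand-in for a juce::Label

         // May be called from the audio thread: no allocation, no GUI calls.
         void parameterChanged (float newValue)
         {
             lastValue.store (newValue, std::memory_order_relaxed);
         }

         // Called periodically on the message thread (e.g. from a juce::Timer).
         void timerCallback()
         {
             labelText = std::to_string (lastValue.load (std::memory_order_relaxed));
         }
     };
     ```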

  2. To safely connect the GUI to the parameter system, the obvious solution is to use one of the provided attachments. However, when I look at how they’re implemented, they use an AsyncUpdater, which allocates when triggered and is thus not realtime safe:

    void AttachedControlBase::parameterChanged (const String&, float newValue) override
    {
        lastValue = newValue;

        if (MessageManager::getInstance()->isThisTheMessageThread())
        {
            cancelPendingUpdate();
            setValue (newValue);
        }
        else
        {
            triggerAsyncUpdate();
        }
    }

Is there something I’m missing here?

  3. To me, the next obvious way to connect GUI code to the parameters without using the attachments is to listen to the APVTS’s ValueTree. Under the hood this uses atomic flags and a main-thread timer to poll for updates, so it looks like the thread- and realtime-safe solution I want:

    void AudioProcessorValueTreeState::timerCallback()
    {
        auto anythingUpdated = flushParameterValuesToValueTree();

        startTimer (anythingUpdated ? 1000 / 50
                                    : jlimit (50, 500, getTimerInterval() + 20));
    }

However, I’d like the option for a parameter to flush its value to the state tree synchronously when the parameter is changed on the main thread. For example, if I click an “on” button that enables some features in the plugin, I want the related GUI widgets to respond instantly instead of waiting up to 500 ms, which can feel quite laggy.
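
Something like this hybrid would do what I’m after, sketched in plain C++: a bool parameter stands in for MessageManager::isThisTheMessageThread() and a std::function for the GUI update (both are stand-ins of mine, not the real JUCE API):

```cpp
#include <atomic>
#include <functional>

struct HybridNotifier
{
    std::atomic<bool> dirty { false };
    std::function<void()> onChange;   // GUI update; message thread only

    void valueChanged (bool isMessageThread)
    {
        if (isMessageThread)
            onChange();                                    // instant, no timer latency
        else
            dirty.store (true, std::memory_order_release); // realtime-safe hand-off
    }

    // Message-thread timer poll picks up audio-thread changes.
    void timerCallback()
    {
        if (dirty.exchange (false, std::memory_order_acquire))
            onChange();
    }
};
```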

4 Likes

AsyncUpdater only allocates when it’s constructed:

AsyncUpdater::AsyncUpdater()
{
    activeMessage = *new AsyncUpdaterMessage (*this);
}

Triggering it doesn’t allocate, as the message is reference counted:

void AsyncUpdater::triggerAsyncUpdate()
{
    // If you're calling this before (or after) the MessageManager is
    // running, then you're not going to get any callbacks!
    JUCE_ASSERT_MESSAGE_MANAGER_EXISTS

    if (activeMessage->shouldDeliver.compareAndSetBool (1, 0))
        if (! activeMessage->post())
            cancelPendingUpdate(); // if the message queue fails, this avoids getting
                                   // trapped waiting for the message to arrive
}

bool MessageManager::MessageBase::post()
{
    auto* mm = MessageManager::instance;

    if (mm == nullptr || mm->quitMessagePosted.get() != 0 || ! postMessageToSystemQueue (this))
    {
        Ptr deleter (this); // (this will delete messages that were just created with a 0 ref count)
        return false;
    }

    return true;
}

bool MessageManager::postMessageToSystemQueue (MessageBase* message)
{
    jassert (appDelegate != nil);
    appDelegate->messageQueue.post (message);
    return true;
}

// juce_osx_MessageQueue
    void post (MessageManager::MessageBase* const message)
    {
        messages.add (message);
        wakeUp();
    }

private:
    ReferenceCountedArray<MessageManager::MessageBase, CriticalSection> messages;
    CFRunLoopRef runLoop;
    CFRunLoopSourceRef runLoopSource;

// ReferenceCountedArray
    ObjectClass* add (ObjectClass* newObject)
    {
        const ScopedLockType lock (getLock());
        values.add (newObject);

        if (newObject != nullptr)
            newObject->incReferenceCount();

        return newObject;
    }

So, while it doesn’t allocate, it does lock.
And who knows what happens inside runLoop and runLoopSource on OS X…

1 Like

ReferenceCountedArray::add has the potential to allocate. And yeah, not to mention the lock.

1 Like

It’s not that hard to write a non-blocking async-updater (it’s basically an atomic flag with a timer polling it, or you could have a thread polling it and then calling a juce::AsyncUpdater); it’s more about what sacrifices you make when building that class.
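
For illustration, the thread-polling flavour might look roughly like this (plain C++ sketch; the fixed 1 ms poll interval and the std::function callback are arbitrary choices of mine, and real code would bounce the callback to the message thread rather than run it on the polling thread):

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

class PollingAsyncUpdater
{
public:
    explicit PollingAsyncUpdater (std::function<void()> callback)
        : handleUpdate (std::move (callback)),
          pollThread ([this] { run(); }) {}

    ~PollingAsyncUpdater()
    {
        shouldExit.store (true);
        pollThread.join();
    }

    // Wait-free: safe to call from the audio thread.
    void triggerAsyncUpdate()
    {
        pending.store (true, std::memory_order_release);
    }

private:
    void run()
    {
        while (! shouldExit.load())
        {
            drainPending();
            std::this_thread::sleep_for (std::chrono::milliseconds (1));
        }

        drainPending();  // handle a trigger that raced with shutdown
    }

    void drainPending()
    {
        if (pending.exchange (false, std::memory_order_acquire))
            handleUpdate();
    }

    std::function<void()> handleUpdate;
    std::atomic<bool> pending { false }, shouldExit { false };
    std::thread pollThread;   // declared last so the atomics are ready first
};
```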

Real-time is all about trade-offs, and for a library these can be difficult to get right. How much do you trade CPU against update latency (i.e. the frequency of polling)? How does that impact timer performance in the rest of the app?
Do you use a shared thread to poll all the flags, etc.?

I’m still relatively skeptical that in a standard plugin environment there would be many noticeable dropouts due to using AsyncUpdater, but I’m more than interested in evidence showing there are. (In a restricted embedded environment this is much more likely to be problematic.)
The reason I say this is that unless the message loop is being hammered by your app (which might indicate problems elsewhere), the queue isn’t likely to allocate, and the lock is likely to be mostly uncontended, making it (probably) very fast and in user space (not involving a system call).

Of course, all these things vary and a lock-free approach would be best. I just think it’s worth discussing the design decisions that need to be made first.

5 Likes

On Windows, PostMessage incurs a system call anyway.

Measuring the Timer’s postMessage call on Windows 10 (where the timer runs on a higher-priority thread):

Performance count for "postMessage" over 1000 run(s)
Average = 53 microsecs, minimum = 10 microsecs, maximum = 20 millisecs, total = 53 millisecs
The thread 0x6b84 has exited with code 0 (0x0).
Performance count for "postMessage" over 1000 run(s)
Average = 32 microsecs, minimum = 13 microsecs, maximum = 313 microsecs, total = 32 millisecs
Performance count for "postMessage" over 1000 run(s)
Average = 32 microsecs, minimum = 13 microsecs, maximum = 247 microsecs, total = 32 millisecs
Performance count for "postMessage" over 1000 run(s)
Average = 44 microsecs, minimum = 9 microsecs, maximum = 12 millisecs, total = 44 millisecs
The thread 0x6894 has exited with code 0 (0x0).
Performance count for "postMessage" over 1000 run(s)
Average = 81 microsecs, minimum = 13 microsecs, maximum = 47 millisecs, total = 81 millisecs
Performance count for "postMessage" over 1000 run(s)
Average = 27 microsecs, minimum = 13 microsecs, maximum = 273 microsecs, total = 27 millisecs
Performance count for "postMessage" over 1000 run(s)
Average = 31 microsecs, minimum = 12 microsecs, maximum = 242 microsecs, total = 31 millisecs

So generally very fast but with a few long delays. Any improvements to the methodology for testing this welcome! I did this in juce_Timer.cpp just because it was a handy place that calls postMessage a lot:

                {
                    static PerformanceCounter counter { "postMessage", 1000 };
                    counter.start();
                    messageToSend->post();
                    counter.stop();
                }

I found JUCE code using AsyncUpdater on the audio thread to be troublesome on Windows. Updating an AudioProcessorGraph was the problem in my case. I don’t use APVTS, but I think the same problems could occur.
Things became troublesome during faster-than-realtime bounces in some hosts. Depending on how many plugins want to use the message queue at the same time, things can really go awry, and bouncing is probably the worst moment for new issues to appear. Timing is unreliable in general, but offline bounces multiply the issues. I ended up refactoring the JUCE code so AudioProcessorGraph no longer uses AsyncUpdater, and the problems were solved.

1 Like

I took a stab at re-implementing AsyncUpdater using a signal and a thread instead of a message; it seems pretty fast on macOS:

Performance count for "signal" over 1000 run(s)
Average = 0 microsecs, minimum = 0 microsecs, maximum = 0 microsecs, total = 104 microsecs
Performance count for "signal" over 1000 run(s)
Average = 0 microsecs, minimum = 0 microsecs, maximum = 1 microsecs, total = 104 microsecs
Performance count for "signal" over 1000 run(s)
Average = 0 microsecs, minimum = 0 microsecs, maximum = 0 microsecs, total = 99 microsecs
Performance count for "signal" over 1000 run(s)
Average = 0 microsecs, minimum = 0 microsecs, maximum = 26 microsecs, total = 136 microsecs
Performance count for "signal" over 1000 run(s)
Average = 0 microsecs, minimum = 0 microsecs, maximum = 0 microsecs, total = 109 microsecs
Performance count for "signal" over 1000 run(s)
Average = 0 microsecs, minimum = 0 microsecs, maximum = 0 microsecs, total = 99 microsecs

I’ll test on Windows tomorrow. Code is here if you have any comments: https://github.com/FigBug/Gin/blob/master/modules/gin/utilities/realtimeasyncupdater.cpp

2 Likes

I did some similar tests myself yesterday and came up with a similar approach and a couple of others. This leads to some discussion.

Firstly, using a condition variable isn’t technically real-time safe: notify_one/notify_all results in a system call, which is a potentially blocking operation. This raises the question of whether people have strict real-time requirements for a library-provided RealTimeAsyncUpdater, or whether they’d be happy with a condition-variable approach.
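
For concreteness, a minimal condition-variable version might look like this (plain C++ sketch of my own, not library code; the lock taken in triggerAsyncUpdate is exactly the potentially-blocking trade-off mentioned above, and real code would marshal the callback to the message thread):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

class SignalledAsyncUpdater
{
public:
    explicit SignalledAsyncUpdater (std::function<void()> callback)
        : handleUpdate (std::move (callback)), worker ([this] { run(); }) {}

    ~SignalledAsyncUpdater()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);
            shouldExit = true;
        }
        condition.notify_one();
        worker.join();
    }

    void triggerAsyncUpdate()
    {
        {
            std::lock_guard<std::mutex> lock (mutex);  // may block: the trade-off
            pending = true;
        }
        condition.notify_one();                        // system call: not strictly RT-safe
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock (mutex);

        while (! shouldExit)
        {
            condition.wait (lock, [this] { return shouldExit || pending; });
            processPending (lock);
        }

        processPending (lock);  // handle a trigger that raced with shutdown
    }

    void processPending (std::unique_lock<std::mutex>& lock)
    {
        if (pending)
        {
            pending = false;
            lock.unlock();
            handleUpdate();   // call outside the lock
            lock.lock();
        }
    }

    std::function<void()> handleUpdate;
    bool pending { false }, shouldExit { false };
    std::mutex mutex;
    std::condition_variable condition;
    std::thread worker;   // declared last so the state above is ready first
};
```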

Secondly, this does spin up a thread and maybe that’s ok, maybe it isn’t. If every plugin starts a thread for updates, in a DAW situation this could easily lead to hundreds of threads. Maybe this is acceptable though…

Finally, with this approach there is a subtle behavioural difference in that the handleAsyncUpdate callbacks will always happen in the order in which the RealTimeAsyncUpdaters are created. Again, for parameter updates this is probably ok but worth considering as it is a deviation from current behaviour.


I did have two other approaches that run on either a juce::Timer or a background thread (via juce::HighResolutionTimer), which have the benefit of being wait-free and therefore real-time safe.
However, because they poll, you have to decide statically what update frequency to use, and there will be wasted CPU polling when there has been no update.
With these approaches you are also subject to the latency of the timer: on average half the timer period, with a lot of jitter around it depending on when the update is signalled relative to the timer callback.

With the Timer approach I also wonder what effect this might have on other timers running in your app, as it will always be part of the timer callback queue and might lead to more shuffling around.

1 Like

I need to update my performance numbers; they aren’t as good as I thought. I was including signalling the event when it was already signalled, which is pointless and really fast. Here are the numbers when actually waking up the thread every time.

macOS:

Performance count for "signal" over 1000 run(s)
Average = 3 microsecs, minimum = 1 microsecs, maximum = 43 microsecs, total = 2968 microsecs
Performance count for "signal" over 1000 run(s)
Average = 4 microsecs, minimum = 1 microsecs, maximum = 98 microsecs, total = 3829 microsecs
Performance count for "signal" over 1000 run(s)
Average = 4 microsecs, minimum = 1 microsecs, maximum = 73 microsecs, total = 3756 microsecs
Performance count for "signal" over 1000 run(s)
Average = 4 microsecs, minimum = 1 microsecs, maximum = 162 microsecs, total = 3782 microsecs
Performance count for "signal" over 1000 run(s)
Average = 5 microsecs, minimum = 0 microsecs, maximum = 161 microsecs, total = 5029 microsecs
Performance count for "signal" over 1000 run(s)
Average = 5 microsecs, minimum = 1 microsecs, maximum = 45 microsecs, total = 4515 microsecs
Performance count for "signal" over 1000 run(s)
Average = 4 microsecs, minimum = 1 microsecs, maximum = 43 microsecs, total = 3825 microsecs

Windows:

Performance count for "signal" over 1000 run(s)
Average = 1 microsecs, minimum = 1 microsecs, maximum = 9 microsecs, total = 1154 microsecs
Performance count for "signal" over 1000 run(s)
Average = 1 microsecs, minimum = 1 microsecs, maximum = 12 microsecs, total = 1167 microsecs
Performance count for "signal" over 1000 run(s)
Average = 1 microsecs, minimum = 1 microsecs, maximum = 4 microsecs, total = 1070 microsecs
Performance count for "signal" over 1000 run(s)
Average = 2 microsecs, minimum = 0 microsecs, maximum = 842 microsecs, total = 1981 microsecs
Performance count for "signal" over 1000 run(s)
Average = 1 microsecs, minimum = 1 microsecs, maximum = 11 microsecs, total = 1018 microsecs
Performance count for "signal" over 1000 run(s)
Average = 1 microsecs, minimum = 1 microsecs, maximum = 3 microsecs, total = 938 microsecs
Performance count for "signal" over 1000 run(s)
Average = 1 microsecs, minimum = 1 microsecs, maximum = 13 microsecs, total = 907 microsecs
Performance count for "signal" over 1000 run(s)
Average = 1 microsecs, minimum = 1 microsecs, maximum = 49 microsecs, total = 1146 microsecs

I also updated it so it calls the callbacks in the order they were triggered.