Sending signal/events from audio to GUI thread?

As I understood it, the point of the reset (the exchange with 0) was taking the max between two consecutive calls of the UI Timer; it wasn't triggered by anything else. I just meant that with two atomics you could lose not only granularity, but actual peak values. The atomic is still set once per block.

That’s true, but in context, a peak meter would very likely run much, much faster than your eye can see or your mouse can click. So even if you do the exchange in a granular way on the audio thread, you’re going to miss quite a lot of peak values, even if you manage to click in the same visual frame where your display finally caught and showed you that peak. In the real world, your reset atomic would probably be set at least a block or more later.

Thanks for chiming in, @kamedin, and yes, you understand the idea that I was describing. And yes, missing actual peak values seems like a concern with the “two atomics” system. The use of the -1 value as a flag is clever, but I’m not sure if that winds up possibly missing peaks as well… I need to work that out and see.

@eyalamir I think you’re talking about solving a different problem. There are no mouse clicks to handle in the simple peak light code I posted. (I’m guessing you were imagining a peak hold light, that stays on until reset by a mouse click?) The peak light I was showing goes on if there was any sample over -12dB since the last Editor timerCallback, and turns off if there were no samples over -12dB in that same period.

Because the Editor’s Timer drives the reset, it never does miss a peak (though, given the acknowledged amount of jitter in the actual timing of the JUCE Timer class, the length of the “peak light on” time for a single frame will vary a bit).
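
For reference, the Editor side of that scheme could look something like this. This is only a minimal sketch: processor, atomicPeak, and peakLightOn are illustrative names, not code from the original post, and -12 dB is roughly 0.251 in linear amplitude.

void timerCallback() override
{
    // exchange reads the peak since the last callback and resets it
    // to 0 in a single atomic operation, so no peak can slip in
    // between a separate read and write.
    const float peak = processor.atomicPeak.exchange (0.0f);

    peakLightOn = (peak > 0.251f); // -12 dB as linear amplitude
    repaint();
}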

Perhaps I’m missing the actual use case of what you’re trying to do, but polling atomics inside accurate sample loops from the message thread doesn’t seem practical.

Instead, if you need time-based peak polling, I would just have the audio thread measure the peak every X samples/blocks in a very accurate way, and then just post its latest peak status via a single atomic.

The timer can then keep track of said atomic and turn the light on/off depending on its state, without sending any reset message (the processor would reset its own state every X samples, and the light would follow whenever it can).

OK, to put some concrete numbers on it, I’m talking about an Editor with a Timer that polls at 30 Hz. Are you saying that hitting the Atomic exchange method 30x a second would create too much of a performance hit?

The use case here is a signal peak indicator, but clearly the GUI element could be a more complex one. The same peak level value could be used to drive a multi-segment LED-style peak meter. In the general sense, the problem it’s solving is “how do I update the GUI based on the audio signal”. Or, conceptually, “how do I downsample 44.1 kHz to 30 Hz”?

But if the processor gets to reset the peak, without knowing if the Editor picked up on them, then the LED might miss some peaks.

What they were trying to avoid was missing a peak because it was discarded between frames. For some uses, missing a peak matters even if granularity is low. For example, if you’re showing a compressor’s gain, you may have a sharp cut on a transient that’s missed by the display because its block happened between frames. In this example, the simplest solution seems to be resetting from the UI, even without touching the atomic in the sample loop. In a different setup with a buffer you wouldn’t need that; your atomic would be the buffer index, and you could compute the max on the UI side.

I was thinking of amplitude values, which are always positive; of course, with dB you’d need a different out-of-range value, maybe FLT_MAX.
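
For what it’s worth, here is one way that single-atomic flag idea might look with amplitude values. All names are illustrative, and whether this can still miss peaks is exactly the open question above:

#include <atomic>

constexpr float resetFlag = -1.0f;           // out of range for amplitudes
std::atomic<float> sharedPeak { resetFlag };

// Audio thread, e.g. once per block:
void publishPeak (float blockPeak)
{
    const float current = sharedPeak.load();

    // If the UI has reset the value (or the new peak is higher),
    // overwrite it; otherwise keep the stored maximum.
    if (current == resetFlag || blockPeak > current)
        sharedPeak.store (blockPeak);
}

// UI thread, in the timer callback:
float readAndReset()
{
    return sharedPeak.exchange (resetFlag);
}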

No, the exchange itself isn’t the performance overhead at all.

The sample loop in the processor is the overhead, and using an atomic in there is very, very expensive. Just to give you some estimates: it depends on what you’re doing, but some early tests I’ve done show that calling an atomic during the sample loop makes it about 10 times slower in most cases (in both debug and release builds) than calling it once per block.

If you need to downsample from 44.1 kHz to 30 Hz, all you need to do is run a simple sample loop that resets/stores the value every 1470 samples and only then stores it in the atomic. Then read it in a 30 Hz timer. The reading won’t be perfectly in sync, of course, but the audio thread would capture the exact peak of each 30 Hz interval without missing a sample.
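
A sketch of that, assuming 44.1 kHz and illustrative names (44100 / 30 = 1470 samples per UI frame):

#include <algorithm>
#include <atomic>
#include <cmath>

class PeakDownsampler
{
public:
    void process (const float* buffer, int numSamples)
    {
        for (int i = 0; i < numSamples; ++i)
        {
            localPeak = std::max (localPeak, std::abs (buffer[i]));

            // Publish the exact peak of the last 1470 samples, then
            // start a fresh measurement window.
            if (++counter >= samplesPerFrame)
            {
                publishedPeak.store (localPeak);
                localPeak = 0.0f;
                counter = 0;
            }
        }
    }

    std::atomic<float> publishedPeak { 0.0f };  // read by the 30 Hz timer

private:
    static constexpr int samplesPerFrame = 44100 / 30; // 1470
    float localPeak = 0.0f;
    int counter = 0;
};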

Yes, testing these implementations is important, and I still need to do my own testing with this.

But as a reminder, in the original code I posted, this is the atomic read/write code that would run on every sample:

if (audioLevelAbs > atomicPeak)     // atomic load on every sample
    atomicPeak = audioLevelAbs;     // atomic store only when a new peak occurs

So the atomic is read on every sample, but it is most probably not written to on every sample. (Exactly how often it is written to would depend on the content of the audio signal, and obviously there’s no way to know that in advance.)

The speed of atomic reads apparently depends on the particular CPU architecture. According to Herb Sutter in this video I just watched the other day, on x86 it’s the same cost as a non-atomic read, “little to no overhead” in his words.

The cost is not in the atomic operation itself, but in the disabling of optimizations because of the memory barrier. I do my vectorization manually, and then the overhead of atomics is quite low, though I only use relaxed and release-acquire semantics.


Kinda feeling like I’m repeating myself here. 🙂

Reading the atomic itself, as a single operation, isn’t what’s expensive, but an atomic in your loop means that the loop itself can’t be optimized or vectorized in any way.

To give you an example, I’m testing the following loops:

#include <atomic>

class LoopTester
{
public:
    // Reads the atomic on every sample - this blocks vectorization.
    void process (float* buffer, int numSamples)
    {
        for (int index = 0; index < numSamples; ++index)
            buffer[index] *= gain.load();
    }

    // Reads the atomic once, then runs a plain loop the optimizer
    // can vectorize.
    void processLoop (float* buffer, int numSamples)
    {
        float tempGain = gain.load();

        for (int index = 0; index < numSamples; ++index)
            buffer[index] *= tempGain;
    }

    std::atomic<float> gain { 0.5f };
};

Running it several times in release builds (each test calls the function 10,000 times), the second function, which doesn’t read the atomic every sample, is 6-8 times faster.
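
For anyone who wants to reproduce this, a rough harness along those lines, using the LoopTester class above (not a rigorous benchmark, and the numbers will vary by compiler and machine):

#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    LoopTester tester;
    tester.gain.store (1.0f);               // keep values from denormalizing
    std::vector<float> buffer (512, 1.0f);

    auto time = [&] (auto&& fn)
    {
        const auto start = std::chrono::steady_clock::now();

        for (int i = 0; i < 10000; ++i)
            fn (buffer.data(), (int) buffer.size());

        return std::chrono::duration<double> (std::chrono::steady_clock::now() - start).count();
    };

    std::printf ("atomic read every sample: %f s\n",
                 time ([&] (float* b, int n) { tester.process (b, n); }));
    std::printf ("atomic read once per call: %f s\n",
                 time ([&] (float* b, int n) { tester.processLoop (b, n); }));
}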

In the Realtime 101 talk, Dave and Fabian discuss how atomics and the optimizer interact with each other.

This example helped me understand how an atomic stops optimisation (in Fabian’s example, stopping the optimizer from going wrong).

(look at 30:30)

Maybe it helps you understand what Eyal explained here as well.


Thanks @daniel, I had time to circle back around to this stuff this week and got to watch that video. An interesting point that Dave brings up a little later, at 38:39, relevant to the peak value application I was discussing above, is that you could possibly use a relaxed atomic there to get a small performance improvement.

He describes it as OK to use relaxed atomic for “a small, independent piece of data on which no other data depends…something like a gain variable or a level meter”.
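
In code, that’s just a matter of passing the memory order explicitly. For example, with the atomicPeak name from the earlier sketches, and assuming no other data is synchronized through this atomic:

// Audio thread, once per block:
atomicPeak.store (blockPeak, std::memory_order_relaxed);

// GUI thread, in the timer callback:
const float peak = atomicPeak.exchange (0.0f, std::memory_order_relaxed);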

@eyalamir To follow up on your comment above, I do appreciate you repeating yourself here! I tried the same loop tests that you posted, and yes I see what you mean.

I would actually be more concerned to see the difference when there’s also a bunch of other DSP code happening in the loop, as would be the case in an actual plug-in. One particular operation running 6 times slower might not make a significant difference in the big picture, but if ALL the other DSP in that sample loop also gets slowed down, because it can’t be optimized, then that is a more serious concern.

It’s very possible that reading an atomic every sample wouldn’t be the worst thing in the world in terms of performance in your use case.

But I definitely think that in the described case of measuring the audio level every X time, doing that in the audio thread would be the correct, accurate way, vs. trying to achieve some granularity by updating it every sample and expecting the polling in the GUI thread to represent anything of value.

I would agree that, if the goal was to accurately “measure the audio level every X time”, then using the Editor’s Timer wouldn’t be the way to do that. But that’s not the goal for a realtime peak level display. The only goal is to let the user know if a peak has occurred. So measuring that signal any faster than the speed at which the display will update is pointless.

To put it another way: the only “thing of value” that the polling in the GUI thread represents is that it is precisely when the display will redraw. In essence, the Editor says “I’m about to redraw - has there been a peak since the last redraw?”

As for the performance question - I tried this out in a project, and there was a significant performance hit if I updated the peak level atomic on every sample. So, following the advice earlier in the thread, within the per-sample loop I update a local float value, and only at the end of the processBlock do I update the atomic with the local value. (Note that for both those updates, the stored value is only replaced if the new value is greater, since we’re looking for peaks.) Then, when the Editor’s Timer callback hits, each peak level atomic is read and reset to a value of 0.
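
One way that could look (a sketch only; names are illustrative, and this is just my reading of the scheme described here):

void processBlock (const float* buffer, int numSamples)
{
    // One atomic read per block; the per-sample loop only touches a
    // plain local float, so it can still be optimized.
    float localPeak = atomicPeak.load();

    for (int i = 0; i < numSamples; ++i)
    {
        const float level = std::abs (buffer[i]);

        if (level > localPeak)              // replace only if greater
            localPeak = level;
    }

    // One atomic write per block, again only if greater. If the
    // Editor's exchange(0) landed mid-block, the pre-reset peak is
    // still in localPeak and gets re-published here - the "extra
    // callback cycle" inaccuracy described below.
    if (localPeak > atomicPeak.load())
        atomicPeak.store (localPeak);
}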

Therefore, the performance advantage comes at the price of a slight inaccuracy - due to the fact that the Editor’s Timer callback only resets the atomic peak value, and not the local float peak value. The result of this is that a high peak value could be retained for an extra callback cycle. Not a huge deal, since LED peak metering often has “pulse-widening” added, so that the minimum “on” time is ensured to be long enough for the eye to register it.