Massive CPU increase on latest version of JUCE

You might end up with better performance than you had before! :grin:

1 Like

What do you recommend when I have an assignment such as *gain = 1.0f; ?

Should I just dereference as usual here?

You have to use the deref, or the store() function, to save the value in the atomic. The gain you are looking for comes from reducing the number of times you HAVE to access the atomic. As others have described here, if you don’t expect a value to change, derefing it once and caching it in a local variable is where that gain comes from.
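
Something like this, for example (just a minimal sketch; gainParam here stands for whatever std::atomic<float>* you got from getRawParameterValue()):

void MyProcessor::processBlock (juce::AudioBuffer<float>& buffer, juce::MidiBuffer&)
{
    // One atomic load per block instead of one per sample:
    const float currentGain = gainParam->load();

    // Use the cached local value for the whole block:
    buffer.applyGain (currentGain);
}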

Is there any benefit of using store() rather than standard deref?

I always use load and store as an easy way to remember that I am working with an atomic. AFAIK, the deref will call load or store in the end…

2 Likes

The thing is not the deref itself. std::atomic has operator= and operator T (conversion operator) as shorthands for store and load:

std::atomic<float> a { 1.0f };

float b = a.load();
float c = float (a); // operator float calls load()
float d = a; // implicit operator float

a.store (b);
a = b; // operator= calls store()

The operators call load and store with the default memory order (sequentially consistent), so if you need a different one, you have to call load and store explicitly.
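
For example, the explicit forms would look like this (just a sketch):

std::atomic<float> a { 1.0f };

float b = a.load (std::memory_order_acquire);   // instead of: float b = a;
a.store (b, std::memory_order_release);         // instead of: a = b;

float c = a.load (std::memory_order_relaxed);   // atomicity only, no ordering guarantees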

1 Like

Thanks for the better clarification! I realized I was only paying 1/2 attention. :slight_smile:

1 Like

If gain is the std::atomic<float>* from getRawParameterValue(), then the answer is you shouldn’t write to that variable at all. Instead, use the parameter itself and call setValue or setValueNotifyingHost, whichever is applicable.

The system is not designed to read values back from that atomic, so your changes will have no effect.
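
A rough sketch of the write path, assuming an AudioProcessorValueTreeState member called apvts and a parameter ID "gain" (both names made up for this example):

// Go through the parameter object so the host and any attachments are notified:
void setGainFromUi (juce::AudioProcessorValueTreeState& apvts, float newGain)
{
    if (auto* param = apvts.getParameter ("gain"))
        param->setValueNotifyingHost (param->convertTo0to1 (newGain)); // newGain is in the parameter's own range
}

If the change comes from a UI control, you would normally wrap the call in beginChangeGesture() / endChangeGesture() as well.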

Good point Daniel, I don’t actually have any instances of that, I was just wondering for future.

Good news guys, after making the change I’ve gained 1% CPU performance compared to the old JUCE version without atomics, and that’s from changes to the processor only. I’ll do the same in the editor to see if I can make further gains.

3 Likes

there’s probably some room for improvement in the docs around this, specifically here

Returns a pointer to a floating point representation of a particular parameter which a realtime process can read to find out its current value.

maybe the juce folks can add a recommendation / best practice here?

I encapsulated a std::atomic inside a class that by default only uses memory_order_relaxed (I can store with operator=, but only read with a load() function). Then I have another class which contains another atomic that acts as a memory barrier to synchronise these relaxed atomics, with acquire/release pairs. It’s only in some weird cases that you need seq_cst memory order.
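
Roughly like this (just a sketch of the wrapper idea, the class name is made up):

template <typename T>
class RelaxedAtomic
{
public:
    RelaxedAtomic (T initial = T()) noexcept : value (initial) {}

    // Writes are relaxed by default:
    RelaxedAtomic& operator= (T newValue) noexcept
    {
        value.store (newValue, std::memory_order_relaxed);
        return *this;
    }

    // Reads are explicit, and also relaxed:
    T load() const noexcept  { return value.load (std::memory_order_relaxed); }

private:
    std::atomic<T> value;
};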

1 Like

I think relaxed memory order is perfect for the audio thread! You only need to do a single release at the end of processBlock() and then have a timer with an exchange/acquire to refresh your UI, and the UI can read the latest values from the audio thread directly (since they are atomic) without any issues.
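
A sketch of that pattern with made-up names (using a plain acquire load in the timer, which is enough here):

// Shared between the audio and UI threads:
std::atomic<float> lastRms      { 0.0f };
std::atomic<int>   blockCounter { 0 };

// Audio thread, at the end of processBlock():
void publishMeterValue (float rms)
{
    lastRms.store (rms, std::memory_order_relaxed);          // relaxed store of the value(s)
    blockCounter.fetch_add (1, std::memory_order_release);   // single release to publish them
}

// UI thread, e.g. in a juce::Timer callback:
float readMeterValue()
{
    blockCounter.load (std::memory_order_acquire);           // pairs with the release above
    return lastRms.load (std::memory_order_relaxed);         // now safe to read the relaxed values
}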

1 Like

Well, it obviously depends on what your plugin is doing, what threads you have, etc.
My reasoning is that in the audio thread you probably want reads/writes to be as close to the code you write as possible, so you don’t really want them reordered. For example, if you have another thread that sets some of the atomics, you want all of those values to be visible in your next process block.

Using acquire/release achieves this. If you use relaxed ordering for reading on the UI thread, the worst that can happen is you get the same value twice when it’s really been updated by another thread. This wouldn’t normally hurt for things like meters etc. which have a relatively low resolution.

Indeed, it depends on the plugin. But with FIFOs and acquire/release semantics I’ve been able to solve 90% of the problems so far.

I find this works for sending variable changes from the message thread to the audio thread too: when I want to send a variable change (or a set of them), I usually do it using bit flags, an atomic integer where I just fill the bits and read & clear just once at the start of the audio callback. For more complex things, I use a FIFO, which is processed after the flags.
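
For example (flag names invented for illustration):

enum ChangeFlags : int
{
    gainChanged   = 1 << 0,
    filterChanged = 1 << 1
};

std::atomic<int> changeFlags { 0 };

// Message thread: set one or more bits.
void markGainChanged()
{
    changeFlags.fetch_or (gainChanged, std::memory_order_release);
}

// Audio thread, start of processBlock(): read & clear in one go.
void handlePendingChanges()
{
    const int flags = changeFlags.exchange (0, std::memory_order_acquire);

    if ((flags & gainChanged) != 0)
    {
        // ...refresh the cached gain
    }

    if ((flags & filterChanged) != 0)
    {
        // ...recalculate filter coefficients
    }
}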

Yes Dave! And we’ve all got quite used to the way Intel code is generated for this stuff. I’m starting to worry about how this may (or may not) be different (even subtly) on M1/ARM64 Macs. Or am I paranoid!? :rofl:

There is a subtle difference on ARM, but I can never remember what the difference is and whether it’s more or less strict. I therefore conclude the safest approach is to use atomics with the correct intent and everything should (:crossed_fingers:) work.

(At least we don’t have to deal with an endianness change!)

1 Like

:rofl:

Must be a programming joke I unfortunately don’t get :frowning:

If I recall correctly, Intel coalesces all relaxed atomic stores, so an acquire would always see the latest values UNTIL the last release. But on ARM you may acquire a newer value which has been stored before the release has been executed (this is how it should work when you read the C++ specs). It shouldn’t be a problem with simple flags like the ones I proposed.

1 Like