General questions about thread safety

I’m a beginner and have read quite a few posts and watched talks about thread safety, atomics, etc., but I still feel confused about this topic. Please bear with me if this is a silly question. In Timur Doumler’s talk, there is an example:

struct Synthesiser
{
    std::atomic<float> level;

    // GUI thread:
    void levelChanged (float newValue)
    {
        level.store (newValue);
    }

    // real-time thread:
    void audioCallback (float* buffer, int numSamples) noexcept
    {
        const float currentLevel = level.load();
        for (int i = 0; i < numSamples; ++i)
            buffer[i] = currentLevel * getNextAudioSample();
    }
};

This use of atomic<float> makes sense to me. Then I was thinking about the case where there are multiple parameters, and instead of putting them one by one into the Synthesiser class directly, I want to wrap them in a struct. From the talk, I understood that std::atomic<Widget> is likely to use a mutex internally and is therefore not lock-free, but I’m not sure I understood this correctly. I can think of two ways to write the Parameters class:

In the first way, the parameter fields are plain floats and the whole object is atomic; in the second way, each parameter is atomic and the object itself is a normal one, as below:

struct Parameters1
{
    float Q {2};
    float frequency {100};
};

struct Parameters2
{
    std::atomic<float> Q {2};
    std::atomic<float> frequency {100};
};

class PluginProcessor
{
public:
    ......
    void setParams() // UI thread
    {
        Parameters1 param;
        param.Q = 3.f;
        param.frequency = 200.f;
        params1.store(param);

        params2.Q.store(3.f);
        params2.frequency.store(200.f);
    }

    void getParams() // Audio thread
    {
        auto p1 = params1.load();
        auto Q1 = p1.Q;
        auto f1 = p1.frequency;

        auto Q2 = params2.Q.load();
        auto f2 = params2.frequency.load();
    }
    ......
    std::atomic<Parameters1> params1;
    Parameters2 params2;
};

Is either of these two certainly thread-safe (no data race)? Are they lock-free? I tried std::atomic<Parameters1>::is_always_lock_free and std::atomic<Parameters2>::is_always_lock_free, and both return true. Can I say that if all the fields of a struct are lock-free, then the struct itself is also lock-free? Or does it just happen to be true on my particular platform/hardware?
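A minimal sketch of how the lock-free question can be checked for the whole-struct variant. The only type the C++ standard guarantees to be lock-free is std::atomic_flag, so for anything else the answer is a property of the target platform, size and alignment rather than of the individual fields. The helper name roundTrip and the constant params1AlwaysLockFree are made up for illustration; is_always_lock_free needs C++17.

```cpp
#include <atomic>
#include <type_traits>

// Same layout as Parameters1 in the question: two floats, 8 bytes on common platforms.
struct Parameters1
{
    float Q { 2.0f };
    float frequency { 100.0f };
};

// std::atomic<T> requires T to be trivially copyable.
static_assert (std::is_trivially_copyable<Parameters1>::value,
               "std::atomic<T> needs a trivially copyable T");

// Compile-time answer for the current target only; it may differ on another
// CPU, compiler, or set of compiler flags.
constexpr bool params1AlwaysLockFree = std::atomic<Parameters1>::is_always_lock_free;

// Hypothetical helper: publishes a whole parameter set and reads it back.
inline Parameters1 roundTrip (std::atomic<Parameters1>& shared, Parameters1 newValue)
{
    shared.store (newValue);   // both fields are replaced in one atomic operation
    return shared.load();      // a reader gets the old pair or the new pair, never a mix
}
```

If the struct grows beyond what the target can handle in one atomic instruction, std::atomic<Parameters1> is still correct, but it may fall back to a lock internally, and shared.is_lock_free() reports that at run time.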

My follow-up question is: even if the parameter struct itself is not lock-free, will Parameters2 still be acceptable in my case, since it’s not changed as a whole in setParams()? (I know that if I write something like params2 = someNewParams, the object is probably left in some kind of in-between state.) But here I only care about its members, which are both atomic, so I guess it will be fine? I feel this is similar to how juce::AudioProcessorValueTreeState is used: the object itself is not atomic or lock-free, but getRawParameterValue(StringRef) returns an atomic object.

So, step by step:

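  • Both atomics are most likely indeed lock-free because of your CPU architecture. What matters is the widest atomic instruction the CPU offers (typically 8 bytes, and 16 bytes on some platforms), not the cache line, which is usually 64 bytes. Your two floats take 8 bytes, so they fit; with 16-byte atomics you may be able to fit four floats into one struct and make it atomic without the compiler having to fall back to a mutex.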

  • In this case, Parameters1 is definitely the better choice. Not only does it make more sense with your processor data (as you noted yourself, you might run into race conditions if you expect the whole set to be updated at once), but it is probably also faster. The compiler has to (in theory) do a bunch of extra work when an atomic gets updated. Wrapping something in atomics brings additional runtime costs. Additionally, if two threads update the two neighbouring atomics at the same time, they actually collide on the same cache line (essentially the smallest number of bytes a processor core loads from RAM at once). It’s kind of similar to how booleans work. A boolean is of course not stored in a single bit in RAM with, say, an integer in the immediately following 32 bits. It’s more like one bit is the boolean, then there are 7 dead bits (to make a byte), and then, after any alignment padding, the float is stored in the following 4 bytes.
    To avoid this cache line sync problem, you can instruct the compiler to lay out the struct with padding between the two atomics so they are not directly adjacent anymore. BUT this is all a lot of micromanagement and I’m by far no expert. The important message to take from this is probably: atomics are not free of runtime costs even if they are lock-free. They are just a lot faster than using mutexes or spin locks. That takes me to my third point.

  • Be sure that what you are designing actually fits your use case. There are other ways of passing data to the audio engine apart from communicating via std::atomic values that have to be “expensively” read on every callback, even though they rarely ever change. For example, I’m writing a lot of audio code for live performances. My EQ doesn’t change during a performance, so I designed my plugin in a way that makes updating the values a little more expensive. It is of course still lock-free, and you can change the values during a performance if you want; it just might take more time than reading an atomic (I don’t know, I haven’t actually measured that part). But I know that my default case of just playing is as fast as possible, without accessing a single atomic value, so I can play at very low buffer sizes for very fast round-trip times.
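The padding idea from the second point could look roughly like this. It is only a sketch: the 64-byte line size is an assumption that holds for many x86-64 and ARM cores but not everywhere, and PaddedParameters, assumedCacheLineSize and memberDistance are illustrative names, not part of any library.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines. Check the target CPU before relying on this.
constexpr std::size_t assumedCacheLineSize = 64;

struct PaddedParameters
{
    // alignas pushes each atomic onto its own (assumed) cache line, so a write
    // to one member does not invalidate the line holding the other.
    alignas (assumedCacheLineSize) std::atomic<float> Q { 2.0f };
    alignas (assumedCacheLineSize) std::atomic<float> frequency { 100.0f };
};

// Hypothetical helper: distance in bytes between the two members.
inline std::ptrdiff_t memberDistance (const PaddedParameters& p)
{
    return reinterpret_cast<const char*> (&p.frequency)
         - reinterpret_cast<const char*> (&p.Q);
}
```

The price is memory: two floats now occupy two full cache lines. That trade is probably only worth making when measurements show the two members are really being written from different threads at a high rate.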

Thank you for the reply. I was thinking of a more general case, where the parameters do not necessarily add up to 16 bytes or less, and the CPU architecture isn’t restricted to the one I’m using. In this case Parameters1 doesn’t seem to work. As for Parameters2, although the struct itself is not atomic, all its members are, so I was guessing it might meet the requirement as long as each parameter is set or read separately? Could you clarify this? I mentioned juce::AudioProcessorValueTreeState because to me it works in a way that is a bit similar to Parameters2: very often getRawParameterValue(StringRef) is called in the audio thread to access parameters one by one.

Now I understand that Parameters2 has this cache line sync problem. But, again, will this be fine if the audio thread only reads and never writes? Maybe there are better strategies to avoid all these issues?

It depends. You may use a FIFO to send data between threads.
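As a rough illustration of the FIFO idea, here is a minimal single-producer, single-consumer ring buffer. It assumes exactly one thread pushes (for example the UI) and exactly one thread pops (for example the audio thread); ParameterChange and SpscFifo are made-up names for this sketch. In a JUCE project, juce::AbstractFifo covers the same ground.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical message type: which parameter changed and its new value.
struct ParameterChange
{
    int parameterIndex;
    float value;
};

// Lock-free ring buffer for exactly one producer and one consumer.
// One slot always stays empty so that "full" and "empty" can be told apart.
template <std::size_t Capacity>
class SpscFifo
{
public:
    // Producer thread only. Returns false if the FIFO is full.
    bool push (const ParameterChange& change)
    {
        const auto w = writeIndex.load (std::memory_order_relaxed);
        const auto next = (w + 1) % Capacity;
        if (next == readIndex.load (std::memory_order_acquire))
            return false;
        slots[w] = change;
        writeIndex.store (next, std::memory_order_release); // publish the slot
        return true;
    }

    // Consumer thread only. Returns false if the FIFO is empty.
    bool pop (ParameterChange& out)
    {
        const auto r = readIndex.load (std::memory_order_relaxed);
        if (r == writeIndex.load (std::memory_order_acquire))
            return false;
        out = slots[r];
        readIndex.store ((r + 1) % Capacity, std::memory_order_release); // free the slot
        return true;
    }

private:
    std::array<ParameterChange, Capacity> slots {};
    std::atomic<std::size_t> writeIndex { 0 };
    std::atomic<std::size_t> readIndex { 0 };
};
```

The audio thread would typically drain the FIFO at the start of each callback with a loop such as while (fifo.pop (change)) and apply each change to its own plain, non-atomic copies of the parameters.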

If you want to keep things simple, I’ve used atomics with manual synchronization in this way:

class Processor
{
public:
	void setCutoffAndReso(const float c, const float r)
	{
		JUCE_ASSERT_MESSAGE_THREAD;

		sCutoff.store(c, std::memory_order_relaxed);
		sReso.store(r, std::memory_order_relaxed);

		flag.store(true, std::memory_order_release);
	}
	void setEnv(const float e)
	{
		JUCE_ASSERT_MESSAGE_THREAD;

		sEnv.store(e, std::memory_order_relaxed);

		flag.store(true, std::memory_order_release);
	}

	void processBlock()
	{
		// Audio thread
		if (flag.exchange(false, std::memory_order_acquire) == true)
		{
			cutoff = sCutoff.load(std::memory_order_relaxed);
			reso = sReso.load(std::memory_order_relaxed);
			env = sEnv.load(std::memory_order_relaxed);
		}
	}

private:
	std::atomic<float> sCutoff;
	std::atomic<float> sReso;
	std::atomic<float> sEnv;
	float cutoff = 1.0f;
	float reso = 0.0f;
	float env = 0.0f;

	std::atomic<bool> flag {false};
};

On most mainstream CPUs, a memory_order_relaxed load or store has about the same CPU impact as a normal non-atomic variable access, but it provides no synchronization. An acquire/release pair has a minimal CPU impact and gives proper synchronization between the two threads. When an “acquire” load reads the value written by a “release” store, it is guaranteed to also see everything that was written before that “release”.
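The acquire/release guarantee can be seen in isolation with a small standalone sketch. Mailbox and publishAndConsume are hypothetical names, and the spin loop is only there to keep the demo short; an audio callback would poll the flag once per block instead, as processBlock() does above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical one-shot mailbox: a plain payload guarded by an atomic flag.
struct Mailbox
{
    float cutoff = 1.0f;                 // plain, non-atomic payload
    std::atomic<bool> ready { false };   // publication flag
};

// Writes newCutoff on a second thread and returns what the calling thread sees.
inline float publishAndConsume (float newCutoff)
{
    Mailbox box;

    std::thread writer ([&box, newCutoff]
    {
        box.cutoff = newCutoff;                              // 1. write the payload
        box.ready.store (true, std::memory_order_release);   // 2. publish it
    });

    // Wait until the release store becomes visible (demo only).
    while (! box.ready.load (std::memory_order_acquire))
        std::this_thread::yield();

    // The acquire load observed the release store, so the payload write that
    // happened before it is guaranteed to be visible here.
    const float seen = box.cutoff;

    writer.join();
    return seen;
}
```

A plain float is enough here because the payload is written exactly once before the flag is set. In the Processor class above, the UI thread may write again while the audio thread is reading, which is presumably why sCutoff, sReso and sEnv are atomics themselves.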
