Is std::atomic<float> for x64 systems really required?

I know this is an unpopular question :slight_smile:

I wonder if we can just leave atomics away for 32-bit float and integer data types when we compile our plugins only for standard 64-bit systems? It looks like they are atomic anyway.

Or can there be a problem when int and float values are inside a struct and I copy it like this (maybe when the compiler copies the struct bytewise):

juce::Point<float> point = point2;

while another thread modifies point2?

I don’t care if y is the old value and x is the new one. I just wonder if for example floats can get into a corrupt state where they return NaN or something similar because of multithreading.

Any input is welcome.

Tearing is probably the lesser problem, as you describe. But the atomic is an important hint to the optimiser. Without it, the optimiser could falsely assume the value to be constant, because that thread never writes to that variable.
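
A classic illustration (my sketch, hypothetical names): when the loop body is fully visible to the compiler and never writes the flag, the load can legally be hoisted out of the loop.

bool running = true; // plain, non-atomic flag, written by another thread

void audioLoop (float* samples, int numSamples)
{
    while (running)                          // this thread never writes 'running',
        for (int i = 0; i < numSamples; ++i) // so the optimiser may load the flag
            samples[i] *= 0.5f;              // once and spin on the stale value
}

// with std::atomic<bool> running, the flag has to be re-read on every iteration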

2 Likes

Thanks for the answer. Not sure what the “probably” means, but I assume tearing is no problem in most scenarios then…
I always thought the compiler does not care about threads, only about dependencies. What problems pop up when the compiler falsely assumes that a value is constant, but it actually changes at runtime?

The “probably” means I am not expert enough to give a decisive answer, but I follow your reasoning.
The other point is more important though: the optimiser is allowed to reorder your code and remove stuff it thinks has no effect. And if the thread never reads a variable, it might not be calculated at all.
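
For example (my sketch, hypothetical names): with plain variables the two stores below may be reordered, so another thread could observe the flag before the value.

int value = 0;       // plain, non-atomic
bool ready = false;  // plain, non-atomic

void publish()
{
    value = 42;      // these two plain stores may be reordered by the
    ready = true;    // compiler or the CPU, so another thread can see
}                    // ready == true while 'value' is still 0

// with std::atomic<bool> ready and ready.store (true, std::memory_order_release),
// the write to 'value' is guaranteed to be visible before 'ready' becomes true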

That’s not something I came up with; see Dave and Fabian at ADC 2019.

2 Likes

You’re assuming that if you have float read/writes in your C/C++ code, they actually turn into float read/writes in the assembly.

In practice, the optimizer might turn it into a vectorized operation, or collapse a bunch of functions into one and reorder your instructions. The only way to stop it from doing that and make sure your float read/write is actually just that is by using an atomic.

More information here:

This also means it can’t optimize code that contains atomics. I see the edge cases in the examples with loops and also with the CPU cache.

A concrete example:

struct SplinePoint { float x, y; }; // assuming a simple two-float point

SplinePoint points[100];
SplinePoint otherPoints[100];

// main thread
for (int i = 0; i < 100; i++)
{
    points[i] = otherPoints[i];
}

// processing thread (for some index i)
float x = points[i].x;

Can this lead to a problem where I read a 32-bit float that is corrupt or has an undefined value?

We still assume that we run the code on a modern 64-bit system.

Sorry for the questions. I know I should just use std::atomics :slight_smile: Just trying to understand…

That’s very true.
That’s why, to get good performance, you want explicit atomic reads and writes in your code, placed before or after the loops in performance-sensitive code.

In audio that usually means reading the parameters before the “real” processing starts.
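
A minimal sketch of that pattern (hypothetical gainParam, a plain function instead of a full processBlock):

#include <atomic>

std::atomic<float> gainParam { 1.0f }; // written by the message thread

void process (float* samples, int numSamples)
{
    const float gain = gainParam.load(); // one atomic read per block...

    for (int i = 0; i < numSamples; ++i)
        samples[i] *= gain;              // ...plain float maths in the hot loop
}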

Yes. For example, if the compiler turned one or both of these reads/writes of the array into vectorized operations (which it likely will, as it’s an easy optimization), there’s a chance you will read corrupt values in the moments of contention when the two threads touch the data.

If you had an array of atomic points in there, that won’t happen to an individual point (but it could still happen to the array as a whole).
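
To illustrate that last point (my sketch, assuming a simple two-float point): an 8-byte trivially copyable struct can be wrapped in std::atomic, and on x64 that is typically lock-free, but the array as a whole still isn’t atomic.

#include <array>
#include <atomic>

struct SplinePoint { float x, y; }; // 8 bytes, trivially copyable

std::array<std::atomic<SplinePoint>, 100> points;

SplinePoint p = points[42].load(); // each individual point is read atomically

// ...but a thread iterating the array can still observe a mix of old and new points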

1 Like

On top of what was said before, you cannot wrap a whole array into a std::atomic in a lock-free way. In that case you need a scoped lock or a FIFO.

My solution would be a simple FIFO, copying the whole array each time.


#include <array>
#include <atomic>
#include <utility>

using SplinePointData = std::array<SplinePoint, 100>;

template <typename DataType>
struct SplineFifo
{
    static constexpr int size = 16;

    std::array<DataType, size> buffer;
    std::atomic<int> readIndex { 0 };
    std::atomic<int> writeIndex { 0 };

    bool isUpToDate() const { return readIndex == writeIndex; }

    // single producer
    void push (DataType data)
    {
        auto write = (writeIndex + 1) % size; // using modulo for simplicity
        std::swap (data, buffer [write]);     // swap to avoid allocations
        writeIndex = write;                   // publish last; fine for a single producer
    }

    // single consumer
    DataType pop()
    {
        if (! isUpToDate())
            readIndex = (readIndex + 1) % size;

        return buffer [readIndex];
    }
};

Then in the consuming thread you simply read the latest SplinePointData; push uses std::swap to avoid allocations and deallocations.
The only shared data are the indices of the FIFO, and the logic makes sure a slot is not read and written at the same time.
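
Usage might look like this (hypothetical variable names):

SplineFifo<SplinePointData> fifo;
SplinePointData currentPoints, latestPoints;

// message thread (single producer)
fifo.push (currentPoints);

// audio thread (single consumer)
if (! fifo.isUpToDate())
    latestPoints = fifo.pop();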

EDIT: just added a simple push and pop for illustration, untested

1 Like

This is something I can’t risk.

That’s the reason I ask. This means I need to read the parameters into another struct or class that holds the same values but is not atomic, before processing the data :grimacing:

That was something I tried to avoid, and it’s hard when you have a lot of params. But it looks like there is no way around it.
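
Something like this, I guess (hypothetical parameter names):

#include <atomic>

// shared storage, written by the message thread
struct Parameters
{
    std::atomic<float> cutoff    { 1000.0f };
    std::atomic<float> resonance { 0.7f };
};

// plain copy used inside the processing loops
struct ParameterSnapshot
{
    float cutoff, resonance;
};

ParameterSnapshot takeSnapshot (const Parameters& p)
{
    return { p.cutoff.load(), p.resonance.load() };
}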

Thanks for your help.

Thanks for sharing that code. I never thought about a FIFO as a solution to solve a multithreading problem. I need to think a bit more about it :slight_smile:

The video I linked above is full of great advice in that respect. I recommend watching it in full; they formalise the problem space very well.

3 Likes

This helped a lot. Thanks.

Some feedback for people that also want to use atomics (maybe refactoring some old code):

The performance impact wasn’t measurable when switching to atomics for all parameters in my case, even when I read some values for every sample.

I had to rewrite some code, because you can’t use std::swap on structs or classes that contain atomics (std::atomic is neither copyable nor movable).
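
For example (my sketch):

#include <atomic>

struct Params { std::atomic<float> gain { 1.0f }; };

void swapParams (Params& a, Params& b)
{
    // std::swap (a, b); // does not compile: std::atomic is neither copyable nor movable

    // workaround: exchange the values member-wise (not atomic as a whole)
    float tmp = a.gain.load();
    a.gain.store (b.gain.load());
    b.gain.store (tmp);
}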

1 Like