I have to disagree with this one. On both my M1 Max MacBook and my Intel x86 machine, I can squeeze a struct with four floats (so 4 × 4 bytes) into a lock-free atomic.
@kamedin you can check that by adding static_assert(std::atomic<DataIndex>::is_always_lock_free); to your code. This fails at compile time when you build for a platform that does insert a mutex to ensure atomic behaviour.
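For anyone who wants to see what their own toolchain reports, here is a minimal sketch. `FourFloats` is a hypothetical stand-in for the 16-byte struct mentioned above (it is not the thread's `DataIndex`), and `isAlwaysLockFree` is a helper name made up for illustration. It only reads the compile-time constant, so it needs C++17 and nothing else.

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical 16-byte payload: four floats, as discussed above.
struct FourFloats { float a, b, c, d; };

// std::atomic<T> requires a trivially copyable T.
static_assert (std::is_trivially_copyable<FourFloats>::value,
               "std::atomic needs a trivially copyable type");

// True only if std::atomic<T> never falls back to a lock on this target.
template <typename T>
constexpr bool isAlwaysLockFree() { return std::atomic<T>::is_always_lock_free; }

// A plain int is lock-free on every mainstream desktop target.
static_assert (isAlwaysLockFree<int>(), "atomic<int> should never need a lock");

// For the 16-byte case the answer depends on CPU, compiler and flags.
// Uncomment to make the build fail wherever a lock would be used:
// static_assert (isAlwaysLockFree<FourFloats>(), "FourFloats needs a lock here");
```

The last check is left commented out on purpose: the result for a 16-byte type can differ between compilers and build flags even on the same CPU, so it is worth enabling it in each build configuration you ship.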
What you have to keep in mind is that this is a single-producer / single-consumer structure. Only one thread is allowed to call pull (consume) at a time and only one thread is allowed to call push (produce) at a time. If you already know that these will be the message and audio threads, I would add asserts to protect you from forgetting about this when, e.g., adding a thread pool or multi-threading your rendering engine. To incorporate that nicely into your software design, you could subclass ValueSender and “override” both methods with an added assert. This small overhead could also easily be excluded from your release builds by hiding the subclass behind an #if JUCE_DEBUG and forwarding the subclass name to your original ValueSender with a typedef or using statement if JUCE_DEBUG is false.
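A rough sketch of that idea, under several assumptions: `SimpleSender` is a hypothetical single-slot stand-in for ValueSender (the real class is not reproduced here), `CheckedSender`, `Sender` and `calledFromOwner` are invented names, and plain `assert` is used so the example builds without JUCE (in a JUCE project `jassert` would be the natural replacement). Each side remembers the first thread that called it and asserts that later calls come from the same thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// JUCE normally defines JUCE_DEBUG; defined here only so the sketch builds standalone.
#ifndef JUCE_DEBUG
 #define JUCE_DEBUG 1
#endif

// Hypothetical stand-in for ValueSender: one slot, T must be trivially copyable.
template <typename T>
class SimpleSender
{
public:
    void push (const T& v)
    {
        value.store (v, std::memory_order_relaxed);
        fresh.store (true, std::memory_order_release);
    }

    bool pull (T& out)
    {
        if (! fresh.exchange (false, std::memory_order_acquire))
            return false;                       // nothing new since the last pull

        out = value.load (std::memory_order_relaxed);
        return true;
    }

private:
    std::atomic<T> value { T{} };
    std::atomic<bool> fresh { false };
};

// Returns true if the caller is the first thread to claim `owner`,
// or the same thread that claimed it earlier.
inline bool calledFromOwner (std::atomic<std::thread::id>& owner)
{
    std::thread::id expected {};                // default id means "unclaimed"
    const auto self = std::this_thread::get_id();
    return owner.compare_exchange_strong (expected, self) || expected == self;
}

#if JUCE_DEBUG
// Debug-only subclass: same interface, plus one assert per call.
template <typename T>
class CheckedSender : public SimpleSender<T>
{
public:
    void push (const T& v)
    {
        assert (calledFromOwner (producer));    // only one producer thread
        SimpleSender<T>::push (v);
    }

    bool pull (T& out)
    {
        assert (calledFromOwner (consumer));    // only one consumer thread
        return SimpleSender<T>::pull (out);
    }

private:
    std::atomic<std::thread::id> producer { std::thread::id{} };
    std::atomic<std::thread::id> consumer { std::thread::id{} };
};

template <typename T> using Sender = CheckedSender<T>;
#else
template <typename T> using Sender = SimpleSender<T>;
#endif
```

Client code only ever names `Sender<T>`, so the checks disappear from release builds without touching any call site.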
if constexpr (std::is_same<T, juce::AudioBuffer<float>>::value ): this would also be a great place to use the C++20 requires clause: void prepare(int numSamples, int numChannels) requires (std::is_same<T, juce::AudioBuffer<float>>::value). This way, the compiler prevents you from calling prepare when it really doesn’t make any sense and you probably didn’t want to call it.
So you don’t want your code to be portable to 32-bit Intel? Limitations like this should be clearly identified in the comments.
Do you know what ‘false sharing’ is? It’s when both your read index and write index are allocated within the same cache line. This causes extra, unnecessary latency (CPU load), because the processor cores have to sync up that cache line every time you write to either index.
Segregating the indexes looks like:
#ifdef _WIN32 // Apple don't support this yet
alignas(std::hardware_destructive_interference_size) std::atomic<int> read_ptr;
alignas(std::hardware_destructive_interference_size) std::atomic<int> m_committed_write_ptr;
#else
[[maybe_unused]]char cachelinePad[64 - sizeof(std::atomic<int>)];
std::atomic<int> read_ptr;
[[maybe_unused]]char cachelinePad2[64 - sizeof(std::atomic<int>)];
std::atomic<int> m_committed_write_ptr;
#endif
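A variation on the same idea, as a sketch: instead of testing for the platform, test the standard library's feature macro and fall back to a fixed size when it is missing. `cacheLine` and `FifoIndexes` are names invented here, and the fallback of 64 bytes is an assumption, since some CPUs use a different line size.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
// The standard library knows the value for this target.
inline constexpr std::size_t cacheLine = std::hardware_destructive_interference_size;
#else
// Assumption: 64-byte cache lines. Adjust for targets that use a different size.
inline constexpr std::size_t cacheLine = 64;
#endif

struct FifoIndexes
{
    // Each index starts on its own cache line, so a write to one
    // does not invalidate the line that holds the other.
    alignas (cacheLine) std::atomic<int> read_ptr { 0 };
    alignas (cacheLine) std::atomic<int> m_committed_write_ptr { 0 };
};

// Two members, each aligned to a cache line, cannot fit in fewer than two lines.
static_assert (sizeof (FifoIndexes) >= 2 * cacheLine, "indexes share a cache line");
static_assert (alignof (FifoIndexes) == cacheLine, "struct must start on a line boundary");
```

Using alignas on both branches also avoids the manual padding arrays, which only separate the indexes if the enclosing object itself happens to start on a cache-line boundary.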
My point really is not to be critical just for the sake of it, but to emphasise that open-source FIFO examples are just a Google away. Writing your own is likely to result in incorrect or low-performance code.
Intel 32-bit is the most common one that I have to support.
Note that thread-sanitizer detects race conditions; it won’t detect that this code is not lock-free on all platforms.
The biggest red flag, though, is the atomic ‘bool fresh’, because a simple FIFO requires only two atomics: the read index and the write index. ‘fresh’ appears to be redundant, and it is also the cause of the 32-bit incompatibility. What does it achieve?
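For reference, this is roughly what the two-atomic version looks like. It is a generic textbook-style sketch, not the code under discussion: `SpscFifo` and `spscFifoExample` are invented names, and it assumes exactly one producer thread and one consumer thread.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

template <typename T, std::size_t Capacity>
class SpscFifo
{
    static_assert (Capacity > 1, "one slot is always kept empty");

public:
    // Producer thread only. Returns false when the FIFO is full.
    bool push (const T& v)
    {
        const auto w = writeIndex.load (std::memory_order_relaxed);
        const auto next = (w + 1) % Capacity;

        if (next == readIndex.load (std::memory_order_acquire))
            return false;

        slots[w] = v;
        writeIndex.store (next, std::memory_order_release);   // publish the slot
        return true;
    }

    // Consumer thread only. Returns false when the FIFO is empty.
    bool pull (T& out)
    {
        const auto r = readIndex.load (std::memory_order_relaxed);

        if (r == writeIndex.load (std::memory_order_acquire))
            return false;

        out = slots[r];
        readIndex.store ((r + 1) % Capacity, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, Capacity> slots {};
    std::atomic<std::size_t> readIndex { 0 };
    std::atomic<std::size_t> writeIndex { 0 };
};

// Usage: values come out in the order they went in.
inline void spscFifoExample()
{
    SpscFifo<int, 4> fifo;            // usable capacity is 3
    const bool pushed = fifo.push (7);
    int out = 0;
    const bool pulled = fifo.pull (out);
    assert (pushed && pulled && out == 7);
    (void) pushed; (void) pulled;     // silence warnings when asserts are disabled
}
```

"Empty" is simply read == write and "full" is write + 1 == read, so no extra flag is needed, at the cost of one unused slot.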
I guess you didn’t intend to tag me. Just in case, my example is indeed spsc. The ready index is the only atomic (read and write indexes are accessed by a single thread), and it includes the changed flag in its third bit.
Good to know! I’m personally not planning to ship a 32-bit pro audio product for end users. It seems like it would be a major support overhead unless I specifically develop for that platform.
But I guess if I ever use a machine like that for that purpose, I’ll make sure to install the 64-bit OS on it, thanks!
Hey @matkatmusic, thanks for the info! However, I still don’t understand why you need a FIFO here. You are only sharing the last object that was written on the GUI thread with the audio thread. Why don’t you atomically publish that one value to the audio thread? A FIFO is needed if you have a sequence of objects that are being pumped from one thread to another, but in this case there is no sequence, just a single object that is periodically updated.
You are literally doing this in the audio thread:
while( backgroundObjectCreator.pull(t) ) { ; }
which is discarding all objects except the last one written. Why have a FIFO then? What am I missing?
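To make the alternative concrete, here is a minimal sketch of publishing only the latest value. It assumes the shared object is small and trivially copyable, which may not hold for the objects in this thread; `GainParams`, `LatestValue` and `latestValueExample` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Hypothetical 8-byte parameter block shared between GUI and audio threads.
struct GainParams { float gain; float pan; };

template <typename T>
class LatestValue
{
    static_assert (std::is_trivially_copyable<T>::value,
                   "std::atomic needs a trivially copyable type");

public:
    // GUI thread: overwrite whatever was published before.
    void publish (const T& v)  { value.store (v, std::memory_order_release); }

    // Audio thread: always sees the most recently published value.
    T read() const             { return value.load (std::memory_order_acquire); }

private:
    std::atomic<T> value { T{} };
};

// Usage: older values are simply overwritten, there is nothing to drain.
inline void latestValueExample()
{
    LatestValue<GainParams> params;
    params.publish ({ 0.5f, -1.0f });
    params.publish ({ 0.25f, 1.0f });

    const GainParams p = params.read();
    assert (p.gain == 0.25f && p.pan == 1.0f);
    (void) p;   // silence warnings when asserts are disabled
}
```

Whether this stays lock-free depends on the size of T and on the target, so larger objects would need a different scheme, such as swapping a pointer.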
The example I showed is doing that.
But that is not how I’m actually using it in my projects.
I use the backgroundObjectCreator on the audio thread.
I split up the processBlock into chunks of 16 or 32 samples and request a new object from the backgroundObjectCreator for those smaller chunks.
This means the backgroundObjectCreator is being requested to create an object multiple times every processBlock. That’s why the FIFO has multiple objects that need pulling after the backgroundObjectCreator creates each element that is requested.
It’s probably overkill, but I did it for the purpose of smoothing the changes in values being used by the object creator.
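The chunking itself can be sketched like this. It is only an illustration of the splitting step: `forEachChunk` and `chunkSizes` are made-up names, and the callback is where a real processBlock would request the next object and process that slice.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Calls onChunk (startSample, numSamplesInChunk) for each slice of the block.
// The last chunk is shorter when numSamples is not a multiple of chunkSize.
template <typename Fn>
void forEachChunk (int numSamples, int chunkSize, Fn&& onChunk)
{
    assert (chunkSize > 0);

    for (int start = 0; start < numSamples; start += chunkSize)
        onChunk (start, std::min (chunkSize, numSamples - start));
}

// Usage helper: returns the length of every chunk, in order.
inline std::vector<int> chunkSizes (int numSamples, int chunkSize)
{
    std::vector<int> sizes;
    forEachChunk (numSamples, chunkSize,
                  [&sizes] (int /*start*/, int length) { sizes.push_back (length); });
    return sizes;
}
```

For a 100-sample block and a chunk size of 32, `chunkSizes (100, 32)` yields 32, 32, 32 and 4, so four objects would be requested in that block. On a real audio thread the vector-returning helper would not be used, since it allocates.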
If you want to talk more about it, send me a DM. I’m sure the design could be improved/optimized but it works well enough for the needs of the project.