Variable payloads and lockless FIFO use

I’d like to pass data (messages) between threads in a lockless way. I’ve dabbled with ‘low contention’ schemes, such as checking for access to a more conventional queue (objects with scoped, owned or ref-counted pointers to payload objects) in a ‘try once, then come back later’ way, but I’d really like actual lock-free access, the same way I already have with a fixed-size object and the FIFO class.

Is there any pointer-safe way to pass a variable payload? Obviously a raw pointer would work, but I’d rather be safe.

Does that question make sense? I know Vinn knows what I’m asking.

Bruce

A queue of ref-counted pointers should do the job.

Your question makes perfect sense, and Jules’ answer is the correct one. As soon as you use thread queues for your concurrency model, reference counted objects become a requirement for lifetime management.

If you are using an unbounded lock-free or wait-free FIFO, you will either need a 64-bit CAS (compare-and-swap) or some kind of garbage collection scheme to prevent the ABA problem (see “ABA problem” on Wikipedia). Because implementing these systems is complex and error-prone, I recommend you avoid the lock-free implementation at first and go with a simpler thread queue that uses a CriticalSection. This will let you build your reference-counted framework and deal with the inevitable off-by-one bugs in the reference count without also dealing with the headache of a lock-free queue. If the profiler shows that the queue operations are a bottleneck, you can then replace it with a lock-free implementation.
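To make that concrete, here is a minimal sketch of such a CriticalSection-guarded queue of reference-counted messages. The Message and LockedMessageQueue names are illustrative, not existing JUCE classes; the usual JUCE headers are assumed to be included:

struct Message : public juce::ReferenceCountedObject
{
    typedef juce::ReferenceCountedObjectPtr<Message> Ptr;

    juce::String payload;   // any variable-sized data
};

class LockedMessageQueue
{
public:
    void push (const Message::Ptr& m)
    {
        const juce::ScopedLock sl (lock);
        queue.add (m);      // the ref count keeps the payload alive
    }

    Message::Ptr pop()      // returns a null pointer when empty
    {
        const juce::ScopedLock sl (lock);
        return queue.isEmpty() ? Message::Ptr()
                               : queue.removeAndReturn (0);
    }

private:
    juce::CriticalSection lock;
    juce::ReferenceCountedArray<Message> queue;
};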

I DESPISE these types of implementations. In my experience they only make sense in specialized applications. For example, a server that uses asynchronous I/O. Never a desktop app.

I’ve been using a queue of ReferenceCountedObjectPtrs in an array, and it works well, but you’re saying I can safely put them into the JUCE lock-free FIFO? I thought I wouldn’t be able to do that safely.

If it’s safe to do - great. I would almost propose that it should be a JUCE class in that case - a producer/consumer message queue of some kind.

Bruce
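For reference, here is a hedged sketch of the kind of thing Bruce is asking about: ReferenceCountedObjectPtr slots managed by juce::AbstractFifo, with a single producer and a single consumer. The RefCountedFifo wrapper is hypothetical, not an existing JUCE class:

#include <vector>   // plus the usual JUCE headers

template <class Obj>
class RefCountedFifo
{
public:
    explicit RefCountedFifo (int capacity)
        : fifo (capacity), slots ((size_t) capacity) {}

    // Producer thread only. Returns false if the queue is full.
    bool push (const juce::ReferenceCountedObjectPtr<Obj>& p)
    {
        int start1, size1, start2, size2;
        fifo.prepareToWrite (1, start1, size1, start2, size2);

        if (size1 + size2 == 0)
            return false;

        slots [(size_t) start1] = p;    // count is bumped on the producer thread
        fifo.finishedWrite (1);
        return true;
    }

    // Consumer thread only. Returns a null pointer if the queue is empty.
    juce::ReferenceCountedObjectPtr<Obj> pop()
    {
        int start1, size1, start2, size2;
        fifo.prepareToRead (1, start1, size1, start2, size2);

        if (size1 + size2 == 0)
            return juce::ReferenceCountedObjectPtr<Obj>();

        juce::ReferenceCountedObjectPtr<Obj> p (slots [(size_t) start1]);
        slots [(size_t) start1] = nullptr;   // release happens on the consumer thread
        fifo.finishedRead (1);
        return p;
    }

private:
    juce::AbstractFifo fifo;
    std::vector<juce::ReferenceCountedObjectPtr<Obj> > slots;
};

The key point of the sketch is that every reference-count change happens on the thread that owns that end of the queue: push bumps the count on the producer side, and pop clears the slot so the release happens on the consumer side.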

[quote]I DESPISE these types of implementations. In my experience they only make sense in specialized applications. For example, a server that uses asynchronous I/O. Never a desktop app.[/quote]

I hadn’t used them enough to develop a strong opinion. But I had a thread (OpenGL drawing) that I really needed to not interrupt at the start, especially since the incoming messages are relatively non-time-critical. So I was checking whether the message queue was busy, and then, after my time-critical work, coming back and picking up messages for sure if needed. It was the worst case, where the high-priority thread would be blocked by the low-priority thread, so getting unblocked could involve any number of other threads getting scheduled first.

Made sense to me. I think I already moved that to lock-free.

Still - I know you have a thread queue that you always use. I presume that’s a single-in, single-out (one producer, one consumer) setup? Is it superior to a lock-free, ReferenceCountedObjectPtr version?

BTW - I’m somewhat familiar with the problems with lock-free - I did all the research and wasn’t able to do my own implementation in any reasonable time scale - gave it up. I’m trusting that Jules’ FIFO has avoided all the pitfalls.

Bruce

Queues are FIFO, yes. Is it superior? That question isn’t really relevant: whether or not you put reference-counted objects into a thread queue shouldn’t matter to the implementation. If you look at my thread queue in DspFilters, you’ll see that what goes into it are functors. With that technique, the thread queue implementation is completely isolated from the decision of what to put into it. So my posted thread queue code already handles “variable payloads”.

Lock-free bounded queues avoid all the problems. It’s when you implement unbounded queues that you enter a world of hurt.
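The functor idea in miniature (a hedged sketch, not the actual DspFilters code; std::function provides the type erasure here, where the original uses its own functor storage):

#include <functional>
#include <queue>
// plus the usual JUCE headers

// The queue stores type-erased callables, so any payload rides along
// inside the closure and the queue never knows what it is carrying.
class FunctorQueue
{
public:
    void post (const std::function<void()>& f)   // any thread
    {
        const juce::ScopedLock sl (lock);
        work.push (f);
    }

    void process()                               // consumer thread
    {
        for (;;)
        {
            std::function<void()> f;

            {
                const juce::ScopedLock sl (lock);
                if (work.empty())
                    return;

                f = work.front();
                work.pop();
            }

            f();                                 // run outside the lock
        }
    }

private:
    juce::CriticalSection lock;
    std::queue<std::function<void()> > work;
};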

[quote=“TheVinn”]If you look at my thread queue in DspFilters, you’ll see that what goes into it are functors. With that technique, the thread queue implementation is completely isolated from the decision of what to put into it. So my posted thread queue code already handles “variable payloads”.

Lock-free bounded queues avoid all the problems. It’s when you implement unbounded queues that you enter a world of hurt.[/quote]

Yes, I remember you were using functors. To me, that wouldn’t solve anything - the called function (if it were to access the shared data) would have just as many potential blocking problems as the original approach.

In my case, I’m packing up a copy of the data (I’m already doing this; it’s just using an Array with CriticalSections now) and operating on that. A UI change produces a new copy for the thread to process. Works OK. It might not fit so well with JUCE Values, but I like it for now.

And yes, since I ‘pick up’ my messages fairly often, and each one is the latest current data, a limited-size queue is fine.

Bruce
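As a usage sketch of that snapshot pattern (reusing the hypothetical Message and RefCountedFifo types from above):

void consumerTick (RefCountedFifo<Message>& q, Message::Ptr& current)
{
    // Each message is a complete snapshot, so only the newest matters;
    // older snapshots are released here, on the consumer thread.
    while (Message::Ptr p = q.pop())
        current = p;

    // ... operate on 'current' ...
}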

Not really. If your functor is the result of a boost::bind() to a function that takes a ReferenceCountedObjectPtr as a parameter, then you get lifetime management “for free”. My existing ThreadQueue handles this case.
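In sketch form, again with the hypothetical Message and FunctorQueue types from above (boost::bind works the same way as std::bind here):

void consume (const Message::Ptr& m)
{
    // runs on the consumer thread; 'm' still holds a reference
}

void producerSide (FunctorQueue& q, const Message::Ptr& m)
{
    // The functor stores its own copy of the Ptr, so the object stays
    // alive while it sits in the queue and until the functor is destroyed.
    q.post (std::bind (consume, m));
}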

This is also the technique I use. Each thread has its own copy of the data, and thread queues are used to propagate changes. Yes, there are moments in time when a thread might be out of date, but since audio applications typically process in blocks (in the audio I/O callback), this is acceptable: values cannot change in the middle of a block.

I have consolidated my thoughts on using thread queues with functors, variable payloads, and reference-counted objects by writing a documented example of a simple but common use case, which compiles. You can read up on it here:

Thread safe reference counted message passing CODE!

[quote=“TheVinn”]Your question makes perfect sense, and Jules’ answer is the correct one. As soon as you use thread queues for your concurrency model, reference counted objects become a requirement for lifetime management.[/quote]

Yes, maybe, but not always. Consider the case where the processing end is in an audio callback. If the ref count goes to zero when you’re done with the ref-counted object, you’ll implicitly incur a delete within the audio callback. That must not happen. In that case it is better to have variable-length object allocations in a preallocated queue.

Programmers often make the mistake of thinking in low-level, implementation terms. What matters is the design (this is where Jules shines); the optimization comes later. You are showing a case of “premature optimization.”

Let’s look at juce::ReferenceCountedObject:

class JUCE_API  ReferenceCountedObject
{
public:
    //...
    inline void decReferenceCount() noexcept
    {
        jassert (getReferenceCount() > 0);

        if (--refCount == 0)
            delete this;
    }
    //...
};

Now let’s look at my “SharedObject”, which is an identical copy of juce::ReferenceCountedObject, with one small change:

class SharedObject : NonCopyable
{
public:
  //..
  inline void decReferenceCount() noexcept
  {
    vfassert (m_refs.is_signaled ());

    if (m_refs.release ())
      destroySharedObject ();
  }

  // default implementation performs the delete on a separate,
  // provided thread that cleans up after itself on exit.
  virtual void destroySharedObject ();
  //..
};

Instead of calling ‘delete this’ when the last reference goes away, SharedObject calls a virtual function whose default implementation puts the shared object onto the thread queue of a singleton whose whole purpose is to perform ‘delete this’ on its own thread. The audio thread never incurs the burden of the delete when it gives up the last reference to a SharedObject.

Note how the problem you brought up (which was a legitimate complaint) is solved without changing the interface or usage of the thread queue.
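For illustration, here is a hedged sketch of what such a deleter singleton could look like. This is not TheVinn’s actual code; DeferredDeletable stands in for SharedObject, and the names are hypothetical:

struct DeferredDeletable
{
    virtual ~DeferredDeletable() {}
};

class DeleterThread : public juce::Thread
{
public:
    static DeleterThread& getInstance()
    {
        static DeleterThread instance;      // illustrative singleton
        return instance;
    }

    void deleteLater (DeferredDeletable* obj)   // called when the last ref drops
    {
        {
            const juce::ScopedLock sl (lock);
            doomed.add (obj);
        }

        notify();                               // wake the deleter thread
    }

private:
    DeleterThread() : juce::Thread ("deferred deleter") { startThread(); }
    ~DeleterThread() { stopThread (2000); }

    void run()
    {
        while (! threadShouldExit())
        {
            wait (-1);

            juce::Array<DeferredDeletable*> batch;
            {
                const juce::ScopedLock sl (lock);
                batch.swapWith (doomed);
            }

            for (int i = 0; i < batch.size(); ++i)
                delete batch.getUnchecked (i);  // the delete lands on this thread
        }
    }

    juce::CriticalSection lock;
    juce::Array<DeferredDeletable*> doomed;
};

Note that juce::Array::add can itself allocate, so a production version would hand the pointer to a preallocated or lock-free structure instead of taking a lock; the sketch only shows where the delete ends up.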

Although the ThreadQueue implementation provided in DspFilters does not use preallocation (it does support variable-length objects implicitly, since it uses arbitrary functors), my own private optimized implementation does. It is mostly wait-free (sometimes lock-free) and, after a small warm-up period, never allocates or frees memory. It has been optimized to the point where it outperforms Intel’s Threading Building Blocks queue. And yet the interface is exactly the same as the one in the DspFilters ThreadQueue.h

The moral of the story is to implement your code using a proper, robust, and well-thought-out API, and worry about the optimization later. I would love to take credit for this nugget of programming advice, but… you know this one is older than me.

My solution uses RefCountedObjects with very little to them. When they destruct, they pass the heavyweight object (a video frame, or an OpenGL texture and buffer set) back to a pool, so only a very small amount of memory is actually de-allocated.
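A sketch of that shape (FramePool, Frame, and FrameRef are hypothetical names, not Bruce’s actual classes):

struct Frame { /* video frame, GL texture ids, buffers... */ };

class FramePool
{
public:
    Frame* acquire()
    {
        const juce::ScopedLock sl (lock);

        if (free.size() == 0)
            return new Frame();

        Frame* f = free.getLast();
        free.removeLast();
        return f;
    }

    void release (Frame* f)
    {
        const juce::ScopedLock sl (lock);
        free.add (f);       // the frame is recycled, not deleted
    }

private:
    juce::CriticalSection lock;
    juce::Array<Frame*> free;   // (shutdown cleanup omitted)
};

// The ref-counted wrapper is tiny; its destructor hands the heavy
// resource back to the pool, so dropping the last reference only
// frees the few bytes of the wrapper itself.
struct FrameRef : public juce::ReferenceCountedObject
{
    typedef juce::ReferenceCountedObjectPtr<FrameRef> Ptr;

    FrameRef (FramePool& p) : pool (p), frame (p.acquire()) {}
    ~FrameRef() { pool.release (frame); }

    FramePool& pool;
    Frame* frame;
};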

But my original question was centered around wanting to pass copies of XML objects. Creation would be on a non-critical thread, but indeed, destruction would be on a ‘real-time’ thread. To be honest, I wasn’t thinking that de-allocating memory was, in the grand scheme of things, a big deal?

Bruce

It is, unfortunately. Freeing memory can result in a system call, and as such it is non-deterministic. Non-determinism cannot be allowed in a real-time thread, or else you risk missing the real-time deadline.