I recently came across an assert in ~ReferenceCountedObject() that said the reference count of the object was not zero. Initially my assumption was that something was deleting the object manually rather than through the reference counting, but further investigation seems to suggest that this is caused by a race condition and ReferenceCountedObject is not thread safe.
The check for the reference count being zero and the deletion of the object is not an atomic operation so there is a brief window of opportunity where a second thread can obtain a reference and increment the reference count. The first thread then deletes the object and the second thread goes on to reference a deleted object.
Below Iāve pasted an abstract piece of code that demonstrates the issue. This was the one and only piece of code in a command-line app generated in the Introducer using V5.4.5 of JUCE. In my tests, this generally only took a few seconds before the assert was fired, although in one case it took about 35 seconds. As is the nature of thread-related issues, itās a bit random.
There is a globally accessible object that contains data required by two different threads. One is just reading the data, the other is updating the data. To make sure that the writer doesnāt update the data when the reader is using it, the writer creates a new object and then assigns it to the globally accessible pointer when all the data is set up. The reader thread takes a reference to the global data at the start of its reading cycle and then refers to that rather then the global pointer.
The intention is that the writer only makes new data available when it is fully formed, and the reader keeps a ālockā on the data to make sure it doesnāt go away whilst itās using it.
There are plenty of ways of working around this particular issue, but my main point of concern is that the comment for the ReferenceCountedObject class suggests itās thread safe, and it appears that it might not be.
Any thoughts?
Charles
#include "../JuceLibraryCode/JuceHeader.h"
#include <iostream>
class ReferencedThing : public ReferenceCountedObject
{
public:
typedef ReferenceCountedObjectPtr< ReferencedThing > Ptr;
int someData[1000];
};
ReferencedThing::Ptr sharedPointer;
class ReaderThread : public Thread
{
public:
ReaderThread() : Thread( "ReaderThread" ) {}
virtual void run()
{
while ( true )
{
ReferencedThing::Ptr safePointer = sharedPointer;
if ( safePointer )
{
for ( auto& d : safePointer->someData )
{
std::cout << d;
}
std::cout << "\n";
}
}
}
};
class WriterThread : public Thread
{
public:
WriterThread() : Thread( "WriterThread" ) {}
virtual void run()
{
while ( true )
{
ReferencedThing::Ptr newThing = new ReferencedThing;
int i=0;
for ( auto& d : newThing->someData )
{
d = i++;
}
sharedPointer = newThing;
}
}
};
int main (int /*argc*/, char* /*argv*/[])
{
ReaderThread r;
WriterThread w;
r.startThread();
w.startThread();
while ( true )
{
Thread::sleep( 1000 );
}
return 0;
}
Looking at the boost source, it looks like their intrusive pointer does almost exactly the same thing as the juce implementation, so I wonder whether that suffers from the same problem. I canāt see any obvious mitigation to this issue, so I suspect it doesā¦
Actually, looking a bit closer, I think this is a bug in your code. Updating the refcount on a RefCountedObject is thread safe, but modifying RefCountedObjectPtr is not, i.e. reading/writing to a RefCountedObjectPtr from multiple threads simultaneously will be a data race. This is the same as the situation with std::shared_ptr, where updating refcounts from multiple threads simultaneously is safe, but updating a single shared_ptr instance from multiple threads will race. Thatās why atomic<shared_ptr<T>> exists (even though this isnāt guaranteed to be lock-free).
Can you try replacing the sharedPointer object in your example with something like this, and see whether the problem persists?
Thanks @reuk, your suggestion works as expected. In our case Iām going to switch to using a lock-less CAS exchange loop as suggested here to avoid using locks. Itās not an absolute requirement as the threads accessing this data are the message thread and the rendering thread so locking isnāt an issue, but Iād like to get into the habit of not using locks unless absolutely required.
Iād like to suggest changing the description of the ReferenceCountedObject class. The current text which reads:
This class uses an Atomic value to hold the reference count, so that
the pointers can be passed between threads safely.
Currently, this suggests that the usage I defined above should work.
Iām afraid that both are unsafe. If the line
ref_ptr_to_object_accessed_by_multiple_threads = ref_ptr_generated_by_this_thread
is called in multiple threads, it is possible that both of them will try to delete the object in the shared pointer (provided that is the last reference).
In the line
ref_ptr_generated_by_this_thread = ref_ptr_to_object_accessed_by_multiple_threads
you can be given a pointer to an object that has just been deleted in another thread. To be completely safe, any assignment or fetching of a reference needs to be in some kind of locked section.
As I mentioned above Iāve ended up changing the way my code works so that only a single thread is responsible for the lifetime of the object (using a scoped pointer rather than a reference count). Other (time critical) threads can block the changing of the object using a CAS loop.
Hmmmā¦ Jules has said in other posts (admittedly old ones) that using ReferenceCountedObjectPtr across threads should be fine. Is this information incorrect, or is there a difference in the implementation?
The only behaviour thatās guaranteed to be thread-safe is adjusting the reference count of the pointed-to object. If you have two RefCountedObjectPtr instances a and b both pointing to the same object, itās safe to delete a on one thread while making a copy of b (the RefCountedPtr itself, not the pointed-to object!) on another thread. For pretty much anything else, youāll need some external synchronisation.
In particular, if you have a single RefCountedObjectPtr instance which is accessed by multiple threads, youāll need additional synchronisation. Similarly, if you access a pointed-to object from multiple threads simultaneously, youāll need additional synchronisation.
Not really, since that pointer is a reference, hence the reference count cannot become zero. Obviously if you hold other references/pointers, that are not reference counted you are back to square one.
It is safe if the references die by natural causes, i.e. going out of scope. Deleting a reference (reset()) can never be thread safe as pointed out.
Thatās in line with @reukās answer above, the reference count is atomic and can only become zero, once no ReferenceCountedObject::Ptr is in any scope (and then there is nothing to draw a copy from).
@daniel I donāt think thatās correct, which is what led to my original post. Admittedly, the chances of this happening are quite rare, but relying on something being rare when it comes to pointers and multi-threading will almost always lead to a difficult to reproduce crash .
If you try creating a console application in the Producer and then paste the example code in the original post youāll be able to see this in action. It might take a while for it to go pop, but eventually (within about 30 seconds in my tests) the ReaderThread will manage to get hold of a reference counted pointer to the shared object just as the WriterThread tries to delete it.
If Iāve misunderstood whatās happening here and you have a different explanation Iām happy to hear it.
As I said above, the problem in your original example is that ReferencedThing::Ptr sharedPointer; is a single object being read/written from multiple threads simultaneously. There is a data race present in this example, but itās on the ReferencedType* referencedObject; data member of the ReferenceCountedObjectPtr.
To be clear: itās safe to adjust the reference-count of the pointed-to object from multiple threads simultaneously without synchronisation. That is, unique/separate ReferenceCountedObjectPtr instances which all point to the same ReferenceCountedObject can be constructed/destructed from multiple threads without the ReferenceCountedObject dangling.
All other kinds of simultaneous modification are unsafe without additional synchronisation. Reading/writing the same ReferenceCountedObjectPtr from multiple threads without synch is unsafe (thatās what the first example was doing). Reading/writing fields other than the refcount in the pointed-to object from multiple threads without synch is unsafe.
@reuk, I think I have understood all this. To summarize,
If a thread has possession of a ReferenceCountedObjectPtr, it can safely make a copy of itā¦
But it cannot safely acquire a ReferenceCountedObjectPtr from another thread without a lock.
If a ReferenceCountedObject term is going to be accessed by multiple threads, you need a lock.
Does the following code seem correct then? (The critical lines are marked **** and ++++; thatās where I need confirmation).
#include "../JuceLibraryCode/JuceHeader.h"
#include <thread>
#include <chrono>
#include <iostream>
class C : public ReferenceCountedObject
{
public:
using Ptr = ReferenceCountedObjectPtr<C>;
C(int i_) : i(i_) {}
void update(int u)
{
const ScopedWriteLock l (lock); // If this lock weren't here, there might be a race over int i.
i = u;
}
int get() const
{
const ScopedReadLock l(lock); // If this lock weren't here, there might be a race over int i.
return i;
}
private:
int i;
ReadWriteLock lock;
};
class WrapC
{
public:
WrapC(C::Ptr p) : c(p) {}
C::Ptr getC()
{
CriticalSection cs; // ***** If this lock weren't here, you might get a dangling pointer over c if two threads tried to access it at the same time.
return c;
}
private:
C::Ptr c;
};
void startThread(C::Ptr c, int i)
{
while (true)
{
auto copyOfRefPtr = c; // ++++ Now that the thread has safely acquired c, it can safely make a copy of it.
c->update(c->get() + i);
std::cout << c->get() << " ";
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
int main(int /*argc*/, char* /*argv*/[])
{
auto wrapC = WrapC(new C(0));
std::thread thread1 (startThread, wrapC.getC(), 1);
std::thread thread2 (startThread, wrapC.getC(), -1);
while(true)
std::this_thread::sleep_for(std::chrono::milliseconds(10));
return 0;
}
I fail to understand, how an object would be deleted while acquiring a new ReferenceCountedObject::Ptr. The object can only be deleted when a scope ends and the very last Ptr goes out of scope. So there is no other Ptr where you could draw a new Ptr from.
Also when a new Ptr is created, first the copy constructor takes place which increments the atomic ref count, before the Ptr it is copied from might go out of scope.
The problem is not having multiple Ptr on separate threads, but that two threads access the same Ptr in global scope. That cannot be safe and is completely independent of the ReferenceCount.
I donāt know if you could wrap the global Ptr itself into std::atomic to make the assignment safe. Sounds weird, but might be possible.
The problem is in the call to decReferenceCountWithoutDeleting inside the function decIfNotNull which is called by the destructor of ReferenceCountedObjectPtr. The result of the function determines if the object should be deleted or not, but there is a tiny window between that function call and the object being deleted where another thread can jump in and take a reference. This can only happen if there is a single reference counted pointer accessible from multiple threads. This isnāt necessarily in āglobal scopeā - we have a few situations where the main processor in a plug-in spawns another thread for processing non-time-critical lengthy calculations but it shares data with the main processor class for reporting back results.
If @cblessing is right, then this does seem like a problem. (And the rarer the problem, the worse it is).
Would it be possible to fix this in the copy constructor of ReferenceCountedObjectPtr ? If youāre trying to copy a Ptr thatās about to be deleted by another thread, its reference count should be 0. In this case, the copy constructor can return a null ReferenceCountedObjectPtr.
Currently, youāre allowed to copy a ReferenceCountedObjectPtr with count 0, as the following code demonstrates.
DynamicObject::Ptr p = new DynamicObject();
p->decReferenceCountWithoutDeleting(); // simulating another thread decreasing the ref count
auto q = p;
q->getProperties(); // or whatever. No crash at this point...
// ... but big crash when it goes out of scope
Calling decReferenceCountWithoutDeleting() defeats the purpose of having a reference count in the first place. It is like cheating with Schrƶdingers cat: I am not looking at you, but I am looking at you.
I know that, Iām just trying to simulate what might be happening in the multi-threaded case that weāve been talking about. A thread would encounter a Ptr that has ref count 0, but has not yet been deleted.
Iāve edited the comments in my code so hopefully itās a bit clearer.
I think thereās some confusion about what threading scenarios are safe when using RefCountedObjectPtr.
If you have two completely separate RefCountedObjectPtr instances a and b pointing to the same object, itās safe to call any method on a from thread 1 while calling any method on b from thread 2. Note that Iām talking about methods on the RefCounteObjectPtr itself, like reset or operator=.
It would be unsafe to call any method of a from thread 1 while simultaneously calling any non-const method on a from thread 2. Put another way, if multiple threads of execution access the same RefCountedObjectPtr without synchronization and any of those accesses uses a non-const member function of RefCountedObjectPtr then a data race will occur.
With that in mind, letās consider how it might be possible for the following to happen:
If the refcount reaches zero, that means there is only a single RefCountedObjectPtr pointing to the owned RefCountedObject at that point. For another thread to ājump inā, this would have to happen during the call to decIfNotNull, called by reset, operator=, or ~ReferenceCountedObjectPtr. All of these are non-const member functions, so this would be a data race under the clause above. That is, the only way this could happen is if the program itself is ill-formed.
If you want to access the same RefCountedObjectPtr instance from multiple threads, you need external synchronisation of some kind, like a CriticalSection which is locked whenever the RefCountedObjectPtr is accessed. If you use external synchronisation, it will be impossible for another thread to ājump inā and attempt to access a deleted object.