Consider this sequence of events on a ThreadPool with one thread:
ThreadPool::addJob successfully adds a job to the array of jobs and starts the thread
before ThreadPool::runNextJob gets to run, someone calls ThreadPoolJob::signalJobShouldExit on the just-added job
when ThreadPool::runNextJob gets to run, the first loop where it iterates to find a job doesn’t come up with anything because job->shouldStop is true
So far this seems fine, the job never runs. I don’t see that it ever gets removed from the jobs array but perhaps that’s not so bad.
The struggle comes when someone calls ThreadPool::waitForJobToFinish waiting for the aborted job to finish, potentially waiting forever. jobFinishedSignal is never signalled so it hangs.
I don’t see a particularly easy way around this…or at least one I can guarantee is less racy than what the code currently does. Any ideas, or at least confirmation that I’ve found a hole?
Umm…I haven’t looked at the ThreadPoolJob code but one solution is to change job identifiers from a small integer into a reference counted object pointer. This way a job persists until the last reference is gone. So after a job is finished or gets signaled to exit, it can be put into a state suitable for inspection (i.e. usable in a call to waitForJobToFinish).
I believe that is the solution used by boost::thread and std::thread.
The ThreadPool class needed a bit of a spring-clean… I’ve been through and tidied it up now, hopefully sorting out this problem in the process. Let me know if you still have any trouble with it!
That method no longer has a deleteInactive argument. Each job now knows whether it should be deleted, so there’s no need to tell it what to do. (God knows why I didn’t write it that way in the first place!)