Race condition when destroying an InterprocessConnection obj


#1

I’ve got as simple a child of InterprocessConnection as I can think of, and my test program crashes on shutdown every time. I’m running on Windows 7.

This is what I do:

  • construct my InterprocessConnection child
  • call createPipe(foo,-1)
  • call sendMessage(010203040506) (i.e. I build a MemoryBlock with 0x0102030405 and call sendMessage with that)
  • sendMessage fails since there’s nothing on the other side of the pipe
  • destroy my InterprocessConnection child

After the call to createPipe, the InterprocessConnection’s thread starts and waits inside NamedPipe::read (eventually at the WaitForMultipleObjects call in NamedPipeInternal::connect).

I’m pretty sure the NamedPipeInternal destructor runs when the InterprocessConnection thread is still using that object. The call stack in the thread that calls the destructor (the main thread) looks something like:

InterprocessConnection::~InterprocessConnection()
InterprocessConnection::disconnect()
NamedPipe::cancelPendingReads
NamedPipe::close

NamedPipe::cancelPendingReads sets an event that WaitForMultipleObjects is waiting on, but then cancelPendingReads immediately returns and NamedPipe::close calls ~NamedPipeInternal. The InterprocessConnection thread may have woken up from WaitForMultipleObjects, but there’s no telling that NamedPipeInternal::connect has finished, nor that the InterprocessConnection thread is really done with the NamedPipeInternal object…At least not as far as I can tell.

I’m not sure the proper way to fix this, but perhaps in InterprocessConnection::disconnect() if the stopThread() call moves in between the call to cancelPendingReads() and close(), that would cover it?

PS The windows implementation of NamedPipe::close also calls cancelPendingReads, so it gets called twice at the moment. Don’t think this is related, but in case it influences a solution…


#2

For what it’s worth, moving the stopThread call as I mentioned does get the InterprocessConnection thread to exit before the NamedPipeInternal gets destroyed, but my test program still crashes, right after it calls CloseHandle(pipeH) in NamedPipeInternal::~NamedPipeInternal. The thread that crashes is the main thread and its stack appears to have been corrupted.

Thanks for your help.

-DB


#3

With the patches from http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=7564, the patch here is enough to keep this test from crashing.


#4

I spoke too soon. The same test is back to crashing all the time…


#5

I found a way to make the crash go away, but it’s been on-again-off-again as far as reproducing it so I’m not confident this absolutely fixes whatever was wrong. I copied this code: CancelIo (intern->pipeH); WaitForSingleObject (over.hEvent, INFINITE); // makes sure cancel is complete into NamedPipeInternal::connect if the return value from WaitForMultipleObjects there indicates a timeout. This is the same thing that NamedPipe::read does when its call to WaitForMultipleObjects times out.
http://msdn.microsoft.com/en-us/library/aa365683%28v=vs.85%29.aspx says this:[quote]Do not deallocate or modify the OVERLAPPED structure or the data buffer until all asynchronous I/O operations to the file object have been completed.[/quote]which is what the code did before calling CancelIo and waiting. It’s not clear what to do if CancelIo or WaitForSingleObject fails. Simply ignoring it probably isn’t right. http://stackoverflow.com/questions/3950314/what-if-cancelio-fails suggests leaking the OVERLAPPED structure and throwing an exception. Tough to do at the moment since we allocate the OVERLAPPED structure on the stack. Still working on this part.