I’ve found yet another thing I don’t understand about the concept of “connected” in an InterprocessConnection object. In this case I’m talking about the client/slave – the side of the pipe that calls connectToPipe. All of this is on Windows.
Consider this sequence of events:
- client process constructs InteprocessConnection object
- server process constructs InteprocessConnection object
- server calls createPipe
- server’s connectionMade method is called (before createPipe returns success)
- client calls connectToPipe with a receive timeout of 1000 msec
- client’s connectionMade method is called (just before connectToPipe returns success)
- client’s read thread calls InterprocessConnection::readNextMessageInt which calls NamedPipe::read
- inside NamedPipe::read, ReadFile returns ERROR_IO_PENDING so the next call is to WaitForMultipleObjects which times out after 1000 msec.
- NamedPipe::read calls CancelIo and WaitForSingleObject. I don’t totally understand why this is necessary. Not sure it’s super relevant here, but CancelIo and WaitForSingleObject succeed.
- NamedPipe::read calls GetOverlappedResult which fails and returns ERROR_OPERATION_ABORTED, presumably due to the CancelIo call.
The key though is that bytesRead never changes after it’s initial assignment to -1. That indicates failure which I’m not sure is correct. Returning 0 seems more correct in this scenario.
InterprocessConnection::readNextMessageInt does interpret the -1 as failure, calls connectionLostInt and returns false, though as far as I can tell, the connection is fine. The master’s handle from CreateNamedPipe is still valid as is the slaves from CreateFile.
Subsequent attempts to write from the master side succeed, but because InterprocessConnection::readNextMessageInt returns false, the read thread exits, so the client doesn’t see them. What’s more strange is that if the slave sends one or more messages, they make it to the master even though the slave doesn’t think it’s connected.
If the slave does call connectToPipe again, I expected the previous file handle to get ditched and a new one created. So calling connectionLostInt makes sense in one way of looking at it. The actual behavior I see is different. When the slave calls connectToPipe, the previous file handle does get ditched, but the subsequent call to CreateFile fails with ERROR_PIPE_BUSY. I think I see a way to fix that that makes sense independent of the rest. Patch coming…
I guess this is all expected behavior given that someone passed a non-forever timeout to connectToPipe. It seems somehow non-ideal to me. For one, the two sides of the pipe have different ideas about whether there’s a connection (see http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=7579 for some thoughts there). But for another, does anyone really want to use an InterprocessConnection like this? Since the read is happening in a separate thread anyway, does it matter when it arrives? My reflex is to remove the pipeReceiveMessageTimeoutMs argument from connectToPipe…or maybe better, don’t call connectionLostInt if the timeout expires, but also don’t call disconnect at the beginning of connectToPipe so subsequent connections use the existing pipe (and get whatever data is inside it).
Alternately, to stick with the behavior of ditching the connection, let’s really ditch it so the other side knows about it as well and subsequent writes fail. I guess that means another call to disconnectPipe in NamedPipe::read…except at the moment the only place CloseHandle gets called on the client side of the pipe is in ~NamedPipeInternal…so there’s some more juggling to do.
Hope this isn’t too annoying. Thanks again for your help.
-DB