InterprocessConnection::connectionLost and read timeout

I’ve found yet another thing I don’t understand about the concept of “connected” in an InterprocessConnection object. In this case I’m talking about the client/slave – the side of the pipe that calls connectToPipe. All of this is on Windows.

Consider this sequence of events:

  • client process constructs InterprocessConnection object
  • server process constructs InteprocessConnection object
  • server calls createPipe
  • server’s connectionMade method is called (before createPipe returns success)
  • client calls connectToPipe with a receive timeout of 1000 msec
  • client’s connectionMade method is called (just before connectToPipe returns success)
  • client’s read thread calls InterprocessConnection::readNextMessageInt which calls NamedPipe::read
  • inside NamedPipe::read, ReadFile returns ERROR_IO_PENDING so the next call is to WaitForMultipleObjects which times out after 1000 msec.
  • NamedPipe::read calls CancelIo and WaitForSingleObject. I don’t fully understand why this is necessary, and I’m not sure it’s relevant here, but both calls succeed.
  • NamedPipe::read calls GetOverlappedResult which fails and returns ERROR_OPERATION_ABORTED, presumably due to the CancelIo call.

The key point, though, is that bytesRead never changes after its initial assignment to -1. That indicates failure, which I’m not sure is correct; returning 0 seems more appropriate in this scenario.
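To make the proposal concrete, here’s a minimal, portable sketch of the idea. The enum and function names are hypothetical, not the actual JUCE code; it just maps the three possible outcomes of the overlapped read to return values, with a timeout reported as 0 rather than -1:

```cpp
// Hypothetical sketch (not the actual JUCE code): the possible outcomes of an
// overlapped read with a timeout, and the return value each should map to.
enum class ReadOutcome { completed, timedOut, pipeBroken };

// Proposed mapping: a timeout is "no data yet" (0), not a failure (-1).
// Only a genuinely broken pipe should report -1 and tear the connection down.
int readResult (ReadOutcome outcome, int bytesTransferred)
{
    switch (outcome)
    {
        case ReadOutcome::completed:  return bytesTransferred;
        case ReadOutcome::timedOut:   return 0;   // caller may simply retry later
        case ReadOutcome::pipeBroken: return -1;  // genuine disconnect
    }
    return -1;
}
```

With that convention, readNextMessageInt could treat 0 as "nothing arrived yet" instead of calling connectionLostInt.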

InterprocessConnection::readNextMessageInt does interpret the -1 as failure, calls connectionLostInt and returns false, though as far as I can tell, the connection is fine. The master’s handle from CreateNamedPipe is still valid, as is the slave’s from CreateFile.

Subsequent attempts to write from the master side succeed, but because InterprocessConnection::readNextMessageInt returns false, the read thread exits, so the client never sees them. Stranger still, if the slave sends one or more messages, they make it to the master even though the slave doesn’t think it’s connected.

If the slave does call connectToPipe again, I expected the previous file handle to get ditched and a new one created. So calling connectionLostInt makes sense in one way of looking at it. The actual behavior I see is different. When the slave calls connectToPipe, the previous file handle does get ditched, but the subsequent call to CreateFile fails with ERROR_PIPE_BUSY. I think I see a way to fix that that makes sense independent of the rest. Patch coming…

I guess this is all expected behavior given that someone passed a non-forever timeout to connectToPipe. It still seems non-ideal to me. For one thing, the two sides of the pipe end up with different ideas about whether there’s a connection (see http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=7579 for some thoughts on that). For another, does anyone really want to use an InterprocessConnection like this? Since the read is happening in a separate thread anyway, does it matter when the data arrives? My reflex is to remove the pipeReceiveMessageTimeoutMs argument from connectToPipe. Or, maybe better: don’t call connectionLostInt when the timeout expires, but also don’t call disconnect at the beginning of connectToPipe, so subsequent connections reuse the existing pipe (and get whatever data is inside it).
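Here’s a sketch of the read-thread policy that implies. The callback names (readWithTimeout, handleMessage, connectionLost) are placeholders, not JUCE APIs; the point is just that a timeout keeps the loop alive, and only a genuine error tears the connection down:

```cpp
#include <functional>

// Hypothetical read-thread loop: 0 means "timed out, wait again",
// -1 means "really disconnected, give up", >0 means a message arrived.
void runReadLoop (const std::function<int()>& readWithTimeout,
                  const std::function<void()>& handleMessage,
                  const std::function<void()>& connectionLost)
{
    for (;;)
    {
        const int n = readWithTimeout();

        if (n > 0)        handleMessage();   // got a message
        else if (n == 0)  continue;          // timeout: just wait again
        else              { connectionLost(); break; }  // real failure
    }
}
```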

Alternately, to stick with the behavior of ditching the connection, let’s really ditch it so the other side knows about it as well and subsequent writes fail. I guess that means another call to disconnectPipe in NamedPipe::read…except at the moment the only place CloseHandle gets called on the client side of the pipe is in ~NamedPipeInternal…so there’s some more juggling to do.

Hope this isn’t too annoying. Thanks again for your help.

-DB

So much for the patch. Try as I might, I can’t seem to get a slave to reconnect to the master’s pipe. The master calls DisconnectNamedPipe, but nothing I’ve come up with changes what happens on the slave: its call to CreateFile returns ERROR_PIPE_BUSY. Well, that’s not quite true. If the master calls CloseHandle, then the slave’s call to CreateFile returns ERROR_FILE_NOT_FOUND instead.

I saw the code related to WaitNamedPipe here (http://msdn.microsoft.com/en-us/library/aa365592(v=vs.85).aspx), but that doesn’t help; I still get ERROR_PIPE_BUSY even after waiting for over a minute. It still feels like code like that should live somewhere in NamedPipeInternal, though the consequences (having NamedPipe::openExisting and InterprocessConnection::connectToPipe block) are not ideal. Perhaps some kind of connectionTimeout param is the way to go.
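For reference, the retry pattern that MSDN page describes looks roughly like this. The Win32 calls (CreateFile, WaitNamedPipe) are injected as stubs here so the retry logic itself is portable and visible; none of these function names come from JUCE:

```cpp
#include <functional>

// Hypothetical sketch of the MSDN named-pipe client pattern.
// tryOpen stands in for CreateFile (true once a pipe instance is free);
// waitForInstance stands in for WaitNamedPipe (false means it timed out).
bool openPipeWithRetry (const std::function<bool()>& tryOpen,
                        const std::function<bool()>& waitForInstance,
                        int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        if (tryOpen())
            return true;            // CreateFile succeeded

        // CreateFile failed with ERROR_PIPE_BUSY: wait for the server to
        // call ConnectNamedPipe on a fresh instance, then retry.
        if (! waitForInstance())
            return false;           // WaitNamedPipe timed out
    }
    return false;
}
```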

I’d love to hear if the existing code works for someone, or if there’s some other patch to make it possible for connectToPipe to work after the read thread has gone away and the server has disconnected from its end.

-DB

It was staring me in the face all along. The master’s read thread was exiting with bytesRead == -1. All we need to do is set waitAgain to true after calling disconnectPipe in NamedPipe::read.

This diff is on top of the one here: http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=7564&p=42709#p42711

The test I wrote passed after adding just the first waitAgain = true. I think adding the second one is correct as well. Writing a test to prove that looks pretty tough though. I think a slave would have to disconnect in a very small window of time between when the master’s read thread returns from NamedPipe::connect and when it calls ReadFile…or something.

[code]diff --git a/src/native/windows/juce_win32_Files.cpp b/src/native/windows/juce_win32_Files.cpp
index e694e0d..e4a0c62 100644
--- a/src/native/windows/juce_win32_Files.cpp
+++ b/src/native/windows/juce_win32_Files.cpp
@@ -865,11 +865,13 @@ int NamedPipe::read (void* destBuffer, int maxBytesToRead, int timeOutMillisecon
             else if (GetLastError() == ERROR_BROKEN_PIPE && intern->isPipe)
             {
                 intern->disconnectPipe();
+                waitAgain = true;
             }
         }
         else if (((lastError == ERROR_BROKEN_PIPE) || (lastError == ERROR_PIPE_NOT_CONNECTED)) && intern->isPipe)
         {
             intern->disconnectPipe();
+            waitAgain = true;
         }
         else if (intern->isPipe)
         {
[/code]

Cool, thanks - I’ll have a look at that.

I found a scenario where the WaitNamedPipe stuff is important.

  • master calls createPipe
  • slave calls connectToPipe
  • slave calls connectToPipe again
  • connectToPipe calls disconnect which ends up calling NamedPipe::close which eventually calls the NamedPipeInternal destructor where the slave’s handle to the pipe is closed (with CloseHandle).

What I’ve seen mostly from here is that the master’s receive thread wakes up, notices that the pipe is gone, calls DisconnectNamedPipe and then ConnectNamedPipe and everything’s fine…

But if the master process doesn’t run and the slave process continues, the next step in connectToPipe is to construct a new NamedPipe and call NamedPipe::openInternal so we get to the NamedPipeInternal constructor before the master has changed anything on its side…and there the call to CreateFile fails with ERROR_PIPE_BUSY.

I suppose this could be OK if there’s an expectation for some higher level code to have retry logic, but it makes sense to me to handle it right there.

With the WaitNamedPipe stuff in the NamedPipeInternal constructor, there are enough differences between the master and slave sides (the big one being that the slave now needs a timeout) that it feels right to put the common stuff in a base class NamedPipeInternal and handle the differences in two child classes, NamedPipeMaster and NamedPipeSlave. I haven’t written this yet, but I suspect we’ll end up with parts of NamedPipe::read handled by methods specific to one child class or the other, and the error handling, etc. should get easier to understand and, with any luck, more reliable.
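Something like this skeleton is what I have in mind. It is entirely hypothetical (none of it exists in JUCE as-is), and the connect bodies are trivial stand-ins for the real Win32 calls:

```cpp
#include <string>

// Hypothetical refactor sketch: shared state in the base class, with the
// side-specific connection logic moved into a virtual.
class NamedPipeInternal
{
public:
    explicit NamedPipeInternal (const std::string& name) : pipeName (name) {}
    virtual ~NamedPipeInternal() = default;

    // Each side connects differently: the master waits in ConnectNamedPipe,
    // the slave loops on CreateFile / WaitNamedPipe against a timeout.
    virtual bool connect (int timeoutMs) = 0;

protected:
    std::string pipeName;
};

class NamedPipeMaster : public NamedPipeInternal
{
public:
    using NamedPipeInternal::NamedPipeInternal;
    bool connect (int /*timeoutMs*/) override { return true; }  // would call ConnectNamedPipe
};

class NamedPipeSlave : public NamedPipeInternal
{
public:
    using NamedPipeInternal::NamedPipeInternal;
    bool connect (int timeoutMs) override { return timeoutMs > 0; }  // would retry CreateFile until timeout
};
```

The error handling in NamedPipe::read could then dispatch through the virtual instead of branching on isPipe everywhere.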

The only API changes I see are:

  • NamedPipe::openExisting gets a timeout
  • InterprocessConnection::connectToPipe gets a connectionTimeout

Thoughts?