InterprocessConnection::connectionLost and read timeout

I’ve found yet another thing I don’t understand about the concept of “connected” in an InterprocessConnection object. In this case I’m talking about the client/slave – the side of the pipe that calls connectToPipe. All of this is on Windows.

Consider this sequence of events:

  • client process constructs InterprocessConnection object
  • server process constructs InteprocessConnection object
  • server calls createPipe
  • server’s connectionMade method is called (before createPipe returns success)
  • client calls connectToPipe with a receive timeout of 1000 msec
  • client’s connectionMade method is called (just before connectToPipe returns success)
  • client’s read thread calls InterprocessConnection::readNextMessageInt which calls NamedPipe::read
  • inside NamedPipe::read, ReadFile returns ERROR_IO_PENDING so the next call is to WaitForMultipleObjects which times out after 1000 msec.
  • NamedPipe::read calls CancelIo and WaitForSingleObject. I don’t fully understand why this is necessary, and I’m not sure it’s relevant here, but both calls succeed.
  • NamedPipe::read calls GetOverlappedResult which fails and returns ERROR_OPERATION_ABORTED, presumably due to the CancelIo call.

The key point, though, is that bytesRead never changes after its initial assignment to -1. That indicates failure, which I’m not sure is correct; returning 0 seems more appropriate in this scenario.
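To make the proposal concrete, here’s a minimal, portable sketch of the idea. The enum and function names are hypothetical, not the actual JUCE code; it just maps the three possible outcomes of the overlapped read to return values, with a timeout reported as 0 rather than -1:

```cpp
// Hypothetical sketch (not the actual JUCE code): the possible outcomes of an
// overlapped read with a timeout, and the return value each should map to.
enum class ReadOutcome { completed, timedOut, pipeBroken };

// Proposed mapping: a timeout is "no data yet" (0), not a failure (-1).
// Only a genuinely broken pipe should report -1 and tear the connection down.
int readResult (ReadOutcome outcome, int bytesTransferred)
{
    switch (outcome)
    {
        case ReadOutcome::completed:  return bytesTransferred;
        case ReadOutcome::timedOut:   return 0;   // caller may simply retry later
        case ReadOutcome::pipeBroken: return -1;  // genuine disconnect
    }
    return -1;
}
```

With that convention, readNextMessageInt could treat 0 as "nothing arrived yet" instead of calling connectionLostInt.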

InterprocessConnection::readNextMessageInt does interpret the -1 as failure, calls connectionLostInt and returns false, though as far as I can tell, the connection is fine. The master’s handle from CreateNamedPipe is still valid, as is the slave’s from CreateFile.

Subsequent attempts to write from the master side succeed, but because InterprocessConnection::readNextMessageInt returns false, the read thread exits, so the client never sees them. Stranger still, if the slave sends one or more messages, they make it to the master even though the slave doesn’t think it’s connected.

If the slave does call connectToPipe again, I expected the previous file handle to get ditched and a new one created. So calling connectionLostInt makes sense in one way of looking at it. The actual behavior I see is different. When the slave calls connectToPipe, the previous file handle does get ditched, but the subsequent call to CreateFile fails with ERROR_PIPE_BUSY. I think I see a way to fix that that makes sense independent of the rest. Patch coming…

I guess this is all expected behavior given that someone passed a non-forever timeout to connectToPipe. It still seems non-ideal to me. For one thing, the two sides of the pipe end up with different ideas about whether there’s a connection (see http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=7579 for some thoughts on that). For another, does anyone really want to use an InterprocessConnection like this? Since the read is happening in a separate thread anyway, does it matter when the data arrives? My reflex is to remove the pipeReceiveMessageTimeoutMs argument from connectToPipe. Or, maybe better: don’t call connectionLostInt when the timeout expires, but also don’t call disconnect at the beginning of connectToPipe, so subsequent connections reuse the existing pipe (and get whatever data is inside it).
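Here’s a sketch of the read-thread policy that implies. The callback names (readWithTimeout, handleMessage, connectionLost) are placeholders, not JUCE APIs; the point is just that a timeout keeps the loop alive, and only a genuine error tears the connection down:

```cpp
#include <functional>

// Hypothetical read-thread loop: 0 means "timed out, wait again",
// -1 means "really disconnected, give up", >0 means a message arrived.
void runReadLoop (const std::function<int()>& readWithTimeout,
                  const std::function<void()>& handleMessage,
                  const std::function<void()>& connectionLost)
{
    for (;;)
    {
        const int n = readWithTimeout();

        if (n > 0)        handleMessage();   // got a message
        else if (n == 0)  continue;          // timeout: just wait again
        else              { connectionLost(); break; }  // real failure
    }
}
```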

Alternately, to stick with the behavior of ditching the connection, let’s really ditch it so the other side knows about it as well and subsequent writes fail. I guess that means another call to disconnectPipe in NamedPipe::read…except at the moment the only place CloseHandle gets called on the client side of the pipe is in ~NamedPipeInternal…so there’s some more juggling to do.

Hope this isn’t too annoying. Thanks again for your help.

-DB

So much for the patch. Try as I might, I can’t seem to get a slave to reconnect to the master’s pipe. The master calls DisconnectNamedPipe, but nothing I’ve come up with changes what happens on the slave: its call to CreateFile returns ERROR_PIPE_BUSY. Well, that’s not quite true. If the master calls CloseHandle, then the slave’s call to CreateFile returns ERROR_FILE_NOT_FOUND instead.

I saw the code related to WaitNamedPipe here (http://msdn.microsoft.com/en-us/library/aa365592(v=vs.85).aspx), but that doesn’t help; I still get ERROR_PIPE_BUSY even after waiting for over a minute. It still feels like code like that should live somewhere in NamedPipeInternal, though the consequences (having NamedPipe::openExisting and InterprocessConnection::connectToPipe block) are not ideal. Perhaps some kind of connectionTimeout param is the way to go.
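For reference, the retry pattern that MSDN page describes looks roughly like this. The Win32 calls (CreateFile, WaitNamedPipe) are injected as stubs here so the retry logic itself is portable and visible; none of these function names come from JUCE:

```cpp
#include <functional>

// Hypothetical sketch of the MSDN named-pipe client pattern.
// tryOpen stands in for CreateFile (true once a pipe instance is free);
// waitForInstance stands in for WaitNamedPipe (false means it timed out).
bool openPipeWithRetry (const std::function<bool()>& tryOpen,
                        const std::function<bool()>& waitForInstance,
                        int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        if (tryOpen())
            return true;            // CreateFile succeeded

        // CreateFile failed with ERROR_PIPE_BUSY: wait for the server to
        // call ConnectNamedPipe on a fresh instance, then retry.
        if (! waitForInstance())
            return false;           // WaitNamedPipe timed out
    }
    return false;
}
```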

I’d love to hear if the existing code works for someone, or if there’s some other patch to make it possible for connectToPipe to work after the read thread has gone away and the server has disconnected from its end.

-DB

It was staring me in the face all along. The master’s read thread was exiting with bytesRead == -1. All we need to do is set waitAgain to true after calling disconnectPipe in NamedPipe::read.

This diff is on top of the one here: http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=7564&p=42709#p42711

The test I wrote passed after adding just the first waitAgain = true. I think adding the second one is correct as well. Writing a test to prove that looks pretty tough though. I think a slave would have to disconnect in a very small window of time between when the master’s read thread returns from NamedPipe::connect and when it calls ReadFile…or something.

[code]diff --git a/src/native/windows/juce_win32_Files.cpp b/src/native/windows/juce_win32_Files.cpp
index e694e0d..e4a0c62 100644
--- a/src/native/windows/juce_win32_Files.cpp
+++ b/src/native/windows/juce_win32_Files.cpp
@@ -865,11 +865,13 @@ int NamedPipe::read (void* destBuffer, int maxBytesToRead, int timeOutMillisecon
             else if (GetLastError() == ERROR_BROKEN_PIPE && intern->isPipe)
             {
                 intern->disconnectPipe();
+                waitAgain = true;
             }
         }
         else if (((lastError == ERROR_BROKEN_PIPE) || (lastError == ERROR_PIPE_NOT_CONNECTED)) && intern->isPipe)
         {
             intern->disconnectPipe();
+            waitAgain = true;
         }
         else if (intern->isPipe)
         {
[/code]

Cool, thanks - I’ll have a look at that.

I found a scenario where the WaitNamedPipe stuff is important.

  • master calls createPipe
  • slave calls connectToPipe
  • slave calls connectToPipe again
  • connectToPipe calls disconnect which ends up calling NamedPipe::close which eventually calls the NamedPipeInternal destructor where the slave’s handle to the pipe is closed (with CloseHandle).

What I’ve seen mostly from here is that the master’s receive thread wakes up, notices that the pipe is gone, calls DisconnectNamedPipe and then ConnectNamedPipe and everything’s fine…

But if the master process doesn’t run and the slave process continues, the next step in connectToPipe is to construct a new NamedPipe and call NamedPipe::openInternal so we get to the NamedPipeInternal constructor before the master has changed anything on its side…and there the call to CreateFile fails with ERROR_PIPE_BUSY.

I suppose this could be OK if there’s an expectation for some higher level code to have retry logic, but it makes sense to me to handle it right there.

With the WaitNamedPipe stuff in the NamedPipeInternal constructor, there are enough differences between the master and slave sides (the big one being that the slave now needs a timeout) that it feels right to put the common stuff in a base class NamedPipeInternal and handle the differences in two child classes, NamedPipeMaster and NamedPipeSlave. I haven’t written this yet, but I suspect we’ll end up with parts of NamedPipe::read handled by methods specific to one child class or the other, and the error handling, etc. should get easier to understand and, with any luck, more reliable.
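Something like this skeleton is what I have in mind. It is entirely hypothetical (none of it exists in JUCE as-is), and the connect bodies are trivial stand-ins for the real Win32 calls:

```cpp
#include <string>

// Hypothetical refactor sketch: shared state in the base class, with the
// side-specific connection logic moved into a virtual.
class NamedPipeInternal
{
public:
    explicit NamedPipeInternal (const std::string& name) : pipeName (name) {}
    virtual ~NamedPipeInternal() = default;

    // Each side connects differently: the master waits in ConnectNamedPipe,
    // the slave loops on CreateFile / WaitNamedPipe against a timeout.
    virtual bool connect (int timeoutMs) = 0;

protected:
    std::string pipeName;
};

class NamedPipeMaster : public NamedPipeInternal
{
public:
    using NamedPipeInternal::NamedPipeInternal;
    bool connect (int /*timeoutMs*/) override { return true; }  // would call ConnectNamedPipe
};

class NamedPipeSlave : public NamedPipeInternal
{
public:
    using NamedPipeInternal::NamedPipeInternal;
    bool connect (int timeoutMs) override { return timeoutMs > 0; }  // would retry CreateFile until timeout
};
```

The error handling in NamedPipe::read could then dispatch through the virtual instead of branching on isPipe everywhere.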

The only API changes I see are:

  • NamedPipe::openExisting gets a timeout
  • InterprocessConnection::connectToPipe gets a connectionTimeout

Thoughts?