InterProcessConnection Zombies?


#1

Wow. It seems I have managed to create IPC zombies, and now I wonder how that could have happened. The IPC connection claims to be still alive (connected) although the process at the other end no longer exists. The problem is that the remaining endpoint does not notice that the connection was lost and will not reconnect when the other end is restarted.

On Mac OS X, netstat is showing this:

    Proto  Recv-Q  Send-Q  Local Address    Foreign Address  (state)
    tcp4       35       0  127.0.0.1.8888   127.0.0.1.49160  ESTABLISHED
    tcp4        0       0  127.0.0.1.49160  127.0.0.1.8888   ESTABLISHED
    tcp4        0       0  *.8888           *.*              LISTEN

How can that be possible if the InterprocessConnectionServer process (LISTEN) is no longer running? Doesn’t the OS do some hard cleanup, even when a process has crashed? I am not even sure the process crashed at all. It also seems to happen after the InterprocessConnectionServer was shut down properly (although I am not 100% sure).

Would it be wise to implement a heartbeat ping and rely on that instead of the connection state?

Any hint is appreciated. This jams my setup on a regular basis :frowning:


#2

I have now implemented the heartbeat, which seems to work fine. BUT. What’s even stranger is that, after forcing a disconnect due to the loss of the heartbeat, an attempt to reconnect succeeds even though the server no longer exists (a background thread attempts a reconnect automatically when offline). Heck. This is beyond me.

Could it be that the client connects to itself through the loopback interface? I thought this was impossible (no listening socket), but I might be missing something.
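One way to test that suspicion would be to compare the local and remote endpoints of the connected socket: if both report the same address and port, the socket is talking to itself. The following is only a rough POSIX sketch under assumptions: `sameEndpoint` and `isSelfConnected` are made-up helper names, it handles IPv4 only, and it presumes you can get at the raw socket descriptor, which the JUCE classes may not expose directly.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical helper: true if both IPv4 endpoints have the same address and port.
inline bool sameEndpoint (const sockaddr_in& a, const sockaddr_in& b)
{
    return a.sin_addr.s_addr == b.sin_addr.s_addr && a.sin_port == b.sin_port;
}

// Hypothetical helper: true if the IPv4 TCP socket `fd` is connected to itself.
// Returns false if either endpoint cannot be queried (e.g. fd is not a connected socket).
inline bool isSelfConnected (int fd)
{
    sockaddr_in local {};
    sockaddr_in peer {};
    socklen_t localLen = sizeof (local);
    socklen_t peerLen = sizeof (peer);

    if (getsockname (fd, reinterpret_cast<sockaddr*> (&local), &localLen) != 0)
        return false;

    if (getpeername (fd, reinterpret_cast<sockaddr*> (&peer), &peerLen) != 0)
        return false;

    // This sketch only understands IPv4 addresses.
    if (local.sin_family != AF_INET || peer.sin_family != AF_INET)
        return false;

    return sameEndpoint (local, peer);
}
```

For the connection shown in the netstat output above, the two endpoints differ (port 49160 versus port 8888), so a check like this would report false there.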

Any idea?


#3

Is there really nobody else out there experiencing this loopback interface anomaly? :?:


#4

Update: I seem to have solved this. At least it has not occurred again since I fixed the shutdown procedure of the server.

My mistake was to close all connections first and then stop the InterprocessConnectionServer. Of course it must be the other way round, because (a) the connections are not part of the server anyway, and (b) if clients keep trying to reconnect during the shutdown, things can get messed up.

In other words: “First close the door, then throw everyone out the window, and then clean up your room” :wink:

Hopefully this can be helpful for others.


#5

Although the above helped to clean up the shutdown, I am still getting these zombies on a daily basis. :frowning:

It seems to me that TCP connections via 127.0.0.1 are very unstable. To deal with this, I implemented a heartbeat that the server sends to all connected clients at regular intervals. If a client receives no heartbeat for a certain period of time (three times the heartbeat interval), it assumes the connection is dead, disconnects and attempts a reconnect. This works fine when I deliberately halt the server (pause in debugger).

It does not work in a practical situation, however. Unbelievable, but it really happens: the client gets a successful new connection even when the server process is no longer running. This is BEYOND ME (sigh). BTW: Same on Mac and Windows.

Question: How reliable is the JUCE code with respect to noticing when a connection is dead? Is connectionLost() a safe indicator? Or can a connection die without JUCE noticing?
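The timeout rule described above can be sketched roughly as follows. This is a minimal illustration in plain C++, not JUCE code and not my actual implementation: the class `HeartbeatWatchdog` and its method names are hypothetical. It assumes the client calls `heartbeatReceived()` whenever a heartbeat message arrives and polls `isConnectionPresumedDead()` from a timer or background thread.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of JUCE): remembers when the last heartbeat arrived
// and reports the connection as dead after three missed heartbeat intervals.
class HeartbeatWatchdog
{
public:
    using Clock = std::chrono::steady_clock;

    explicit HeartbeatWatchdog (Clock::duration heartbeatInterval,
                                Clock::time_point now = Clock::now())
        : timeout (heartbeatInterval * 3), lastHeartbeat (now)
    {
        assert (heartbeatInterval > Clock::duration::zero());
    }

    // Call whenever a heartbeat message arrives from the server.
    void heartbeatReceived (Clock::time_point now = Clock::now())
    {
        lastHeartbeat = now;
    }

    // True once more than three heartbeat intervals have passed without a heartbeat.
    bool isConnectionPresumedDead (Clock::time_point now = Clock::now()) const
    {
        return now - lastHeartbeat > timeout;
    }

private:
    Clock::duration timeout;
    Clock::time_point lastHeartbeat;
};
```

The time points are passed in as parameters so the logic can be exercised without waiting for real time to pass. A monotonic clock is used so that adjustments to the system time cannot trigger a false timeout.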

Question: Are pipes any better?

TIA


#6

OK, to bring this to a conclusion, I should share my findings as to what seems to have fixed this at last. I was able to work around this nightmare on Mac OS X by making sure the client processes are not child processes of the server and are launched exclusively by launchd, that is, by using “open” from a shell. These sick artifacts then no longer appear. Why? It is still beyond me.

This is probably very important to everyone who’s using IPC on a Mac, so I hope it will be helpful!


#7

Wow! Nasty and obscure! Any unix gurus know why that might be the case?
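One possible explanation, offered purely as an assumption that nothing in this thread confirms: on Unix-like systems, a child process started with fork() and exec() inherits its parent's open file descriptors unless they are marked close-on-exec. If the clients were launched by the server, they may have inherited its listening socket. The kernel would then keep port 8888 in LISTEN and complete incoming TCP handshakes even after the server process has exited, which would fit both the netstat output and the "successful" reconnects. If that is what happened, marking the descriptor close-on-exec before launching children might avoid it. The sketch below is POSIX-only, `setCloseOnExec` is a made-up helper name, and it presumes access to the raw socket descriptor.

```cpp
#include <fcntl.h>

// Hypothetical helper: mark `fd` close-on-exec so that programs started via exec()
// do not inherit it. Returns true on success, false if fcntl() fails.
inline bool setCloseOnExec (int fd)
{
    const int flags = fcntl (fd, F_GETFD);

    if (flags == -1)
        return false;

    return fcntl (fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}
```

To check whether a client really holds the server's listening socket, running `lsof -i :8888` while the zombie state is present should list every process that still has that port open.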