Linux: 100% CPU usage in messaging


#1

Hi!

When I start programs compiled with the latest git tip, after giving the GUI some input (clicking some buttons, etc.), the programs start using nearly 100% CPU time. This happens with my own programs, and also with the Juce Demo and The Jucer. I checked with the sysprof profiler: it looks like most of the time is spent in the message dispatching code, see attached files.

I checked the git logs and found something that looked suspicious to me: some file descriptors are set to non-blocking mode in juce_linux_Messaging.cpp, then juce_sleepUntilEvent() tries to wait on those fds with select(); but this select call will immediately return without waiting because they’re non-blocking.

I tried this simple change:

[code]diff --git a/src/native/linux/juce_linux_Messaging.cpp b/src/native/linux/juce_linux_Messaging.cpp
index 22beb09..fa45ef4 100644
--- a/src/native/linux/juce_linux_Messaging.cpp
+++ b/src/native/linux/juce_linux_Messaging.cpp
@@ -66,8 +66,8 @@ public:
         int ret = ::socketpair (AF_LOCAL, SOCK_STREAM, 0, fd);
         (void) ret; jassert (ret == 0);

-        setNonBlocking (fd[0]);
-        setNonBlocking (fd[1]);
+//        setNonBlocking (fd[0]);
+//        setNonBlocking (fd[1]);
     }

     ~InternalMessageQueue()[/code]

…and it fixes the problem for me. Everything runs perfectly fine when the setNonBlocking calls are commented out.


#2

Removing those lines will definitely break it - I didn’t put them in without a good reason… I’ll have to do some profiling and see if I can figure out at what point it goes wrong - sounds like it’s working ok until the queue gets too full or something.


#3

Well, if it’s breaking it, it’s not breaking it in an obvious way. So far, everything runs fine here with setNonBlocking removed. Adding some printfs in certain places also makes the CPU hogging “mysteriously” go away though. It could be a race condition or something like that. I’m testing on a dual-core CPU.


#4

Hmm… you know, I think you might be right. I’m a bit confused now about why I put it in there - I think it was because the queue was getting too full and the whole thing was blocking, but I also added a maximum queue size to take care of that, and the non-blocking flag doesn’t seem necessary any more. I’ll try taking it out, and let me know if anyone hits any mysterious lock-ups!
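
Roughly, the idea behind the maximum queue size is something like the sketch below. This is a simplified illustration, not the actual code in juce_linux_Messaging.cpp: the WakeupQueue class, the int messages, getBytesInSocket() and the limit of 128 are all placeholders I've made up for the example. The point is just that only a bounded number of wake-up bytes ever sit in the socket, so a blocking write should never stall however many messages are queued.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical, simplified message queue: messages live in a deque, and the
// socket only carries single "wake up" bytes for a thread waiting in select().
class WakeupQueue
{
public:
    WakeupQueue()
    {
        const int ret = ::socketpair (AF_LOCAL, SOCK_STREAM, 0, fd);
        (void) ret; assert (ret == 0);
    }

    ~WakeupQueue()
    {
        ::close (fd[0]);
        ::close (fd[1]);
    }

    void post (int message)
    {
        std::lock_guard<std::mutex> sl (lock);
        queue.push_back (message);

        // Only write a wake-up byte while under the limit (assumed value),
        // so the socket buffer can never fill up and block the writer.
        if (bytesInSocket < maxBytesInSocketQueue)
        {
            const unsigned char x = 0xff;
            if (::write (fd[0], &x, 1) == 1)
                ++bytesInSocket;
        }
    }

    bool pop (int& message)
    {
        std::lock_guard<std::mutex> sl (lock);

        if (queue.empty())
            return false;

        // Drain one wake-up byte per message for as long as any are pending.
        if (bytesInSocket > 0)
        {
            unsigned char x;
            if (::read (fd[1], &x, 1) == 1)
                --bytesInSocket;
        }

        message = queue.front();
        queue.pop_front();
        return true;
    }

    int getReadHandle() const   { return fd[1]; }   // the fd to pass to select()

    int getBytesInSocket()
    {
        std::lock_guard<std::mutex> sl (lock);
        return bytesInSocket;
    }

private:
    static constexpr int maxBytesInSocketQueue = 128;  // assumed limit
    std::mutex lock;
    std::deque<int> queue;
    int fd[2];
    int bytesInSocket = 0;
};
```

With something like this, posting far more messages than the limit still leaves at most maxBytesInSocketQueue bytes in the socket, and the reader gets back to an empty socket once it has popped everything.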


#5

To add to the confusion, I searched a bit and now I think I was wrong. :slight_smile: Apparently, select() should block if there’s nothing to read, even if the fd is non-blocking. See for example this: http://cboard.cprogramming.com/linux-programming/109694-select-o_nonblock.html or http://www.scottklement.com/rpg/socktut/nonblocking.html, which says…

So maybe something else is wrong.

Anyway. I’m not quite sure if I understand your messaging code completely, but from what I do understand I don’t see why the fds must be nonblocking. If you agree on that, and if removing the nonblocking mode fixes the problem for you too, then I guess it’s best to leave it out, for now…

When the problem occurs, select returns immediately (0 milliseconds time spent, measured with Time::getMillisecondCounter), with ret=1 and errno=0. So it appears that something is ready to be read, but it isn’t. Increasing maxBytesInSocketQueue to 10000 didn’t help.
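
To check the select() behaviour in isolation, a small test along these lines could be used. It's only a minimal sketch and not the JUCE code: makeNonBlocking, makeNonBlockingPair and waitForReadable are names I made up, and I'm assuming setNonBlocking just sets O_NONBLOCK with fcntl().

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical stand-in for setNonBlocking: sets O_NONBLOCK on the fd.
static bool makeNonBlocking (int fd)
{
    const int flags = fcntl (fd, F_GETFL, 0);
    return flags != -1 && fcntl (fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Creates a connected socket pair and puts both ends in non-blocking mode.
static void makeNonBlockingPair (int fd[2])
{
    const int ret = ::socketpair (AF_LOCAL, SOCK_STREAM, 0, fd);
    (void) ret; assert (ret == 0);

    const bool ok = makeNonBlocking (fd[0]) && makeNonBlocking (fd[1]);
    (void) ok; assert (ok);
}

// Waits up to timeoutMs for fd to become readable and returns select()'s
// result: 0 on timeout, 1 if fd is readable, -1 on error.
static int waitForReadable (int fd, int timeoutMs)
{
    fd_set readSet;
    FD_ZERO (&readSet);
    FD_SET (fd, &readSet);

    timeval tv;
    tv.tv_sec  = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    return ::select (fd + 1, &readSet, nullptr, nullptr, &tv);
}
```

On an empty pair, waitForReadable (fd[1], 50) should wait out the timeout and return 0 even though the fd is non-blocking. After writing one byte to fd[0], it should return 1 straight away, and keep doing so on every call until that byte has been read from fd[1].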


#6

oh no, now I’m even more confused! It’s a very confusing bit of code, that one!

I do agree that I can’t see why it’d need to be non-blocking, so will leave that out for now, and see how it goes…


#7

lol… okay :slight_smile:


#8

cool, you checked it in. btw, typo:
../../src/native/linux/juce_linux_Windowing.cpp:2315: error: ‘windowStyleFlags’ was not declared in this scope
should be styleFlags, i guess.


#9

drat! Thanks, will tidy that up…