Performance problem (possibly related to SpinLock)


#1

Hi! I’ve encountered some strange behavior with recent git snapshots of Juce that I’m hoping someone can help me with.
The problem occurs only with Juce builds since the SpinLock addition on 30 Mar 2011.
What happens is that GUI interactions are extremely slow. Even moving my mouse pointer over an Introjucer window to click on menu options is very laggy, and when I try to type into text inputs there are delays and repeated characters. The same behavior occurs with another program I compiled that depends on Juce. It seems like there’s some issue with the main event loop possibly related to the new SpinLock, but I’m just guessing.
Earlier git snapshots and version 1.52 don’t have this problem. Unfortunately for me, the Juce-dependent program that I want to run compiles with the latest Juce from git, but not with the pre-SpinLock Juce.

Here’s some version info in case it’s useful:
$ uname -a
Linux pushpaw 2.6.31-11-rt #154-Ubuntu SMP PREEMPT RT Wed Jun 9 12:28:53 UTC 2010 i686 GNU/Linux

$ g++ --version
g++ (Debian 4.4.5-8) 4.4.5

Please let me know if there’s any other info I can provide to help you help me with this issue.
By the way, I thought it might be related to my RT kernel, but I got the same behavior with a non-RT kernel (2.6.32-5-686).

Thanks a lot!

  • Sam

#2

I can’t reproduce this at all; the introjucer works perfectly for me on my 2.6.35 Ubuntu system. And I can’t think of any reason why the spinlocks would have an effect like that, especially in the introjucer, which is pretty much single-threaded!

It seems more likely that something else I changed around the same time has caused the problem, but I can’t think of anything I’ve done that could be related… If you want to double-check that the SpinLocks are the culprit, an easy test would be to comment out the SpinLock class and replace it with a “typedef CriticalSection SpinLock;”, along the lines of the sketch below.
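Something like this, off the top of my head (assuming nothing in the codebase relies on SpinLock beyond the usual enter/exit/ScopedLockType interface that CriticalSection also provides):

    // wherever the SpinLock class is declared:
    /*
    class SpinLock
    {
        ...comment out the whole class...
    };
    */
    typedef CriticalSection SpinLock;

If the lag disappears with that in place, at least we’ll know the SpinLock itself is somehow involved.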


#3

Jules,
I tried your suggestion (and also had to comment out the two SpinLock methods at the bottom of juce_Thread.cpp to get the Introjucer to compile), and that “fixed” my problem.
No more laggy responsiveness in the Introjucer, and I was able to build and comfortably run the Juce-dependent program I’m interested in (Ctrlr).

I wonder what the underlying problem is, and if it might affect other Linux users. Let me know if there’s any other info I can provide that might help you pinpoint the problem.
In the meantime, I’m happy that I can run the app I wanted.

Thanks for your help,
Sam


#4

Well, that’s the most baffling thing I’ve heard in a very long time…!

The introjucer doesn’t use any threads, so there’s only one place in the whole codebase where a spinlock might actually be contended, and that’s in the Timer code. But SpinLocks are the same on all platforms, so if they didn’t work, the same problem should also happen on my version of Linux, OSX, Windows, etc. Not to mention the fact that spinlocks are incredibly simple, and really don’t have much scope for going wrong! The only conceivable problem I can think of is that if your kernel is some kind of special RT build, then maybe the Thread::yield() method behaves in a peculiar way… but even then it’s hard to imagine a code path that’d slow things down significantly.
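For reference, the pattern involved is nothing more exotic than this sort of thing (just a sketch of the general idea, using the existing Atomic and Thread classes, not the exact library code):

    struct MinimalSpinLock
    {
        void enter()
        {
            // atomically flip the flag from 0 to 1; if someone else already
            // holds it, give up the rest of our timeslice and try again
            while (! flag.compareAndSetBool (1, 0))
                Thread::yield();
        }

        void exit()
        {
            flag.set (0);   // release: the next compare-and-set can now succeed
        }

        Atomic<int> flag;
    };

So the only way I can see an RT scheduler making a mess of that loop is if yield() misbehaves while the lock is actually contended, which in the introjucer should hardly ever happen.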

Since I can’t reproduce it, would you be willing to do a few experiments for me, to try to pinpoint the cause?


#5

It might or might not be helpful, but with sched_yield() on a POSIX system there is no guarantee that the thread you’re waiting on will be the one to run next; yielding is a general hint to the scheduler, not something aimed at a particular thread (see the man page).

Exactly what happens depends on the scheduler selected in the kernel (it may or may not even trigger the kernel’s scheduling at all: http://kerneltrap.org/Linux/CFS_and_sched_yield ).
So, basically, if you need a spinlock in userspace, the best solution under a POSIX system is to use the pthread_spinlock_t primitive instead of your own Atomic variable. (But beware of priority inversion, which spinlocks give you no protection against: if thread 1, with a higher priority, spins waiting for a lock held by thread 2, with a lower priority, thread 1 will always be the one scheduled and will consume all the CPU, so thread 2 never gets to release the lock and the process effectively deadlocks.)
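For example (error handling omitted, the names are just for illustration, and PTHREAD_PROCESS_PRIVATE because the lock isn’t shared between processes):

    #include <pthread.h>

    static pthread_spinlock_t myLock;

    void initLock()      { pthread_spin_init (&myLock, PTHREAD_PROCESS_PRIVATE); }
    void destroyLock()   { pthread_spin_destroy (&myLock); }

    void doProtectedWork()
    {
        pthread_spin_lock (&myLock);
        // ...a few instructions of protected work...
        pthread_spin_unlock (&myLock);
    }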


#6

Oh, and by the way, pthread_mutexes are futexes in reality, so (with the adaptive mutex type) they already try spinning for a few loops on contention before sleeping in the kernel; there is no real gain in rolling your own spinlock on a POSIX system.
I wonder what the use of such a spinlock is at all, since under Windows a CRITICAL_SECTION can also perform a spinning busy-wait on its own before blocking… (I don’t know about the Mac, but it seems like an unfortunate optimisation.)
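On Windows, for instance, you can simply give a CRITICAL_SECTION a spin count, and it will busy-wait for that long on contention before falling back to sleeping in the kernel (illustration only, the names are made up):

    #include <windows.h>

    static CRITICAL_SECTION cs;

    void initLock()
    {
        // spin up to 4000 iterations on contention before blocking
        InitializeCriticalSectionAndSpinCount (&cs, 4000);
    }

    void doProtectedWork()
    {
        EnterCriticalSection (&cs);
        // ...short protected section...
        LeaveCriticalSection (&cs);
    }

    void destroyLock()   { DeleteCriticalSection (&cs); }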


#7

Thanks cyril! The main reason I added it is that there are a few places where a re-entrant or well-scheduled lock is just overkill, so a really simple lock that can be initialised without calling any OS functions, and which is only 4 bytes in size, seems like a more efficient solution.

I’ve actually only used them in a very few places, where the code only performs really tiny amounts of work in the locked sections; for most purposes I’ve stuck with normal CriticalSections. True, I could probably use a pthreads primitive to do the job on unix, but I’d be surprised if it does anything significantly different from simply doing a compare-and-set.


#8

The main difference from a home-made spinlock is that, on contention, instead of doing your own Thread::yield() and hoping that the next thread selected happens to be the contender, the pthread version sleeps in a futex_wait(), and the unlocking side makes a futex_wake() syscall that instructs the kernel to wake the contender directly (so it goes from statistically correct code to deterministically correct code). The pthread version also guarantees FIFO behaviour for the waiters, so if there are 1000 threads waiting, no thread is going to take more than 1/1000 of the cake.

That said, you could also implement your own: instead of Thread::yield(), use a single static WaitableEvent that is wait()'ed on under contention (you’d lose the thread-fairness of the futexes, but with at most two threads it’s probably enough). Something like the sketch below.
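Roughly like this, using Juce’s own Atomic and WaitableEvent (just a sketch: the short timeout on wait() is there to paper over the obvious lost-wakeup window, and the event is a member here rather than a static, but it’s the same idea — don’t treat it as production code):

    class EventBackedLock
    {
    public:
        void enter()
        {
            // if the flag is taken, sleep until exit() signals (or 1 ms passes)
            while (! flag.compareAndSetBool (1, 0))
                contended.wait (1);
        }

        void exit()
        {
            flag.set (0);
            contended.signal();   // wake a waiter, if there is one
        }

    private:
        Atomic<int> flag;
        WaitableEvent contended;
    };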


#9

Yeah, but the main point was really to keep the class itself minimal (i.e. just a single atomic integer), so that its initialisation, locking and unlocking operations would be pretty much optimised down to single inline op-codes. As I said, I wouldn’t intend it to be used in places where multiple threads might be queuing for attention, just in situations where there’s a very rare race condition that needs to be guarded against with as little overhead as possible.


#10

In that case, that can’t be the reason the OP sees so much of a difference. One way to figure out where the CPU time is leaking is to break into the software inside GDB and grab a few stack traces (poor man’s profiling), along these lines:
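Something like this while the GUI is being laggy (substitute the real process name; a couple of snapshots is usually enough to see where the time is going):

$ gdb -p $(pidof Introjucer)
(gdb) thread apply all bt
(gdb) continue
(press Ctrl-C after a second or two, then take another snapshot)
(gdb) thread apply all bt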


#11

Yes, that’s why I’m so puzzled by the OP’s problem. I can only find one spinlock that could possibly ever block in the introjucer, and that’s in the Timer code, where it only protects a few instructions. Very odd.


#12

Jules,
If you have any particular experiments in mind, let me know and I’ll be happy to do them.

Otherwise I’ll post back if I figure out anything useful. Also, if no one else has the problem, it may just be some weird quirk with my particular system. At least I have a workaround.

Thanks for the help, and for creating such a nice library/toolkit! Now that I’m aware of it, I see that there are quite a few cool apps out there that use it.


#13

Thanks! In the latest version, I’ve put a “typedef SpinLock LockType;” inside the class in juce_Timer.cpp; by changing that typedef to a CriticalSection, it’d be possible to see whether that’s the lock that’s causing the problem (I really can’t see any others that might be causing trouble). If it is that one, then I’ve still no idea at all why it’d go wrong, but at least it’d narrow things down a bit!
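i.e. in juce_Timer.cpp, just change:

    typedef SpinLock LockType;

to:

    typedef CriticalSection LockType;

and rebuild — nothing else should need changing, since both classes provide the same enter/exit/ScopedLockType interface.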