Juce plugins cause realtime kernel lockup


#1

Hey Jules and all Juce friends,

I noticed that when using Juce plugins on a realtime kernel, the system will eventually lock up (after around 5 seconds of usage).
I experienced this a long time ago, on a 2.6.33-rt kernel, but since I haven’t run a realtime kernel for some time, I was unaware whether this issue had been fixed or not.

Some days ago, 2 IRC users reported to me that they had experienced lockups right after opening a juce based plugin (happened during testing of my juce-lv2 wrapper code).
Recently a friend of mine also experienced this, and the lockup -> hard-reset was very problematic for him (it happened with VST versions this time).

I’m not sure what’s causing this, but I can already guess it will be very problematic to debug (and to test too, since it locks the system, so we need a hard reboot to re-test).
If anyone has some idea of what is the cause of this, please say so. I’m afraid I can’t publish any juce-lv2 plugins if this issue remains.


#2

Just a wild guess, could it be a signaling floating-point exception?


#3

There’s one thing I know for sure - it happens even outside the audio thread.
When we start generating lv2 *.ttl files, the generator needs to initialize the plugin to get its info. The kernel will hang/lock when changing programs.
(as reported to me by a user, complaining that it was not possible to compile/generate the lv2 plugins on an rt kernel)


#4

You could try checking all calls to pthread_setschedparam. If it is not a kernel bug, then maybe it could be a realtime-priority thread that spins forever?


#5

+1, it definitely looks like one of your threads is using a realtime priority and never yielding to or waiting on any other.
Easy to test though: just run your software without root privileges; you shouldn’t be able to use the RT or FIFO scheduler in that case.
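
One way to check this before the freeze hits is to list which threads of the process actually run under a realtime policy. This is only a rough sketch, assuming a Linux /proc filesystem; the function name realtime_threads is made up for illustration and is not part of Juce or any library.

```python
import os

# The two POSIX realtime scheduling classes on Linux.
REALTIME_POLICIES = {os.SCHED_FIFO: "SCHED_FIFO", os.SCHED_RR: "SCHED_RR"}

# The kernel may OR this flag into the reported policy; strip it if present.
RESET_ON_FORK = getattr(os, "SCHED_RESET_ON_FORK", 0)


def realtime_threads(pid):
    """Return [(tid, policy_name, priority)] for realtime threads of pid.

    Hypothetical helper: walks /proc/<pid>/task and queries each thread id.
    """
    found = []
    for entry in os.listdir(f"/proc/{pid}/task"):
        tid = int(entry)
        try:
            policy = os.sched_getscheduler(tid) & ~RESET_ON_FORK
            priority = os.sched_getparam(tid).sched_priority
        except OSError:
            # The thread exited in the meantime, or we may not query it.
            continue
        if policy in REALTIME_POLICIES:
            found.append((tid, REALTIME_POLICIES[policy], priority))
    return found
```

Calling realtime_threads(pid) with the pid of the plugin host should show whether any thread other than the audio thread ended up with SCHED_FIFO or SCHED_RR.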


#6

The problem is that a Linux realtime kernel is a required setup for some users. The realtime kernel has been around for a long time, and this sort of issue is not very common (i.e., a system lockup just from initializing the plugin and changing presets).
Normal users do have access to RT and FIFO, and that’s intentional. We want to be able to fully use our system resources, so I need to assume that plugins are safe to use.
In this case they are not, so currently Juce plugins are just unusable on some Linux systems.

I’m still not sure what causes this (I have an idea, but I want to make sure it’s correct before telling about it).
I’ll do more testing and report back once I have a definitive answer.


#7

Ok, another test then, try on a linux box with no RT kernel, change:
/proc/sys/kernel/sched_rt_runtime_us
to be the same as
/proc/sys/kernel/sched_rt_period_us
and run the software as root.

If it freezes, it’s clearly an RT thread that’s not giving up its timeslice.
You can figure out which one by running a shell with the highest priority and sending a SIGSEGV signal to your frozen process, in order to get a core file.
Start gdb with that core file and get the backtrace.
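
To see what the two files currently allow before changing anything, something like the following could be used. It is a minimal sketch: the helper names rt_share and read_rt_share are invented here, and it only reads the values (writing them needs root). It relies on the kernel's documented behaviour that a runtime of -1 disables throttling.

```python
from pathlib import Path

RUNTIME = Path("/proc/sys/kernel/sched_rt_runtime_us")
PERIOD = Path("/proc/sys/kernel/sched_rt_period_us")


def rt_share(runtime_us, period_us):
    """Fraction of each period that realtime tasks may consume.

    A runtime of -1 (or one equal to the period) means realtime tasks
    are not throttled at all, so a spinning RT thread can starve the box.
    """
    if runtime_us < 0 or runtime_us >= period_us:
        return 1.0
    return runtime_us / period_us


def read_rt_share():
    """Read the current settings from /proc and return the RT share."""
    return rt_share(int(RUNTIME.read_text()), int(PERIOD.read_text()))
```

If read_rt_share() returns less than 1.0, the kernel reserves some time in each period for non-realtime tasks; making the two files equal removes that safety margin, which is what the test above relies on.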


#8

I did some more testing, and it seems to me that what is locking the system is not the audio stuff, but the GUI side of things.
The plugin runs nicely, but when I open the UI things start to change. Closing the UI and re-opening it is usually enough to cause the lockup.
When I run the standalone version, the application locks after showing and closing the device settings dialog.

I made a simple screencast/video running a clean VM (ArchLinux 32bit), with realtime on (linux-rt kernel and JACK in realtime mode).
http://kxstudio.sourceforge.net/tmp/juce-lockup.mkv

EDIT: The source code for this specific plugin is here:


#9

Instead of (try to) killing the app (and freeze the kernel), trigger and attach a debugger to figure out where it is.
It’s hard to guess what you’re doing, and I don’t want to study all your code.


#10

The kill is just a way to trigger the lockup faster. If we wait long enough (and click some buttons while at it), it will freeze the kernel.

You missed the point here, this is not about my plugin, it’s about juce. The test plugin I put there was the simplest one (6 parameters, no presets, no special stuff there).

The lockup will happen with the official JuceDemo application (using untouched juce-1.53.zip code, just adjusted the makefile to add needed links ‘-ldl -lXext’).
I even tried the loomer plugins (http://www.loomer.co.uk/), and they also lock the system eventually.
Renoise is the exception here (lots of UI updates and no kernel lockup). I don’t know how they do it…
EDIT: Pianoteq also doesn’t lock the kernel.


#11

There is no point in convincing me Juce is wrong, I already trust you.
Please do what I ask, that is, once the application seems frozen, and you still have a hand on the terminal, fire up gdb, attach to the application, break it, and post a backtrace for all threads.
That’s the only way we could figure out the issue in Juce, by understanding what thread does what (and what could be wrong).


#12

Unless you register a signal handler, almost all unhandled signals default to core dumping, so it would not crash the kernel.


#13

[quote=“X-Ryl669”]Please do what I ask, that is, once the application seems frozen, and you still have a hand on the terminal, fire up gdb, attach to the application, break it, and post a backtrace for all threads.
That’s the only way we could figure out the issue in Juce, by understanding what thread does what (and what could be wrong).[/quote]

I did that, or at least tried.

  • The JuceDemo app gets frozen very quickly (the first page is the rendering demo)
  • using ‘ps -e’ to get all process pids does not work when JuceDemo is frozen (locks the system half way through the list, probably just before the Juce thread info)
  • gdb gets stuck at ‘attaching to application xyz…’ (same thing when trying to attach to process after gdb started)

#14

You should run gdb at the max RT priority. Otherwise, it can’t ptrace the software.
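
As a rough sketch of what that could look like, the snippet below builds a command that starts gdb under SCHED_FIFO via chrt, attaches to the process, prints a backtrace for all threads and exits. The helper names are made up for illustration, priority 99 is assumed to be the maximum (the usual Linux limit), and the user needs permission to request realtime priorities.

```python
import subprocess


def backtrace_command(pid, rt_priority=99):
    """Build the argv for a one-shot, all-threads backtrace of pid.

    chrt -f runs the following command under SCHED_FIFO at rt_priority;
    gdb -batch executes the -ex command and then quits.
    """
    return [
        "chrt", "-f", str(rt_priority),
        "gdb", "-p", str(pid), "-batch",
        "-ex", "thread apply all bt",
    ]


def dump_backtraces(pid, timeout_s=30):
    """Run the command and return gdb's output as text.

    The timeout is there because gdb itself may hang while attaching.
    """
    done = subprocess.run(
        backtrace_command(pid),
        capture_output=True, text=True, timeout=timeout_s,
    )
    return done.stdout
```

Saving the result of dump_backtraces(pid) to a file right away is probably wise, since the terminal may not survive for long.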


#15

And so I did, and got it to lock while still on gdb:

After that point the system would still react to the mouse and keyboard, but I couldn’t do anything in gdb. A Ctrl+C (trying to close gdb) caused the final lockup.


#16

Ok, it seems it’s a bug in your kernel (hence the kernel oops).
You’ll probably find a complete stack trace in your kernel log.
Basically Juce is doing weird things that the kernel doesn’t like, and it oopses on them.
The only information of any interest is the “task_blocks_on_rt_mutex” and “rt_spin_lock_slowlock” happening almost at the same time, which shows a deadlock in the kernel between the two tasks.
Since you can’t debug the kernel state with gdb, I don’t see how you could solve this (unless you decide to run your kernel with kgdb, but that’s crazy), except maybe by posting the stack you’ll find in your kernel log to the RT patch maintainers’ list.
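
If the log survives the reboot, the relevant lines can be pulled out with a small filter like this one. It is only a sketch: lines_mentioning is a hypothetical helper, and where the kernel log lives (a file or the output of dmesg) depends on the distribution, so the text is passed in by the caller.

```python
# The two symbols mentioned above; both come from the gdb session, not from me.
SYMBOLS = ("task_blocks_on_rt_mutex", "rt_spin_lock_slowlock")


def lines_mentioning(log_text, symbols=SYMBOLS):
    """Return the log lines that contain any of the given symbol names."""
    return [
        line
        for line in log_text.splitlines()
        if any(symbol in line for symbol in symbols)
    ]
```

The matching lines, together with the rest of the stack around them, are what would be worth sending to the list.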


#17

Hi,

I’ve been experiencing this problem recently on my RT kernel too. Mostly with my own compiled Juce apps, so I assumed I maybe had a bad library somewhere. I have also been using Pianoteq 3 a lot without any problems, which further convinced me it was my gcc environment. However I have just recently downloaded Pianoteq 4 and bang - kernel crashes!

This is really a major issue - absolutely nothing else on my system ever causes a kernel crash, but now all the modern Juce apps that I have do. I expect this will escalate dramatically as Pianoteq 4 rolls out - I see people already posting about having problems with it.

Is there any progress on this?

Jules ?

Running Gentoo 3.0.9-rt25 (x86_64 AMD Phenom™ II X6 1055T Processor AuthenticAMD GNU/Linux)


#18

I’ve absolutely no idea what you’d have to do to crash the kernel, even if you were trying to!


#19

To be frank - neither do I.

The kernel is pretty good at stopping you from doing anything that bad.

Is there anything I can do to help with resolving this - other than a photo of my screen I can’t send much debug info from Pianoteq - it’s a true black screen of death with a few register variables dumped - haven’t seen this sort of thing since the 90’s.

It doesn’t seem to occur at any particular point in time either - the app starts, I can configure it, play a few keys and it all sounds good (I use Jack for the sound), then at a random point - bang, you’re dead!


#20

I really don’t know what to suggest… There are probably experienced kernel people who could at least give some ideas about what the app may be doing to cause that to happen, but I’ve no idea how you’d debug something like that!