Deadlock due to ALSA thread priority


#1

Well,

I’m using an old version of the Juce lib, so this might have been fixed since, but can you confirm the following?
I have a thread running with an RR priority that is lower than the ALSA thread’s priority. In my thread I call “sourcePlayer.setSource(0);” to remove the audio source.
Occasionally I get a deadlock here: my lower-priority thread can’t take the lock because, as soon as the audio thread releases it, the audio thread is allowed to run again (higher priority), so my thread never gets a chance to acquire it.
The only entry point for the audio thread is getNextAudioBlock, but the lock is held while this method is called (obviously).

Do you have any idea how to solve this?


#2

Are you sure that’s what’s happening? I’ve never known a situation where a high-priority thread is gaining/releasing a lock so much that no other threads could ever get it… Perhaps this is because your device driver is using some kind of insanely high-priority realtime thread, but even so it would only be a problem if your audio processing is really straining the CPU to its limit.

If that’s the case, then a workaround might be to have a bool flag somewhere that will turn off your audio processing, so relieving the CPU load and letting your other thread get a chance?
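
A minimal sketch of that flag idea, assuming a plain C++11 `std::atomic<bool>`. `BypassableProcessor`, `process` and `heavyProcessing` are hypothetical names for illustration, not Juce classes; the real check would live wherever your getNextAudioBlock does its work:

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for whatever object owns the audio callback.
struct BypassableProcessor
{
    // Set to true from the non-audio thread before it tries to take the lock.
    std::atomic<bool> bypassed { false };

    // Called from the audio thread for each block.
    void process (float* buffer, int numSamples)
    {
        if (bypassed.load (std::memory_order_acquire))
        {
            // Cheap path: output silence and return straight away,
            // so the callback spends almost no CPU time.
            std::memset (buffer, 0, sizeof (float) * static_cast<std::size_t> (numSamples));
            return;
        }

        heavyProcessing (buffer, numSamples);
    }

    // Placeholder for the real (expensive) DSP.
    void heavyProcessing (float* buffer, int numSamples)
    {
        for (int i = 0; i < numSamples; ++i)
            buffer[i] *= 0.5f;
    }
};
```

The other thread would set `bypassed` to true, do its setSource call, then clear the flag again. This only reduces the load inside the callback; it doesn’t change how the lock itself is contended.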


#3

The audio driver is the ALSA one from Juce’s codebase. I think it’s running at time-critical priority (the highest possible on this system).
Yes, the application is using a lot of CPU (to be exact, 85% on the first core and 65% on the second core).
Anyway, I’m not sure I follow what you meant by “I’ve never known a situation where a high-priority thread is gaining/releasing a lock so much that no other threads could ever get it…”.
If a thread has a higher priority and isn’t waiting for anything else, no other thread with a lower priority can be scheduled (at least, I’m 100% sure that’s the case under Linux).
Simple example: start an RR thread and do this:
while(1);
and your system is dead.
[i]To be honest, Linux has a protection that prevents an RR thread from taking over the system, but you can disable it to get true POSIX behaviour:

cat /proc/sys/kernel/sched_rt_period_us > /proc/sys/kernel/sched_rt_runtime_us

[/i]
Or maybe you’re saying that the ALSA driver makes blocking system calls, so the other thread should be allowed to run?


#4

The actual thread isn’t created by my code - it comes from the ASIO audio driver, so its priority and behaviour may be different with different soundcards.

What I meant was that I’ve never known a driver to be stupid enough to just spin its audio thread without yielding - if the audio callback is eating 100% cpu then a real-time thread will bring the machine to a standstill, so any sensible ASIO driver would spot this and yield in between calls, which would almost certainly let your other thread get the lock.


#5

Doh… sorry, just realised we’re talking about entirely different things… ASIO/ALSA… easy to misread (!)

Ok, in ALSA it does look like it should be doing some kind of yield or sleep inside the thread loop to avoid hogging the CPU. I’ll take a look…


#6

Are you sure that the ALSA thread has real-time scheduling? I recall asking Jules to add SCHED_RR scheduling to the ALSA audio thread, but as far as I know it is not in the Juce git tree and never has been. I’m using SCHED_RR on the ALSA audio thread and the MIDI thread, though, and have never run into deadlocks, even when hammering the CPU with a 100% audio load.

I think the solution to your issue is to use the same SCHED_RR priority on your auxiliary thread and on the audio thread. That’s the only way to avoid CPU-starving one of them when the other claims all the CPU time, because that’s how round-robin scheduling works: the CPU is allocated first to the threads with the highest real-time priority, in a round-robin manner; then, if none of them wants more CPU, the threads with lower real-time priority are allowed to run, and so on down to the normal “non-realtime” threads.


#7

100% sure. However, the thread priority code was modified in my version and was included almost identically in the latest Juce git, because previously it didn’t set up SCHED_RR but only the default SCHED_OTHER (so priorities were ignored).
I can check this: in “top”, press “H”, and the audio thread shows “-99” in the priority column.

Yes, you’re right, this works perfectly.
However, this means that every thread that needs to communicate/interact with the audio thread must be RR with the same priority. That doesn’t sound clean to me.

Looking at the code, it does call “snd_pcm_wait (outputDevice->handle, 2000)” with a timeout, so I guess it’s sleeping during that time. That means there’s something wrong with my code; I’ll need to check.