How to detect stuttering?


#1

Hi Jules. Is there any way to detect discontinuities (such as stuttering) in AudioFilterStreamingDeviceManager, for example when there is a CPU overload or something similar?


#2

No, stuttering isn’t something you can detect. All you can do is keep an eye on the AudioDeviceManager’s CPU usage and make sure it doesn’t get too high.


#3

Too bad… :frowning: One would think that this could easily be built into Core Audio or ASIO.

Is there anyone out there who uses some clever technique to guess whether the audio is stuttering?


#4

Wait, how come you can’t detect stuttering?!

It seems to me that you’re going to start “stuttering” significantly before you get into pathological states where you’re completely out of CPU, and if you’re “stuttering” then at some point you’re trying to take more samples out of a buffer than you put into it, and you can detect this.

Let me be more specific here! In my code, at least, close to the last thing in a signal path is a generic buffered audio source(*) which is filled by some worker thread. If the worker thread is keeping up with real time, there won’t be any stuttering; and if it isn’t keeping up with real time, then there will be a point when you try to take more samples out of the buffer than you put into it.
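A minimal C++ sketch of the kind of buffer described here, assuming exactly one worker thread writes and the audio callback reads. `UnderrunDetectingFifo` and its methods are hypothetical names, not part of JUCE. An underrun is exactly the moment `read()` is asked for more samples than have been written, so detecting it costs one comparison.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-producer/single-consumer FIFO (not a JUCE class).
// The worker thread calls write(); the audio callback calls read().
// If read() is asked for more samples than have been written, the
// missing samples are replaced with silence and the shortfall is counted.
class UnderrunDetectingFifo {
public:
    explicit UnderrunDetectingFifo(std::size_t capacity)
        : buffer_(capacity + 1) {  // one slot stays empty to tell "full" from "empty"
        assert(capacity > 0);
    }

    // Producer side: copies up to n samples and returns how many fit.
    std::size_t write(const float* src, std::size_t n) {
        const std::size_t size = buffer_.size();
        const std::size_t w = writePos_.load(std::memory_order_relaxed);
        const std::size_t r = readPos_.load(std::memory_order_acquire);
        const std::size_t freeSlots = (r + size - w - 1) % size;
        const std::size_t count = n < freeSlots ? n : freeSlots;
        for (std::size_t i = 0; i < count; ++i)
            buffer_[(w + i) % size] = src[i];
        writePos_.store((w + count) % size, std::memory_order_release);
        return count;
    }

    // Consumer side (audio callback): always fills exactly n samples,
    // never blocks and never takes a lock.
    void read(float* dst, std::size_t n) {
        const std::size_t size = buffer_.size();
        const std::size_t r = readPos_.load(std::memory_order_relaxed);
        const std::size_t w = writePos_.load(std::memory_order_acquire);
        const std::size_t available = (w + size - r) % size;
        const std::size_t count = n < available ? n : available;
        for (std::size_t i = 0; i < count; ++i)
            dst[i] = buffer_[(r + i) % size];
        for (std::size_t i = count; i < n; ++i)
            dst[i] = 0.0f;  // output silence for the part that was not ready
        readPos_.store((r + count) % size, std::memory_order_release);
        if (count < n) {  // took more samples out than were put in: an underrun
            underruns_.fetch_add(1, std::memory_order_relaxed);
            missingSamples_.fetch_add(n - count, std::memory_order_relaxed);
        }
    }

    std::size_t underrunCount() const { return underruns_.load(std::memory_order_relaxed); }
    std::size_t missingSampleCount() const { return missingSamples_.load(std::memory_order_relaxed); }

private:
    std::vector<float> buffer_;
    std::atomic<std::size_t> readPos_{0};
    std::atomic<std::size_t> writePos_{0};
    std::atomic<std::size_t> underruns_{0};
    std::atomic<std::size_t> missingSamples_{0};
};
```

A GUI timer could poll `underrunCount()` and light a red light whenever it changes. This catches the worker thread falling behind; it cannot see a callback that the driver itself delivered late.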

The only other failure case I can see is that the copy from the buffer to the output is CPU-bound, but that copy is (a) tiny and (b) runs at a high thread priority, so it should never be your bottleneck.

(* - I don’t use Jules’ because it has waaay too much “hair” - for example, it contains its own thread! I have a much simpler one that’s updated by my own thread.)


#5

Oh sure, you can detect buffer-underruns like that… I kind of assumed that the OP was asking about stuttering that happens when the audio process callback takes too long to complete. And for that, there’s not really any way to know whether the hardware has really stuttered or not.


#6

Sorry to keep grilling you on this, but these are deterministic processes and, unless there’s some magic hidden inside the system code that we can’t get around, we should in principle be able to detect any possible failure.

[quote]stuttering that happens when the audio process callback takes too long to complete.[/quote]

What does this specifically mean? We call into the system, and the result needs to take x ms, but it takes x + delta ms to complete…?

It would be too expensive to read the system clock every time, but we could keep track of the cumulative number of samples processed by the audio callback and learn pretty fast whether we were falling behind real time, yes? In fact, we could make that test in a completely separate thread…
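A sketch of that sample-counting idea, assuming a steady clock is cheap enough to read from a non-real-time monitoring thread (the audio callback itself never reads it). `RealTimeLagMonitor` is a hypothetical name, not a JUCE class.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical monitor: the audio callback only adds its block size to an
// atomic counter; a separate, non-real-time thread compares that count with
// how many samples should have been processed since streaming started.
class RealTimeLagMonitor {
public:
    using Clock = std::chrono::steady_clock;

    RealTimeLagMonitor(double sampleRate, Clock::time_point streamStart)
        : sampleRate_(sampleRate), start_(streamStart) {
        assert(sampleRate > 0.0);
    }

    // Audio callback side: just arithmetic, no clock read, no lock.
    void addProcessedSamples(std::int64_t n) {
        processed_.fetch_add(n, std::memory_order_relaxed);
    }

    // Monitoring-thread side: expected minus processed samples.
    // Positive means the callback is behind wall-clock time.
    std::int64_t samplesBehind(Clock::time_point now) const {
        const double elapsedSec = std::chrono::duration<double>(now - start_).count();
        const auto expected = static_cast<std::int64_t>(elapsedSec * sampleRate_);
        return expected - processed_.load(std::memory_order_relaxed);
    }

private:
    const double sampleRate_;
    const Clock::time_point start_;
    std::atomic<std::int64_t> processed_{0};
};
```

The returned value jitters by up to one block, includes whatever buffering the device adds, and drifts slowly because the audio hardware's sample clock is not locked to the system clock. So the useful signal is a value that keeps growing, not any particular number.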

It seems to me that no matter where the issue is, some variant of “count the samples passing through here”, followed by taking differences between the counts at various points in the signal chain, would let you detect stuttering. The beauty of it is that you don’t have to know what the latency or difference is supposed to be between any two points you’re measuring; you simply need to see that the difference is systematically increasing to detect an issue.

(Note that in any buffered system, there will be times when the difference does increase without anything being wrong, because you’re reading down a full buffer and haven’t started to refill it yet. So you need a little code to detect systematic increases, which will depend somewhat on your system. But if you simply want to light a red light on your console, you could just use the rule “two cycles in a row where the difference increases” and be right nearly every time…)
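The “two cycles in a row” rule can be sketched like this. `SystematicIncreaseDetector` is a hypothetical helper, and the threshold of two consecutive increases is only the rule of thumb from this post; it would need tuning for a given buffer size and chunk pattern.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: watches the difference between two cumulative sample
// counters taken at two points in a signal chain (for example "samples the
// output has asked for" and "samples the worker has produced"). The absolute
// difference is irrelevant; only a systematic increase is reported.
class SystematicIncreaseDetector {
public:
    explicit SystematicIncreaseDetector(int requiredIncreases = 2)  // "two cycles in a row"
        : requiredIncreases_(requiredIncreases) {
        assert(requiredIncreases > 0);
    }

    // Feed one observation per cycle. Returns true once the difference has
    // grown on requiredIncreases consecutive cycles.
    bool update(std::int64_t aheadCount, std::int64_t behindCount) {
        const std::int64_t diff = aheadCount - behindCount;
        if (hasPrevious_ && diff > previousDiff_)
            ++consecutiveIncreases_;
        else
            consecutiveIncreases_ = 0;  // any non-increase resets the streak
        previousDiff_ = diff;
        hasPrevious_ = true;
        return consecutiveIncreases_ >= requiredIncreases_;
    }

private:
    int requiredIncreases_;
    int consecutiveIncreases_ = 0;
    std::int64_t previousDiff_ = 0;
    bool hasPrevious_ = false;
};
```

Both counters only ever increase, so each measuring point needs nothing more than a running total; the code never has to know what the “correct” latency between the two points is.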


#7

[quote=“TomSwirly”]> stuttering that happens when the audio process callback takes too long to complete.

What does this specifically mean? We call into the system, and the result needs to take x ms, but it takes x + delta ms to complete…?[/quote]

Except that’s not how the audio callback works. Instead, a hardware interrupt calls YOU (the other way around). If you take too long, you get a dropout. The audio device is like a bus. If you aren’t at the bus stop (i.e. have filled the buffer in time) then it leaves without you (audio drop out).

It’s not enough to detect stuttering after the fact. A user needs to know when they are even close to stuttering, so they can take appropriate action to reduce the CPU load, for example by shutting down some effects or reducing the number of active channels.

The CPU meter in the AudioIODevice API is perfectly suited for this type of feedback.
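For reference, a callback-load figure like the one such a meter shows can be computed as the time spent in the callback divided by the duration of the audio it produced. JUCE's `AudioDeviceManager::getCpuUsage()` already reports a figure along these lines, so in a JUCE app you would normally just display that; the helper names below are hypothetical and only illustrate the arithmetic.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Fraction of the available real time that one callback used:
// 1.0 means the callback took as long as the audio it produced lasts.
double callbackLoad(std::chrono::steady_clock::duration callbackTime,
                    int blockSize, double sampleRate) {
    assert(blockSize > 0 && sampleRate > 0.0);
    const double usedSec = std::chrono::duration<double>(callbackTime).count();
    const double availableSec = blockSize / sampleRate;
    return usedSec / availableSec;
}

// One-pole smoothing so the displayed value does not flicker. push() is
// called from the audio thread only; value() may be read from the GUI thread.
class LoadMeter {
public:
    explicit LoadMeter(double smoothing = 0.9) : smoothing_(smoothing) {
        assert(smoothing >= 0.0 && smoothing < 1.0);
    }
    void push(double load) {
        const double old = value_.load(std::memory_order_relaxed);
        value_.store(smoothing_ * old + (1.0 - smoothing_) * load,
                     std::memory_order_relaxed);
    }
    double value() const { return value_.load(std::memory_order_relaxed); }

private:
    const double smoothing_;
    std::atomic<double> value_{0.0};
};
```

In a callback you might take `auto t0 = std::chrono::steady_clock::now();` on entry and call `meter.push(callbackLoad(std::chrono::steady_clock::now() - t0, numSamples, sampleRate));` on exit. As the next reply points out, the load at which a given device starts to drop out varies, so the number is a guide rather than a dropout detector.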


#8

Unfortunately, even measuring the CPU usage doesn’t really tell you much, because you have no idea how much time you can spend before it stutters. With some audio devices, it might stutter if you spend more than 50% of the CPU time in the callback; on others you might be fine with over 90%. Some devices might even be able to cope with the occasional slow callback without any problem. Others won’t. There’s no magic answer that will work in all cases.


#9

[quote]Except that’s not how the audio callback works. Instead, a hardware interrupt calls YOU (the other way around). If you take too long, you get a dropout.[/quote]

I think I touched on this case earlier on.

Your audio callback should never, ever do anything other than copy from a fixed buffer, the end. You should never perform any computation at all during that time and, if at all possible, never take any locks.

By making sure that your worker threads are running at a lower priority than the audio callback, you should always be able to perform that simple copy unless your CPU is so bound that you would already have detected gross issues like “your mouse not responding”.


#10

Well, the user will quickly learn not to go over 50% on the meter… I hope!


#11

Whoa…that makes no sense. I argue that performing the calculation for the fixed buffer either in a separate thread, or in the audio i/o callback itself, are computationally equivalent. In other words, whether you compute the buffer in another thread, or do it in the callback, the results are the same.

Let’s analyze the two cases:

  1. There are sufficient CPU resources, and the other thread has made the fixed buffer available in time

  2. There are insufficient CPU resources, and the fixed buffer is not ready yet (audio callback has to output silence instead, since there’s no data).

For case #1 it clearly doesn’t matter which thread does the calculation, there’s enough resources. For #2, neither approach works due to insufficient resources.

Therefore, we should always prefer to do the calculation inside the audio i/o callback for the simple reason that it avoids the overhead of additional threads and the synchronization required to access the fixed buffer from multiple threads.


#12

[quote]I argue that performing the calculation for the fixed buffer either in a separate thread, or in the audio i/o callback itself, are computationally equivalent.[/quote]

Well, the specifics depend on your threading model, but I’d argue that that’s not true - assuming that your audio i/o callback thread runs at a strictly higher priority than your calculation thread!

Let’s suppose, to be specific, that we only have about half as many cycles as we need to correctly perform whatever operation this is, and let’s for the sake of argument ignore all other threads except the audio I/O thread and the computation thread.

What happens is that the CPU spends all its time chugging through this slow task, but gets about halfway through. At this point, the audio I/O thread is ready to run. It kicks in, and now, because it’s at a higher priority, in a rational world it will run until it’s complete, monopolizing that core, processor, whatever.

And this is only a simple memcpy. It’s hard to imagine any operation the audio I/O callback could be performing that would be faster than that.

What’s going to happen at that point is that the high priority I/O thread is going to discover that it simply doesn’t have enough samples, so it’ll try to do something clever, but it won’t be able to get the data which isn’t there yet.

But I can’t see any way that some stuttering should be able to sneak by this model unnoticed. As long as your I/O thread is higher priority than any other thread, it should be able to temporarily monopolize the CPU in order to do its little copy without any issue at all.


#13

Of course you can detect it! Just measure the time from the previous to the current audioDeviceCallback using Time::getMillisecondCounterHiRes() and check whether it is too long; if it is, you have an overrun. The time measurement has to be done at the very start of the callback, before any processing.
A safe “overrun” comparison value (with t in milliseconds) would be `t > 0.5 + blockSize / sampleRate * 1000`. This only works if the callbacks are quite regular. Otherwise you have to adapt the formula a bit, for instance to `t > 1 + 1.5 * blockSize / sampleRate * 1000`.
I’m not saying that this works in all cases; the best way is to check using a test tone. But it works a lot of the time, and I’ve used it here.
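zamrate's check, sketched in C++. `std::chrono::steady_clock` stands in for JUCE's `Time::getMillisecondCounterHiRes()` so the example compiles on its own; the thresholds are the two comparison formulas from this post, and `CallbackIntervalChecker` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// zamrate's thresholds in milliseconds: 0.5 ms over one block duration for
// regular callbacks, 1 ms over 1.5 block durations for irregular ones.
double overrunThresholdMs(int blockSize, double sampleRate, bool regularCallbacks) {
    assert(blockSize > 0 && sampleRate > 0.0);
    const double blockMs = blockSize / sampleRate * 1000.0;
    return regularCallbacks ? 0.5 + blockMs : 1.0 + 1.5 * blockMs;
}

// Hypothetical helper (requires C++17 for std::optional).
class CallbackIntervalChecker {
public:
    using Clock = std::chrono::steady_clock;

    CallbackIntervalChecker(int blockSize, double sampleRate, bool regularCallbacks)
        : thresholdMs_(overrunThresholdMs(blockSize, sampleRate, regularCallbacks)) {}

    // Call with the time taken at the very start of each callback, before any
    // processing. Returns true if the gap since the previous callback looks
    // too long, i.e. a probable overrun.
    bool probablyOverran(Clock::time_point callbackStart) {
        bool overran = false;
        if (last_) {
            const double gapMs =
                std::chrono::duration<double, std::milli>(callbackStart - *last_).count();
            overran = gapMs > thresholdMs_;
        }
        last_ = callbackStart;
        return overran;
    }

private:
    double thresholdMs_;
    std::optional<Clock::time_point> last_;
};
```

A late callback is a strong hint rather than proof: a driver with extra internal buffering may absorb it, which is why checking with a test tone remains the reliable test.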


#14

Tom: if your app is just streaming audio, and its latency doesn’t matter, then what you’re saying makes sense. But in any application where you’re either processing incoming audio, midi, or other timing data and responding to it in real-time, then the only way to achieve a low latency is to do the work in the audio callback.


#15

zamrate: the issue with that is that getMillisecondCounterHiRes might be extremely expensive to be calling in a tight loop!

You can get the same effect by sample counting and that’s just arithmetic.

Jules: OK, it’s interesting to know that there are real-world cases with non-trivial work in the callback. I think it’s scary, and I wonder whether you couldn’t do a better job with a tiny buffer and small chunks…

But I still think this second case is also detectable.

What exactly happens when we run out of CPU for our audio callback? Well, I see only three possible failure modes:

  1. The callback will sometimes or always be called before the previous callback has even finished executing (overlap)
  2. The callback will sometimes or always go off at a later time than scheduled (lag)
  3. Some callbacks simply never occur at all (skip)

Seems to me again that the sample counting strategy above, or a time counting strategy if computing the current time is cheap, will detect cases 2 and 3.

I don’t believe that case 1 really happens, because what thread would actually be calling the callback if the previous thread is busy? But it’s easy to detect anyway: simply set a boolean flag as you start rendering the audio and reset it when you finish. You don’t have to lock on that flag, because it’s OK if you occasionally get it wrong :smiley:. You’ll never report an error where there isn’t one, and while you might miss one overlap that way, you wouldn’t miss it if it were happening constantly.
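Case 1 can be watched with the flag described here. This sketch assumes C++11 or later and swaps the plain boolean for `std::atomic<bool>`, because an unsynchronized bool shared between threads is a data race in C++; the atomic is still lock-free on mainstream platforms. `OverlapDetector` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical overlap detector for failure mode 1 (a callback starting
// before the previous one has finished).
class OverlapDetector {
public:
    // RAII guard: construct at the top of the callback; it is destroyed
    // when the callback returns.
    class Scope {
    public:
        explicit Scope(OverlapDetector& d) : d_(d) {
            // If the flag was already set, another callback is still running.
            if (d_.busy_.exchange(true, std::memory_order_acq_rel))
                d_.overlaps_.fetch_add(1, std::memory_order_relaxed);
        }
        ~Scope() { d_.busy_.store(false, std::memory_order_release); }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        OverlapDetector& d_;
    };

    unsigned overlapCount() const { return overlaps_.load(std::memory_order_relaxed); }

private:
    std::atomic<bool> busy_{false};
    std::atomic<unsigned> overlaps_{0};
};
```

As described, the count can occasionally be low (a second callback clears the flag when it finishes, even if the first is still running), but it only increments when a callback really was still in progress, and a constant overlap will not go unnoticed. Cases 2 and 3 still need a sample-counting or timing check.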

So I don’t see a way in which your code can be invisibly malfunctioning. It seems to me that there are mechanisms involving only a little arithmetic, not much code and no locking that will reliably detect these issues.


#16

I’d say that there are few real-world cases where there’s only trivial work in the callback! Consider a synthesizer, where you do just calculations (i.e. no IO, system calls, memory allocation etc. which all are NO-NO in the callback), then there is nowhere else to do it than in the callback IMNSHO. Trying to delegate it to another thread only adds complexity, and no benefits whatsoever.

Oh, if you’d post the opinion that no complex computations should be done in the callback on the portaudio list, you’d be starting a flame war :slight_smile:

Regards
/Rob


#17

I don’t want to get too diverted here - it was immediately clear to me that I was wrong about being able to avoid all work in the audio callback, and I admitted it - but my point is that we can efficiently detect a shortage of cycles no matter how it’s caused.

Initially, I dismissed doing work in the direct callback out-of-hand. Jules pointed out that that was wrong - so I immediately adapted my proposed strategy to take that into account.

So overall I don’t want to drift away from “detecting stuttering”!

That said… :smiley:

[quote]I’d say that there are few real-world cases where there’s only trivial work in the callback! Consider a synthesizer, where you do just calculations (i.e. no IO, system calls, memory allocation etc. which all are NO-NO in the callback), then there is nowhere else to do it than in the callback IMNSHO. Trying to delegate it to another thread only adds complexity, and no benefits whatsoever.[/quote]

Well, the benefit I’m proposing is “detecting, and conceivably partially compensating for, stutter”, which has at least some value.

Consider the case I’m dealing with, literally in another window on this screen right now. I have a high quality pitch shifter but in order to spit out results sometimes it takes a small amount of incoming data and sometimes it takes a large amount - depending on the state of its internal buffers, which I don’t understand - even though the overall throughput is correct.

I don’t see any way to get smooth performance out of this without putting a small buffer in the way - because otherwise sometimes it will ask for data and there simply won’t be enough input yet. This also means that sometimes it takes more CPU time than others - in this application, it won’t ever run out of CPU time on any reasonably modern machine but this certainly isn’t an absolute truth…!

The only downside, but a potentially big one, is latency. Each sample of buffer adds about 23 µs of latency at 44.1 kHz.

In the case of the synth, the latency only affects controls (because if the user didn’t send in controllers, we could pre-compute the synth sound as far ahead as we liked). Generally, real-world physical instruments take on the rough order of 10 ms from the time you actuate a note to the time they first sound, so I consider 1 ms latency to be “good” and 10 ms latency “acceptable”. (None of my hardware synths responds much faster than 30 ms except my Kurzweil, for example.)

10 ms of latency is about 441 samples (at 44.1 kHz, the lowest “quality” setting; this is the worst case, and you get more breathing room at higher sampling rates…). Let’s suppose we had a sample buffer of size 256. You have to be careful with such small buffers. For one thing, you need to make sure that your chunk size is less than or equal to half the size of your buffer, or else you won’t be able to interleave reads and writes, which might mean a chunk size of 128 samples.
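The latency arithmetic in this post, as small C++ helpers with hypothetical names. They assume latency is simply buffered samples divided by the sample rate, ignoring driver and converter latency.

```cpp
#include <cassert>
#include <cstddef>

// Latency in milliseconds contributed by `samples` of buffering.
double latencyMs(std::size_t samples, double sampleRate) {
    assert(sampleRate > 0.0);
    return samples * 1000.0 / sampleRate;
}

// Number of whole samples that fit in a latency budget (rounded down).
std::size_t samplesForLatency(double ms, double sampleRate) {
    assert(ms >= 0.0 && sampleRate > 0.0);
    return static_cast<std::size_t>(ms * sampleRate / 1000.0);
}

// Largest chunk that still lets reads and writes interleave in a buffer of
// the given size (the "chunk <= half the buffer" rule above).
std::size_t maxChunkSize(std::size_t bufferSize) {
    return bufferSize / 2;
}
```

At 44.1 kHz one sample is about 0.023 ms, a 256-sample buffer is about 5.8 ms, and a 10 ms budget holds 441 samples.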

This is small enough that per chunk overhead might be significant. It hasn’t been in my tests but then mine is a special case that isn’t extremely CPU sensitive.

Having up to three thread context switches every 10ms (which is what you need to be able to hit to do this) doesn’t really seem to me to be unreasonable in a modern machine.

In other words, some real world routines either don’t output consistent amounts of data or use consistent amounts of CPU and you can use a buffer and another thread to smooth that out.

That said, lots of food for thought here and it makes me wonder if I wouldn’t be able to get rid of my buffer entirely in many cases…


#18

I try to get down to a 2 or 3 ms buffer size, but regardless, JUCE already has CPU measurement code in the audio I/O device API. You can just show the user the % utilization in a meter, and let them manage it appropriately.

User responses could be, to increase the buffer size, decrease the sample rate, reduce the number of active voices, reduce the number of effects, or reduce the number of active channels. All of these will cut down on the load (of course, these are application domain-specific).

Remember, the CPU code in the audio i/o callback tries to measure utilization percentage of the audio i/o callback, not overall system CPU usage. As Jules pointed out, it is not a foolproof method. But the user can hear the dropouts if it gets to that point, or they will see the percentage of utilization reach a high level, and take appropriate action.

In practical terms, I don’t think that it is possible to reliably detect dropouts algorithmically.


#19

[quote]zamrate: the issue with that is that getMillisecondCounterHiRes might be extremely expensive to be calling in a tight loop!

You can get the same effect by sample counting and that’s just arithmetic.[/quote]

One call to getMillisecondCounterHiRes() won’t ruin your performance! And it’s the only way to effectively do the check. I don’t see what you mean when you say “sample counting”. What is sample counting? And how in the world does this tell you that the driver called the audio callback too late because of some reason (be it too high CPU usage in the previous callback or some OS limitation)?

You can also get stutters not because the CPU usage in the callback is too high, but because something else is happening in the background, causing the driver to miss a callback. I have a sound card that works perfectly with 49 samples of latency on my single-core machine, but on the dual-core machine 49 samples leads to dropouts no matter what I do, even if the CPU usage inside the callback is 0%.


#20

[quote]One call to getMillisecondCounterHiRes() won’t ruin your performance![/quote]

I have experienced on more than one occasion systems where reading the system clock, particularly the hi-res version, was a moderately expensive operation. And remember we’re talking about calling this in an audio callback, so it goes off hundreds of times a second…

[quote]I don’t see what you mean when you say “sample counting”. What is sample counting? And how in the world does this tell you that the driver called the audio callback too late because of some reason (be it too high CPU usage in the previous callback or some OS limitation)?[/quote]

:frowning:

If you read the previous posts above this one, I describe, I thought, fairly clearly what I meant. I have an even longer version stored which I think I’ll add to the conversation, one moment please…