macOS Round Trip Latency

I sent you a direct message.

This has now been fixed on develop with commits a8a03427 (input/output device is the same device) and 98e0ee75 (input/output device is a different device using the same clock signal).

Some notes:

  1. The latencies/timestamps reported by the operating system aren’t super accurate. You can expect the DemoRunner’s AudioLatencyDemo to report a corrected latency within ±2 ms.
  2. The corrected latency is typically higher for USB devices (within ±5 ms), as USB class audio devices have no way to report their internal latencies to the OS. Some non-pro USB devices I’ve tested had even higher corrected latencies (within ±10 ms).
  3. Nevertheless, the reported latencies will now be constant: they should vary very little (within ±1 ms) from test to test and depend very little (within ±1 ms) on the buffer size.
  4. If you are using different audio devices for input and output, be sure that they are using the same clock signal. The above fixes do not account for clock drift, so with unsynchronised devices you will get randomly varying corrected latencies in the AudioLatencyDemo. Solving this would require resampling the audio.
  5. Last but not least, there still seems to be some issue with Apple MacBook Pro/Air internal mics/speakers (Line In/Out is fine). On my M1, I get a constant corrected latency of 42 ms (independent of sample rate and buffer size). I think this might be an Apple bug and I’m in the midst of filing a bug with Apple to understand why the timestamps seem to be reporting the latency incorrectly. Funnily enough, when creating an aggregate device between the mics and speakers, the latency is correctly accounted for, so some part of the OS must know the correct latency.
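
To illustrate why unsynchronised clocks (note 4) show up as a varying latency, here is a small back-of-the-envelope sketch. The function name and the ppm figure used in the example are hypothetical and not measured from any of the devices mentioned; the point is only that a tiny relative clock difference adds up to whole milliseconds over a test run.

```cpp
#include <cassert>

// Frames of offset accumulated between two free-running clocks.
// ppmDifference is the relative rate mismatch in parts per million
// (an assumed input, not something the fixes above measure).
double driftInFrames (double sampleRate, double ppmDifference, double seconds)
{
    assert (sampleRate > 0.0 && seconds >= 0.0);
    return sampleRate * (ppmDifference * 1.0e-6) * seconds;
}
```

For example, with an assumed 50 ppm mismatch at 48 kHz, driftInFrames (48000.0, 50.0, 60.0) comes to roughly 144 frames, i.e. about 3 ms of extra offset after one minute.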

Hi @fr810,

From commit a8a03427 I see that you use the following formula for device latency:

deviceLatency + safetyOffset + framesInBuffer + streamLatency

Have you confirmed it as correct?

My understanding of how latency is reported is the following:

  • Safety offset is reported by a device driver to account for hardware clock drift/jitter, low-level FIFO buffers, or any other variability. It is usually a small number of samples, around 70 or so, and tends to be an approximation.
  • Device latency is any added latency for the overall device that impacts all streams, for example some DSP or algorithm latency.
  • Stream latency is the same as device latency but specific to a stream, to account for algorithms that run on that stream only. The caveat is that this latency is per stream, which is something to consider because I honestly don’t know how JUCE deals with multiple streams per device.

These numbers should be reported for both directions. In the end it doesn’t matter where they are reported (which property is used) because it is the job of the client to sum them all up.

  • Buffer size is a bit more interesting because it refers to the I/O cycle buffer size and impacts both input and output: it determines how long the HAL buffers the input before the IOProc (app) is called, and how much time the IOProc is given to process the output, which in turn defines the presentation time of the samples produced (how far in the future the output will be written).

Therefore an example loopback app would have a nominal latency of two buffer sizes plus all the device and stream latency for both directions.
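
A minimal sketch of that sum, in case it helps the discussion. The struct, field and function names are illustrative only (they are not JUCE or CoreAudio identifiers), and the values would have to come from whatever the driver reports for each direction.

```cpp
#include <cassert>
#include <cstdint>

// Latency terms for one direction (input or output), in sample frames.
// Names are hypothetical; they mirror the terms listed above.
struct DirectionLatency
{
    std::uint32_t deviceLatency;  // affects every stream of the device
    std::uint32_t safetyOffset;   // driver-reported safety margin
    std::uint32_t streamLatency;  // specific to the stream in use
};

// One direction: device latency + safety offset + one I/O buffer + stream latency.
std::uint32_t directionLatencyInFrames (const DirectionLatency& d,
                                        std::uint32_t bufferFrames)
{
    return d.deviceLatency + d.safetyOffset + bufferFrames + d.streamLatency;
}

// Loopback: both directions summed, so the buffer size is counted twice.
std::uint32_t nominalLoopbackLatencyInFrames (const DirectionLatency& input,
                                              const DirectionLatency& output,
                                              std::uint32_t bufferFrames)
{
    return directionLatencyInFrames (input, bufferFrames)
         + directionLatencyInFrames (output, bufferFrames);
}

// Converts a frame count to milliseconds at the given sample rate.
double framesToMilliseconds (std::uint32_t frames, double sampleRate)
{
    assert (sampleRate > 0.0);
    return 1000.0 * frames / sampleRate;
}
```

With made-up values of {32, 70, 0} for input, {48, 70, 10} for output and a 512-frame buffer, the nominal loopback latency is 614 + 640 = 1254 frames.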

That being said, not much can be done if underlying components don’t report accurately. :confused:


Hi @fft,

Thank you for your insights. This is super interesting.

Have you confirmed it as correct?

The measured latencies agreed with the calculated latencies for the devices I tested (Motu 8A via USB, Motu 8A via AVB, minidsp MCHStreamer and a cheapo no-brand stereo in/out audio interface from Amazon) within a margin of error (see my post). So it’s hard to say if my latency calculation is incorrect or the error is due to other sources (devices misreporting, USB etc.).

Therefore an example loopback app would have a nominal latency of two buffer sizes plus all the device and stream latency for both directions.

Isn’t this exactly what JUCE does? The loopback demo will sum getOutputLatencyInSamples and getInputLatencyInSamples. Each of these methods gets its latency with the formula you posted:

deviceLatency + safetyOffset + framesInBuffer + streamLatency

i.e. the total latency will be two buffer sizes plus all the device and stream latency for both directions.

Note that the latency is only calculated this way when the output/input is using the same device. If they are separate devices, then we don’t even need to query any of the above latency properties. The AudioIODeviceCombiner uses a FIFO, and the audio callback timestamps delivered to us by the OS, to ensure that a certain target latency is always met. Of course, this target latency needs to be higher than actual latencies + a safety offset to account for the fact that input/output devices will not start at the exact same time.

Isn’t this exactly what JUCE does? The loopback demo will sum getOutputLatencyInSamples and getInputLatencyInSamples. Each of these methods get their latencies with the formula you posted:

If you do that per CoreAudio stream then it is correct. Otherwise, if JUCE exposes all streams at once, then taking the max() of all the streams’ latencies would probably be the best way to go. That’s one of the challenges of dealing with different audio system implementations, because they all abstract these components differently.
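
As a rough sketch of the max() idea, assuming the per-stream latencies for one direction have already been collected into a vector (the helper name is hypothetical, not an existing JUCE function):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Folds the per-stream latencies of one direction into a single figure
// by taking the largest one. Returns 0 when no streams are reported.
std::uint32_t worstCaseStreamLatency (const std::vector<std::uint32_t>& streamLatencies)
{
    if (streamLatencies.empty())
        return 0;

    return *std::max_element (streamLatencies.begin(), streamLatencies.end());
}
```

Taking the maximum means the reported figure is never lower than the latency of any individual stream, at the cost of overstating it for the faster streams.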

If they are separate devices, then we don’t even need to query any of the above latency properties. The AudioIODeviceCombiner uses a FIFO, and the audio callback timestamps delivered to us by the OS, to ensure that a certain target latency is always met. Of course, this target latency needs to be higher than actual latencies + a safety offset to account for the fact that input/output devices will not start at the exact same time.

That’s right, you can wait for a callback and inspect the input and output AudioTimeStamp for each stream and combine them.

It is worth noting that CoreAudio has the concept of an AggregateAudioDevice, which combines streams from different devices (even if they run in different clock domains) and provides the same AudioDevice API to clients, so the latency properties can be queried as usual. The CoreAudio HAL automatically handles drift correction between audio devices by tracking time and ASRCing with the proper ratio, and it avoids needing a FIFO to synchronize the samples between devices because it has random access to the underlying audio buffers. In my view AggregateAudioDevice is one of the great advantages of CoreAudio over any other OS audio server.
