macOS Round Trip Latency

I know this is an old topic, but it’s not been addressed and I think it has some possible resolutions, as mentioned in this thread:

As @fr810 is back around for a bit and is the expert in all things timing, any chance that you could add the check for kAudioDevicePropertyLatency?

I think the idea is that the latency detector demo in the examples dir should return 0 (or very close to it) as the device should be capable of reporting its entire latency (and we shouldn’t need to run round-trip detectors to add any additional latency).
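
For reference, I believe the check in question is just a property query along these lines (untested sketch, assuming a valid deviceID; error handling omitted):

// Untested sketch: ask the device for its own reported latency in samples.
// Assumes a valid AudioObjectID deviceID. On pre-macOS 12 SDKs,
// kAudioObjectPropertyElementMain is spelled kAudioObjectPropertyElementMaster.
UInt32 deviceLatency = 0;
UInt32 size = sizeof (deviceLatency);

AudioObjectPropertyAddress pa;
pa.mSelector = kAudioDevicePropertyLatency;
pa.mScope    = kAudioObjectPropertyScopeOutput; // or kAudioObjectPropertyScopeInput
pa.mElement  = kAudioObjectPropertyElementMain;

AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &deviceLatency);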

Thanks in advance :pray:


Yes please!

And whilst digging around in there, any such distinguished expert could perhaps consider this too?

So I tried to add all the latencies together and the latency detector demo still doesn’t return 0 (or close to it) with this change. Am I doing something wrong?

int getLatencyFromDevice (AudioObjectPropertyScope scope) const
{
    auto bufferSizeProperty = static_cast<UInt32> (bufferSize);
    UInt32 size = sizeof (bufferSizeProperty);
    AudioObjectPropertyAddress pa;
    pa.mElement = juceAudioObjectPropertyElementMain;
    pa.mSelector = kAudioDevicePropertyBufferFrameSize;
    pa.mScope = kAudioObjectPropertyScopeWildcard;
    AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &bufferSizeProperty);
    
    UInt32 deviceLatency = 0;
    size = sizeof (deviceLatency);
    pa.mSelector = kAudioDevicePropertyLatency;
    pa.mScope = scope;
    AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &deviceLatency);

    UInt32 safetyOffset = 0;
    size = sizeof (safetyOffset);
    pa.mSelector = kAudioDevicePropertySafetyOffset;
    AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &safetyOffset);
    
    UInt32 streamLatency = 0;
    size = 0;
    pa.mSelector = kAudioDevicePropertyStreams;
    AudioObjectGetPropertyDataSize (deviceID, &pa, 0, nullptr, &size);
    
    if (size >= sizeof (AudioStreamID))
    {
        HeapBlock<AudioStreamID> streamIDs (size / sizeof (AudioStreamID));
        AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, streamIDs);
        
        // get the latency of the first stream
        size = sizeof (streamLatency);
        pa.mSelector = kAudioStreamPropertyLatency;
        AudioObjectGetPropertyData (streamIDs[0], &pa, 0, nullptr, &size, &streamLatency);
    }

    return (int) (deviceLatency + safetyOffset + bufferSizeProperty + streamLatency);
}

That looks right, Fabian, but see my note at the bottom of the post.

Here’s the code I’ve been using - it looks like it is equivalent:

    int getLatencyFromDevice (AudioObjectPropertyScope scope) const
    {
        UInt32 deviceLatency = 0;
        UInt32 size = sizeof (deviceLatency);
        AudioObjectPropertyAddress pa;
        pa.mElement = juceAudioObjectPropertyElementMain;
        pa.mSelector = kAudioDevicePropertyLatency;
        pa.mScope = scope;
        AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &deviceLatency);

        UInt32 safetyOffset = 0;
        size = sizeof (safetyOffset);
        pa.mSelector = kAudioDevicePropertySafetyOffset;
        AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &safetyOffset);

        // Query stream latency. Note that AudioObjectGetPropertyDataSize
        // returns the size of the stream ID array in bytes, not the number of streams.
        UInt32 streamLatency = 0;
        UInt32 streamListBytes = 0;
        pa.mSelector = kAudioDevicePropertyStreams;
        if (OK (AudioObjectGetPropertyDataSize (deviceID, &pa, 0, nullptr, &streamListBytes)))
        {
            const auto numStreams = streamListBytes / sizeof (AudioStreamID);
            HeapBlock<AudioStreamID> streams (numStreams);
            if (numStreams > 0 && OK (AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &streamListBytes, streams)))
            {
                pa.mSelector = kAudioStreamPropertyLatency;
                size = sizeof (streamLatency);
                // We could check all streams for the device, but it only ever seems to return the stream latency on the first stream
                AudioObjectGetPropertyData (streams[0], &pa, 0, nullptr, &size, &streamLatency);
            }
        }
        
        return (int) (deviceLatency + safetyOffset + streamLatency) + getFrameSizeFromDevice();
    }

:warning: In-built microphone and speakers appear to introduce additional latency

In recent years, the actual latency through the in-built audio devices in macOS doesn’t match what is reported (at least on the machine I have access to). I suspect there might be some additional latency due to ambient noise reduction. I believe some Macs allow you to disable this in the Sound control panel; however, mine doesn’t, so I can’t test it :frowning:

Importantly, the latency reported for external audio devices does match measurements made with RTL Utility. I have tested a Steinberg UR22 myself and have confirmation from users of other devices (notably from RME).


I haven’t tested this in a while, so things may have changed on the JUCE side or with macOS… but AFAIR, when using the in-built devices, they are treated by JUCE as separate devices and the AudioIODeviceCombiner class is instantiated. This results in additional buffering, and the round-trip latency being inaccurate because that buffering isn’t taken into account. I do recall, though, that if you aggregate the in-built audio I/O via the OS, it is considered to be a single device by JUCE and the round-trip latency was pretty accurate (with similar modifications to JUCE as described above).

I think there is also a way to programmatically request macOS to aggregate devices which, if implemented, may make the AudioIODeviceCombiner redundant in some (all?) scenarios, but that is a separate issue I guess.
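
For reference, I believe the programmatic route goes through AudioHardwareCreateAggregateDevice. A rough sketch (the sub-device UIDs below are placeholders - real ones come from kAudioDevicePropertyDeviceUID - and error handling is omitted):

#include <CoreAudio/CoreAudio.h>

// Rough sketch: programmatically create an aggregate device from two
// sub-devices. The UIDs below are placeholders, not real device UIDs.
static AudioObjectID createAggregateDevice()
{
    CFStringRef subDeviceUIDs[] = { CFSTR ("PlaceholderInputUID"),
                                    CFSTR ("PlaceholderOutputUID") };

    CFMutableArrayRef subDevices = CFArrayCreateMutable (nullptr, 2, &kCFTypeArrayCallBacks);

    for (auto* uid : subDeviceUIDs)
    {
        CFMutableDictionaryRef sub = CFDictionaryCreateMutable (nullptr, 0,
                                                                &kCFTypeDictionaryKeyCallBacks,
                                                                &kCFTypeDictionaryValueCallBacks);
        CFDictionarySetValue (sub, CFSTR (kAudioSubDeviceUIDKey), uid);
        CFArrayAppendValue (subDevices, sub);
        CFRelease (sub);
    }

    CFMutableDictionaryRef description = CFDictionaryCreateMutable (nullptr, 0,
                                                                    &kCFTypeDictionaryKeyCallBacks,
                                                                    &kCFTypeDictionaryValueCallBacks);
    CFDictionarySetValue (description, CFSTR (kAudioAggregateDeviceNameKey), CFSTR ("My Aggregate"));
    CFDictionarySetValue (description, CFSTR (kAudioAggregateDeviceUIDKey),  CFSTR ("com.example.myaggregate"));
    CFDictionarySetValue (description, CFSTR (kAudioAggregateDeviceSubDeviceListKey), subDevices);

    AudioObjectID aggregateID = kAudioObjectUnknown;
    AudioHardwareCreateAggregateDevice (description, &aggregateID);

    CFRelease (subDevices);
    CFRelease (description);
    return aggregateID;
}

The matching AudioHardwareDestroyAggregateDevice cleans it up afterwards.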


Aha, thanks for pointing that out Tim! I just created an aggregate device and saw that the difference between reported and measured reduced from a couple of thousand samples to several hundred samples. So it helps but doesn’t entirely account for the difference.

The Sound Radix repo has been using this approach -

Yet with newer devices we’ve seen Apple reporting incorrect results, and have reported this to Apple. No update since then.


AFAIR, when using the in-built devices, they are treated by JUCE as separate devices and the AudioIODeviceCombiner class is instantiated. This results in additional buffering, and the round-trip latency being inaccurate because that buffering isn’t taken into account.

Yes, that’s exactly right. I did a bit more investigation. Unfortunately, for separate input/output hardware, the latency of the AudioIODeviceCombiner will not be constant [1]. This is because the AudioIODeviceCombiner does not look at the audio callback timestamps, without which the exact relative start times of the two devices cannot be calculated.

I’ll put up my above change for code review so that the latency is at least correct when the AudioIODeviceCombiner is not involved, and then discuss with the team whether fixing the AudioIODeviceCombiner once and for all, or programmatically creating an Apple aggregate device, is the right approach here.

[1] This can easily be reproduced by measuring the latency of the desired devices with the latency demo, switching to another audio device and back again and then re-measuring the latency. The latency changes each time.

Sounds like a good plan, might as well get it right for users with dedicated audio devices. Thanks Fabian!

I’ll put up my above change for code review so that the latency is at least correct when the…

I compared our changes to yours above and I think the only difference is some additional error checking on our side and perhaps some stylistic stuff.

If this code review is a public process, please point me at it and I’d be happy to give my feedback, otherwise what I think you’ve posted will address the main issue and is a welcome change. Thanks.

Can I double check if the recent changes here are intended to fix the latency reporting?

With those changes I’m still getting very large latencies detected via loopback even when using a dedicated device (i.e. not invoking the AudioIODeviceCombiner).
For example, using my Mackie Onyx Artist 1-2 the reported latency seems to be about twice the buffer size off.

Anybody else seeing the JUCE Latency Tester reporting nothing close to 0ms?

Hi @dave96. The actual fix for the latency reporting is still in code review. Sorry! The change you are referencing is just a code clean-up. But also see this commit which fixed a typo in the commit you referenced.

Ok thanks. Yes, I’ve got the tip, just wanted to link to the larger commit to show what I was talking about.

Is the fix large or is there any chance I could have a patch of it to test locally with the devices I have whilst you review it?

I sent you a direct message.

This has now been fixed on develop with commits a8a03427 (input and output are the same device) and 98e0ee75 (input and output are different devices using the same clock signal).

Some notes:

  1. The latency/timestamps reported by the operating system aren’t super accurate. You can expect the DemoRunner’s AudioLatencyDemo to report a corrected latency of <±2ms.
  2. It’s typically higher for USB devices (<±5ms), as USB audio class devices have no way to report their internal latencies to the OS. Some non-pro USB devices I’ve tested were even further off (<±10ms).
  3. Nevertheless, the reported latencies will now be constant: they should vary only very little (<±1ms) from test to test and should depend only very little (<±1ms) on the buffer size.
  4. If you are using different audio devices for input and output, be sure that they are using the same clock signal. The above fixes will not account for clock drift and you will get randomly varying corrected latencies in the AudioLatencyDemo. Solving this would require resampling the audio.
  5. Last but not least, there still seems to be some issue with Apple MacBook Pro/Air internal mics/speakers (Line In/Out is fine). On my M1, I get a constant corrected latency of 42 ms (independent of sample rate and buffer size). I think this might be an Apple bug and I’m in the middle of filing a report with Apple to understand why the timestamps seem to be reporting the latency incorrectly. Funnily enough, when creating an aggregate device between the mics and speakers, the latency is correctly accounted for, so some part of the OS must know the correct latency.

Hi @fr810,

From commit a8a03427 I see that you use the following formula for device latency:

deviceLatency + safetyOffset + framesInBuffer + streamLatency

Have you confirmed it as correct?

My understanding of how latency is reported is the following:

  • Safety offset is reported by a device driver to account for hardware clock drift/jitter, low-level FIFO buffers, or any other variability – usually a small number of samples, around 70 or so. This number tends to be an approximation.
  • Device latency is any added latency for the overall device that impacts all streams. Perhaps some DSP or algorithm latency.
  • Stream latency is the same as device latency but specific to a stream, to account for algorithms that are run on that stream only. The caveat here is that latency is reported per stream, which is something to consider because I honestly don’t know how JUCE deals with multiple streams per device.

These numbers should be reported for both directions. In the end it doesn’t matter where they are reported (which property is used) because it is the job of the client to sum them all up.

  • Buffer size is a bit more interesting because it refers to the I/O cycle buffer size and impacts both input and output – how long the HAL will buffer the input until the IOProc (app) is called, and how much time the IOProc is given to process the output, thus defining the presentation time of the samples produced (how far in the future the output will be written).

Therefore an example loopback app would have a nominal latency of two buffer sizes plus all the device and stream latency for both directions.
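
To put some made-up but plausible numbers on that (all figures illustrative, not measured from any real device):

// Toy example: nominal round-trip latency of a loopback app at 48 kHz.
// Every figure here is illustrative.
const int bufferSize = 512;                 // I/O cycle buffer size
const int inSafety  = 70,  outSafety  = 70; // safety offsets
const int inDevice  = 20,  outDevice  = 20; // device latencies
const int inStream  = 10,  outStream  = 10; // stream latencies

const int inputSide  = bufferSize + inSafety  + inDevice  + inStream;   // 612 samples
const int outputSide = bufferSize + outSafety + outDevice + outStream;  // 612 samples

const int roundTripSamples = inputSide + outputSide;                    // 1224 samples
const double roundTripMs   = roundTripSamples * 1000.0 / 48000.0;       // ~25.5 ms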

That being said, not much can be done if underlying components don’t report accurately. :confused:


Hi @fft,

Thank you for your insights. This is super interesting.

Have you confirmed it as correct?

The measured latencies agreed with the calculated latencies for the devices I tested (Motu 8A via USB, Motu 8A via AVB, minidsp MCHStreamer and a cheapo no-brand stereo in/out audio interface from Amazon) within a margin of error (see my post). So it’s hard to say if my latency calculation is incorrect or the error is due to other sources (devices misreporting, USB etc.).

Therefore an example loopback app would have a nominal latency of two buffer sizes plus all the device and stream latency for both directions.

Isn’t this exactly what JUCE does? The loopback demo will sum getOutputLatencyInSamples and getInputLatencyInSamples. Each of these methods gets its latency with the formula you posted:

deviceLatency + safetyOffset + framesInBuffer + streamLatency

i.e. the total latency will be two buffer sizes plus all the device and stream latency for both directions.

Note that the latency is only calculated this way when the output/input is using the same device. If they are separate devices, then we don’t even need to query any of the above latency properties. The AudioIODeviceCombiner uses a FIFO, and the audio callback timestamps delivered to us by the OS, to ensure that a certain target latency is always met. Of course, this target latency needs to be higher than actual latencies + a safety offset to account for the fact that input/output devices will not start at the exact same time.

Isn’t this exactly what JUCE does? The loopback demo will sum getOutputLatencyInSamples and getInputLatencyInSamples. Each of these methods gets its latency with the formula you posted:

If you do that per CoreAudio stream then it is correct; otherwise, if JUCE exposes all streams at once, then a max() of all the stream latencies would probably be the best way to go. That’s one of the challenges of dealing with different audio system implementations: they all abstract these components differently.
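
To illustrate, here’s an untested sketch of what I mean (scope handling simplified; kAudioObjectPropertyElementMain is kAudioObjectPropertyElementMaster on older SDKs):

#include <algorithm>
#include <vector>
#include <CoreAudio/CoreAudio.h>

// Untested sketch: take the worst-case latency across all of a device's
// streams in the given scope, rather than only the first stream.
static UInt32 getMaxStreamLatency (AudioObjectID deviceID, AudioObjectPropertyScope scope)
{
    AudioObjectPropertyAddress pa { kAudioDevicePropertyStreams, scope, kAudioObjectPropertyElementMain };

    UInt32 listBytes = 0; // byte size of the stream ID array, not the stream count
    if (AudioObjectGetPropertyDataSize (deviceID, &pa, 0, nullptr, &listBytes) != kAudioHardwareNoError)
        return 0;

    std::vector<AudioStreamID> streams (listBytes / sizeof (AudioStreamID));
    if (streams.empty()
        || AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &listBytes, streams.data()) != kAudioHardwareNoError)
        return 0;

    UInt32 maxLatency = 0;
    pa = { kAudioStreamPropertyLatency, kAudioObjectPropertyScopeGlobal, kAudioObjectPropertyElementMain };

    for (auto stream : streams)
    {
        UInt32 latency = 0;
        UInt32 size = sizeof (latency);

        if (AudioObjectGetPropertyData (stream, &pa, 0, nullptr, &size, &latency) == kAudioHardwareNoError)
            maxLatency = std::max (maxLatency, latency);
    }

    return maxLatency;
}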

If they are separate devices, then we don’t even need to query any of the above latency properties. The AudioIODeviceCombiner uses a FIFO, and the audio callback timestamps delivered to us by the OS, to ensure that a certain target latency is always met. Of course, this target latency needs to be higher than actual latencies + a safety offset to account for the fact that input/output devices will not start at the exact same time.

That’s right, you can wait for a callback and inspect the input and output AudioTimeStamp for each stream and combine them.
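
In IOProc terms, that looks roughly like this illustrative sketch:

// Illustrative sketch: inspect the timestamps handed to an AudioDeviceIOProc.
// inInputTime is when the input samples were captured, inOutputTime is when
// the output samples will be presented; their difference is the I/O latency
// actually in effect for this cycle.
static OSStatus ioProc (AudioObjectID device,
                        const AudioTimeStamp* inNow,
                        const AudioBufferList* inInputData,
                        const AudioTimeStamp* inInputTime,
                        AudioBufferList* outOutputData,
                        const AudioTimeStamp* inOutputTime,
                        void* inClientData)
{
    if ((inInputTime->mFlags & kAudioTimeStampSampleTimeValid) != 0
        && (inOutputTime->mFlags & kAudioTimeStampSampleTimeValid) != 0)
    {
        const double ioLatencySamples = inOutputTime->mSampleTime - inInputTime->mSampleTime;
        // ... compare this against the latency computed from the device properties
    }

    return kAudioHardwareNoError;
}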

It is worth noting that CoreAudio has the concept of an AggregateAudioDevice, which combines streams from different devices (even if they run in different clock domains) and provides the same AudioDevice API to clients, so the latency properties can be queried normally. The CoreAudio HAL will automatically handle drift correction between audio devices by tracking time and ASRCing with the proper ratio, while avoiding a FIFO to synchronize the samples between different devices because it has random access to the underlying audio buffers. In my view the AggregateAudioDevice is one of the great advantages of CoreAudio over any other OS audio server.
