macOS Round Trip Latency

I would just like to second what andrewj is saying, in that we also found that the main issue with latency reporting for CoreAudio devices was that the buffer size was not included in the reported latency.

I’m not familiar with what the timestamps you are using represent, so I don’t feel I can really comment on your approach and whether it is valid ATM. My concerns, though, would be:

  • The latency doesn’t actually change after configuring and starting the device so why does it need updating?
  • How do clients get notified of the change in latency?
  • Does it include/represent all the latency components, such as systemic device I/O latency?

An example where we found the current latency reporting quite inaccurate was the UAD-2 Apollo QUAD, for which CoreAudio reports an input latency of 292 samples for kAudioStreamPropertyLatency. This corresponds to the systemic input latency of the device (e.g. AD conversion, DSP, etc.) and is constant irrespective of sample rate or buffer size. The output latency reported (kAudioStreamPropertyLatency) for the device is always 22 samples. These values are from memory, AFAIR, but they are reasonably close. The kAudioDevicePropertySafetyOffset for the device was ~52 samples for both input and output, and it also didn’t vary based on sample rate or buffer size.

So if we assume, based on what andrewj and I have observed, that the latency should be based on these properties:

  1. kAudioDevicePropertyLatency
  2. kAudioDevicePropertySafetyOffset
  3. kAudioStreamPropertyLatency
  4. kAudioDevicePropertyBufferFrameSize

Then at 64 samples:

Currently reported latency:

  • Input : 292(1) + 52(2) = 344
  • Output: 22(1) + 52(2) = 74
  • Total = 418 samples

Using all of the above latency components:

  • Input: 292(1) + 52(2) + 0(3) + 64(4) = 408
  • Output: 22(1) + 52(2) + 0(3) + 64(4) = 138
  • Total = 546 samples

At low buffer sizes the difference between these two values may not be too bad/noticeable, but at 1024 samples:

Currently reported latency:

  • Input: 292(1) + 52(2) = 344
  • Output: 22(1) + 52(2) = 74
  • Total = 418 samples

Using all of the above latency components:

  • Input: 292(1) + 52(2) + 0(3) + 1024(4) = 1368
  • Output: 22(1) + 52(2) + 0(3) + 1024(4) = 1098
  • Total = 2466 samples

Which is quite a discrepancy.
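
For anyone who wants to check the arithmetic above, here is a minimal C++ sketch. It only adds up the figures quoted from memory in this post; the struct and function names are made up for illustration, and it does not talk to CoreAudio at all.

```cpp
#include <cassert>
#include <cstdint>

// Latency components in sample frames, numbered as in the list above.
// The names are illustrative; the values are passed in by the caller.
struct LatencyComponents
{
    std::uint32_t deviceLatency;  // (1) kAudioDevicePropertyLatency
    std::uint32_t safetyOffset;   // (2) kAudioDevicePropertySafetyOffset
    std::uint32_t streamLatency;  // (3) kAudioStreamPropertyLatency
    std::uint32_t bufferFrames;   // (4) kAudioDevicePropertyBufferFrameSize
};

// Sum used by the current reporting: components (1) and (2) only.
inline std::uint32_t currentlyReported (const LatencyComponents& c)
{
    return c.deviceLatency + c.safetyOffset;
}

// Sum of all four components.
inline std::uint32_t allComponents (const LatencyComponents& c)
{
    return c.deviceLatency + c.safetyOffset + c.streamLatency + c.bufferFrames;
}

// Input figures quoted from memory for the Apollo in this post.
inline LatencyComponents exampleInput (std::uint32_t bufferFrames)
{
    assert (bufferFrames > 0);
    return { 292, 52, 0, bufferFrames };
}

// Output figures quoted from memory for the Apollo in this post.
inline LatencyComponents exampleOutput (std::uint32_t bufferFrames)
{
    assert (bufferFrames > 0);
    return { 22, 52, 0, bufferFrames };
}

// Round-trip total for a given buffer size, with or without all components.
inline std::uint32_t roundTrip (std::uint32_t bufferFrames, bool includeAll)
{
    const auto in  = exampleInput (bufferFrames);
    const auto out = exampleOutput (bufferFrames);
    return includeAll ? allComponents (in) + allComponents (out)
                      : currentlyReported (in) + currentlyReported (out);
}
```

roundTrip (64, true) gives 546 and roundTrip (1024, true) gives 2466, while roundTrip (n, false) stays at 418 for any buffer size.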

So in response to:

  1. The actual latency is way higher than the reported latencies from the device.

I think the devices are reporting their latency correctly (well… the devices I tried anyway). Perhaps there is/was a misunderstanding as to what kAudioDevicePropertyLatency represented when the latency calculation was implemented.

  2. The actual latency changes every time a device property changes (e.g. buffer size or sample rate)

Yes, it typically should, but as you can see from my example and from what I’ve seen, it doesn’t, because the current latency calculation for CoreAudio doesn’t include all the components of the latency.

Just running into this myself. I’ve been patching my JUCE fork based on timbyr’s post, and I’ll just mention for anyone reading that kAudioStreamPropertyLatency should be queried against the device’s AudioStream(s), not the device itself. kAudioStreamPropertyLatency has the same selector value as kAudioDevicePropertyLatency, so querying it on the device will appear to work, but you’ll just be getting the device latency again. You can find an impl for getting the correct value from andrewj’s post here:

I just reread this part of my post above and realise I meant kAudioDevicePropertyLatency rather than kAudioStreamPropertyLatency in that section. I think I got it correct later in the examples though.

I know this is an old topic, but it’s not been addressed and I think it has some resolutions, as mentioned in this thread:

As @fr810 is back around for a bit and is the expert in all things timing, any chance that you could add the check for kAudioDevicePropertyLatency?

I think the idea is that the latency detector demo in the examples dir should return 0 (or very close to it) as the device should be capable of reporting its entire latency (and we shouldn’t need to run round-trip detectors to add any additional latency).

Thanks in advance :pray:

Yes please!

And whilst digging around in there, any such distinguished expert could perhaps consider this too?

So I tried adding up all the latencies, and the latency detector demo still doesn’t return 0 (or close to it) with this change. Am I doing something wrong?

int getLatencyFromDevice (AudioObjectPropertyScope scope) const
{
    auto bufferSizeProperty = static_cast<UInt32> (bufferSize);
    UInt32 size = sizeof (bufferSizeProperty);
    AudioObjectPropertyAddress pa;
    pa.mElement = juceAudioObjectPropertyElementMain;
    pa.mSelector = kAudioDevicePropertyBufferFrameSize;
    pa.mScope = kAudioObjectPropertyScopeWildcard;
    AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &bufferSizeProperty);
    
    UInt32 deviceLatency = 0;
    size = sizeof (deviceLatency);
    pa.mSelector = kAudioDevicePropertyLatency;
    pa.mScope = scope;
    AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &deviceLatency);

    UInt32 safetyOffset = 0;
    size = sizeof (safetyOffset);
    pa.mSelector = kAudioDevicePropertySafetyOffset;
    AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &safetyOffset);
    
    UInt32 streamLatency = 0;
    size = 0;
    pa.mSelector = kAudioDevicePropertyStreams;
    AudioObjectGetPropertyDataSize (deviceID, &pa, 0, nullptr, &size);
    
    if (size >= sizeof (AudioStreamID))
    {
        HeapBlock<AudioStreamID> streamIDs (size / sizeof (AudioStreamID));
        AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, streamIDs);
        
        // get the latency of the first stream
        size = sizeof (streamLatency);
        pa.mSelector = kAudioStreamPropertyLatency;
        AudioObjectGetPropertyData (streamIDs[0], &pa, 0, nullptr, &size, &streamLatency);
    }

    return (int) (deviceLatency + safetyOffset + bufferSizeProperty + streamLatency);
}

That looks right Fabian but see my note at the bottom of the post.

Here’s the code I’ve been using - it looks like it is equivalent:

    int getLatencyFromDevice (AudioObjectPropertyScope scope) const
    {
        UInt32 deviceLatency = 0;
        UInt32 size = sizeof (deviceLatency);
        AudioObjectPropertyAddress pa;
        pa.mElement = juceAudioObjectPropertyElementMain;
        pa.mSelector = kAudioDevicePropertyLatency;
        pa.mScope = scope;
        AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &deviceLatency);

        UInt32 safetyOffset = 0;
        size = sizeof (safetyOffset);
        pa.mSelector = kAudioDevicePropertySafetyOffset;
        AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, &safetyOffset);

        // Query stream latency
        UInt32 streamLatency = 0;
        UInt32 streamsSize = 0; // size in bytes of the stream ID list, not a stream count
        pa.mSelector = kAudioDevicePropertyStreams;
        if (OK(AudioObjectGetPropertyDataSize (deviceID, &pa, 0, nullptr, &streamsSize)))
        {
            const auto numStreams = streamsSize / sizeof (AudioStreamID);
            HeapBlock<AudioStreamID> streams (numStreams);
            size = streamsSize;
            if (numStreams > 0 && OK(AudioObjectGetPropertyData (deviceID, &pa, 0, nullptr, &size, streams)))
            {
                pa.mSelector = kAudioStreamPropertyLatency;
                size = sizeof (streamLatency);
                // We could check all streams for the device, but it only ever seems to return the stream latency on the first stream
                AudioObjectGetPropertyData (streams[0], &pa, 0, nullptr, &size, &streamLatency);
            }
        }
        
        return (int) (deviceLatency + safetyOffset + streamLatency) + getFrameSizeFromDevice();
    }

:warning: In-built microphone and speakers appear to introduce additional latency

In recent years the actual latency through in-built audio devices in macOS won’t match what is reported (at least on the machine I have access to). I suspect there might be some additional latency due to ambient noise reduction. I believe some Macs allow you to disable this in the sound control panel, however mine doesn’t so I can’t test it :frowning:

Importantly, latency reported for external audio devices does match measurements made with RTL Utility. I have tested Steinberg UR22 myself and have confirmation from users of other devices (notably from RME).

I haven’t tested this in a while, so things may have changed on the JUCE side or with macOS… but AFAIR, when using the in-built devices they are treated by JUCE as separate devices and the AudioIODeviceCombiner class is instantiated. This results in additional buffering, and the round-trip latency is inaccurate because that buffering isn’t taken into account. I do recall, though, that if you aggregate the in-built audio I/O via the OS, it will be considered a single device by JUCE and the round-trip latency was pretty accurate (with similar modifications to JUCE as described above).

I think there is also a way to programmatically request macOS to aggregate devices, which if implemented may make the AudioIODeviceCombiner redundant in some (all?) scenarios, but that is a separate issue I guess.

Aha, thanks for pointing that out Tim! I just created an aggregate device and saw that the difference between reported and measured reduced from a couple of thousand samples to several hundred samples. So it helps but doesn’t entirely account for the difference.

The Sound Radix repo has been using this approach -

Yet with newer devices we’ve seen Apple reporting incorrect results, and we have reported this to Apple. There has been no update since then.

AFAIR when using the in-built devices they are treated by JUCE as separate devices and the AudioIODeviceCombiner class is instantiated. This results in additional buffering and the round-trip latency being inaccurate because it isn’t taken into account.

Yes, that’s exactly right. I did a bit more investigation. Unfortunately, for separate input/output hardware, the latency of the AudioIODeviceCombiner will not be constant [1]. This is because the AudioIODeviceCombiner does not look at the audio callback timestamps, without which the exact relative start times of the two devices cannot be calculated.

I’ll put up my above change for code review so that the latency is at least correct when the AudioIODeviceCombiner is not involved, and then discuss with the team whether fixing the AudioIODeviceCombiner once and for all or programmatically creating an Apple aggregate device is the right approach here.

[1] This can easily be reproduced by measuring the latency of the desired devices with the latency demo, switching to another audio device and back again and then re-measuring the latency. The latency changes each time.

Sounds like a good plan, might as well get it right for users with dedicated audio devices. Thanks Fabian!

I’ll put up my above change for code review so that the latency is at least correct for when the

I compared our changes to yours above and I think the only difference is some additional error checking on our side and perhaps some stylistic stuff.

If this code review is a public process, please point me at it and I’d be happy to give my feedback, otherwise what I think you’ve posted will address the main issue and is a welcome change. Thanks.

Can I double check if the recent changes here are intended to fix the latency reporting?

With those changes I’m still getting very large latencies detected via loopback even when using a dedicated device (i.e. not invoking the AudioIODeviceCombiner).
For example, using my Mackie Onyx Artist 1-2 the reported latency seems to be about twice the buffer size off.

Anybody else seeing the JUCE Latency Tester reporting nothing close to 0ms?

Hi @dave96. The actual fix for the latency reporting is still in code review. Sorry! The change you are referencing is just a code clean-up. But also see this commit which fixed a typo in the commit you referenced.

Ok thanks. Yes, I’ve got the tip, just wanted to link to the larger commit to show what I was talking about.

Is the fix large or is there any chance I could have a patch of it to test locally with the devices I have whilst you review it?

I sent you a direct message.

This has now been fixed on develop with commits a8a03427 (input/output device is the same device) and 98e0ee75 (input/output device is a different device using the same clock signal).

Some notes:

  1. The latency/timestamps reported by the operating system aren’t super accurate. You can expect the DemoRunner’s AudioLatencyDemo to report a corrected latency of <±2ms.
  2. It’s typically higher for USB devices (<±5ms) as USB class audio devices have no way to report their internal latencies to the OS. Some non-pro USB devices I’ve tested had even higher latencies (<±10ms).
  3. Nevertheless, the reported latencies will now be constant and should vary only very little (<±1ms) from test to test and should only depend very little (<±1ms) on the buffer sizes.
  4. If you are using different audio devices for input and output, be sure that they are using the same clock signal. The above fixes will not account for clock drift and you will get randomly varying corrected latencies in the AudioLatencyDemo. Solving this would require resampling the audio.
  5. Last but not least, there still seems to be some issue with Apple MacBook Pro/Air internal mics/speakers (Line In/Out is fine). On my M1, I get a constant corrected latency of 42 ms (independent of sample rate and buffer size). I think this might be an Apple bug and I’m in the midst of filing a bug to Apple to understand why the timestamps seem to be reporting the latency incorrectly. Funnily enough, when creating an aggregate device between the mics and speakers, the latency is correctly accounted for, so some part of the OS must know the correct latency.
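
To put the tolerances in the notes above into samples, and to get a feel for note 4, here is a rough C++ sketch. The 50 ppm clock mismatch used in the usage note is a made-up figure for illustration, not a measurement, and the function names are hypothetical.

```cpp
#include <cassert>

// Convert a latency in milliseconds to sample frames at a given sample rate.
inline double msToSamples (double milliseconds, double sampleRate)
{
    assert (sampleRate > 0.0);
    return milliseconds * sampleRate / 1000.0;
}

// Samples by which two free-running clocks drift apart after a given time
// when their rates differ by mismatchPpm parts per million.
inline double driftInSamples (double sampleRate, double mismatchPpm, double seconds)
{
    assert (sampleRate > 0.0 && seconds >= 0.0);
    return sampleRate * (mismatchPpm * 1.0e-6) * seconds;
}
```

At 48 kHz, msToSamples (42.0, 48000.0) is 2016 samples, and driftInSamples (48000.0, 50.0, 60.0) is about 144 samples per minute. With drift of that order, a corrected latency measured between two unsynchronised devices would not be expected to stay put from one test to the next, which is consistent with note 4.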

Hi @fr810,

From commit a8a03427 I see that you use the following formula for device latency:

deviceLatency + safetyOffset + framesInBuffer + streamLatency

Have you confirmed it as correct?

My understanding of how latency is reported is the following:

  • Safety offset is reported by a device driver to account for hardware clock drift/jitter, low-level FIFO buffers, or any other variability – usually a small number of samples, around 70 or so. This number tends to be an approximation.
  • Device latency is any added latency for the overall device that impacts all streams. Perhaps some DSP or algorithm latency.
  • Stream latency is the same as device latency but specific to a stream, to account for algorithms that are run on that stream only. The caveat here is that this latency is per stream, which is something to consider because I honestly don’t know how JUCE deals with multiple streams per device.

These numbers should be reported for both directions. In the end it doesn’t matter where they are reported (which property is used) because it is the job of the client to sum them all up.

  • Buffer size is a bit more interesting because it refers to the I/O cycle buffer size and it impacts both input and output – how long the HAL will buffer the input until the IOProc (app) is called and how much time it is given for the IOProc to process the output thus defining the presentation time of the samples produced (how far in the future output will be written).

Therefore an example loopback app would have a nominal latency of two buffer sizes plus all the device and stream latency for both directions.
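
As a rough sketch of that sum in C++ (the names are hypothetical, and the numbers in the usage note are the Apollo figures quoted from memory earlier in the thread, at a 64 sample buffer):

```cpp
#include <cassert>
#include <cstdint>

// Per-direction latencies reported by the driver, in sample frames.
struct DirectionLatency
{
    std::uint32_t safetyOffset;
    std::uint32_t deviceLatency;
    std::uint32_t streamLatency;
};

// Everything the driver reports for one direction.
inline std::uint32_t reportedSum (const DirectionLatency& d)
{
    return d.safetyOffset + d.deviceLatency + d.streamLatency;
}

// Nominal loopback latency: two I/O buffers plus everything reported
// for the input and output directions.
inline std::uint32_t nominalLoopbackLatency (const DirectionLatency& input,
                                             const DirectionLatency& output,
                                             std::uint32_t bufferFrames)
{
    assert (bufferFrames > 0);
    return 2 * bufferFrames + reportedSum (input) + reportedSum (output);
}
```

With input { 52, 292, 0 }, output { 52, 22, 0 } and 64 frames this gives 546, which is the same total you get by applying deviceLatency + safetyOffset + framesInBuffer + streamLatency to each direction and adding the two results.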

That being said, not much can be done if underlying components don’t report accurately. :confused:
