CoreAudioReader - long delay before streaming mp3

Note: This is on iOS - I haven’t tried OS X.

In CoreAudioReader’s constructor, this call:

ExtAudioFileGetProperty ( … kExtAudioFileProperty_FileLengthFrames …);

… blocks the calling thread for around 5 seconds when opening certain mp3 files (of ~5min duration). The time taken seems to be proportional to the length of the mp3 file, so evidently it’s scanning the whole file to obtain the value of this property.

Other mp3 files of similar length don’t seem to cause a noticeable delay. So far with the examples I’ve found, mp3 files with VBR are quick, and those with a fixed bit rate are slow. I haven’t tested enough files to be sure this is THE rule though.

This is a major problem, particularly when trying to stream audio in a DAW track. Does anyone have any experience or workarounds? I could remove the call or replace it with a value derived from kAudioFilePropertyEstimatedDuration, but I suspect Juce is reasonably dependent upon having an accurate frame count up front.

I’ve made some considerable progress on this for those interested (which appears to be no one!).

For a recap, after more testing I’ve narrowed the problem down to:

  • CoreAudioReader calls ExtAudioFileGetProperty(… kExtAudioFileProperty_FileLengthFrames …) upon construction to obtain the file’s exact length in frames.
  • For CBR (constant bit rate) mp3s only, this function call will scan the entire file.
  • VBR mp3s and m4as are not scanned and take a small constant time (< 4µs); presumably this information is available in the header for these types.
  • For an mp3 of ~6mins, this function call takes almost 10 seconds on an iPhone 5.

Since Juce supplies CoreAudio with its own read callbacks, it was easy to log the access pattern:

getSizeCallback() [call 1]
getSizeCallback() [call 2]
readCallback(4608, 4) [call 1]
getSizeCallback() [call 3]
getSizeCallback() [call 4]
readCallback(5235, 1027) [call 2]
getSizeCallback() [call 5]
readCallback(5235, 4) [call 3]
getSizeCallback() [call 6]
getSizeCallback() [call 7]
readCallback(5862, 1027) [call 4]
getSizeCallback() [call 8]
readCallback(5862, 4) [call 5]
...

getSizeCallback() gets called a lot. It goes to the file system to obtain the file size each time, but it’s reasonable to suppose the size stays constant while the reader is open. By reading the file size once and then returning the cached value in getSizeCallback() I was able to halve the time from ~10s to ~5s. One disclaimer: my (modified) version of Juce does a bit more work than the vanilla version (which I won’t go into), but I think this change would still make a substantial difference to the unmodified version.
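The caching change described above can be sketched roughly as follows. This is not the actual Juce code: StreamInfo and queryLength are hypothetical names standing in for whatever client data the reader hands to CoreAudio, and the callback only mirrors the shape of CoreAudio's AudioFile_GetSizeProc using standard integer types so that it compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the client data passed to the CoreAudio callbacks.
struct StreamInfo
{
    std::function<int64_t()> queryLength; // the expensive file-system query
    int64_t cachedLength = -1;            // -1 means "not queried yet"
};

// Same shape as AudioFile_GetSizeProc: SInt64 (*) (void* inClientData).
static int64_t getSizeCallback (void* inClientData)
{
    auto* info = static_cast<StreamInfo*> (inClientData);
    assert (info != nullptr);

    // Only hit the file system on the first call; assumes the file
    // does not change size while the reader is open.
    if (info->cachedLength < 0)
        info->cachedLength = info->queryLength();

    return info->cachedLength;
}
```

The assumption that the file size is fixed for the lifetime of the reader is what makes this safe; a file that is still being written would need the cache to be invalidated.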

Another significant problem is that readCallback() is repeatedly called with a requested size of 1027 bytes at 627-byte intervals, so each byte of the file is read about 1.6 times ((1027 + 4) / 627). This seems like a bug in Apple’s code to me, since interleaved with those reads is a small 4-byte read, which I suspect is the mp3 frame header containing all the info it needs for this operation.

By hacking a BufferedInputStream into AudioFormatManager::createReaderFor(), the scan time was reduced from ~5s to ~200ms on first access and ~80ms on second access (OS file cache warmed?) on an iPhone 5. A 32K buffer seemed a reasonable sweet spot.

What I don’t like about this hack is that the audio reader will continue to use this buffered stream during ‘normal’ use (i.e. reading the samples), which may be inefficient for the app’s intended usage. I think it’d be better for CoreAudioReader to wrap the stream in a temporary buffered stream just for the duration of the ‘get number of frames’ call.
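To illustrate why buffering helps with the overlapping-read pattern in the log, here is a minimal standalone read-ahead cache. It is only a sketch: ReadAheadCache and ReadFn are made-up names, not Juce or CoreAudio types, and the 32K default simply echoes the buffer size mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical read-ahead cache that sits between the decoder's read
// callback and the real (slow) source.
class ReadAheadCache
{
public:
    // Reads up to 'count' bytes at 'position' into 'dest'; returns bytes read.
    using ReadFn = std::function<size_t (int64_t position, void* dest, size_t count)>;

    explicit ReadAheadCache (ReadFn source, size_t bufferSize = 32768)
        : readSource (std::move (source)), buffer (bufferSize) {}

    size_t read (int64_t position, void* dest, size_t count)
    {
        assert (position >= 0 && dest != nullptr);
        auto* out = static_cast<uint8_t*> (dest);
        size_t total = 0;

        while (total < count)
        {
            const int64_t pos = position + (int64_t) total;

            // Refill when the wanted byte lies outside the cached window.
            if (pos < bufferStart || pos >= bufferStart + (int64_t) bufferFill)
            {
                bufferStart = pos;
                bufferFill = readSource (pos, buffer.data(), buffer.size());

                if (bufferFill == 0)
                    break; // end of stream (or read failure)
            }

            const size_t offset = (size_t) (pos - bufferStart);
            const size_t n = std::min (count - total, bufferFill - offset);
            std::memcpy (out + total, buffer.data() + offset, n);
            total += n;
        }

        return total;
    }

private:
    ReadFn readSource;
    std::vector<uint8_t> buffer;
    int64_t bufferStart = 0;
    size_t bufferFill = 0;
};
```

With something like this created just before the frame-count query and destroyed straight after it, the many small overlapping reads would be served from memory, and normal sample reading would go back to the unbuffered stream.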

This is all great and everything, but burning ~80ms of heavy file access before opening the stream is still not much cop for real-time audio multitracking. I could read this value on first access and then cache it in a dated file alongside the mp3, but it would be much better to be able to obtain this value in constant time.

MP3s are a black box to me, so before I dive in, does anyone know if it’s possible to quickly calculate the number of audio frames in a CBR mp3 just from its header? Surely this must be a lot easier than the VBR case …
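As a starting point for that question, here is a rough sketch of the arithmetic, assuming MPEG-1 Layer III, where every frame holds 1152 samples and a CBR frame is about 144 * bitrate / sampleRate bytes. The function names are made up for illustration. It ignores ID3 tags, any Xing/Info header frame, and encoder delay/padding, so the result is an estimate and may not match the exact value CoreAudio reports.

```cpp
#include <cassert>
#include <cstdint>

// Samples per frame for MPEG-1 Layer III (assumed format).
constexpr int64_t samplesPerFrame = 1152;

// Average CBR frame size in bytes; individual frames are this value
// rounded down, plus one byte when the header's padding bit is set.
inline double averageFrameBytes (int bitrateBitsPerSec, int sampleRateHz)
{
    assert (bitrateBitsPerSec > 0 && sampleRateHz > 0);
    return (samplesPerFrame / 8.0) * bitrateBitsPerSec / sampleRateHz;
}

// Estimated number of sample frames in a CBR stream.
// audioBytes is assumed to be the size of the MPEG audio data only,
// i.e. with any tags at the start or end of the file already excluded.
inline int64_t estimateCbrSampleFrames (int64_t audioBytes,
                                        int bitrateBitsPerSec,
                                        int sampleRateHz)
{
    assert (audioBytes >= 0 && bitrateBitsPerSec > 0 && sampleRateHz > 0);

    // duration = audioBytes * 8 / bitrate, samples = duration * sampleRate
    return audioBytes * 8 * sampleRateHz / bitrateBitsPerSec;
}
```

For what it's worth, 144 * 192000 / 44100 is roughly 626.9, so the 627-byte step in the read log would be consistent with a 192 kbps, 44.1 kHz file. The bitrate and sample rate come from the first 4-byte frame header, so only that header and the file size would be needed.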