CoreAudioReader - long delay before streaming mp3

I’ve made some considerable progress on this for those interested (which appears to be no one!).

For a recap, after more testing I’ve narrowed the problem down to:

  • CoreAudioReader calls ExtAudioFileGetProperty(… kExtAudioFileProperty_FileLengthFrames …) upon construction to obtain the file’s exact length in frames.
  • For CBR (constant bit rate) mp3s only, this call scans the entire file.
  • VBR mp3s and m4as are not scanned; the call takes a small constant time (< 4us), presumably because the length is available in the header for these types.
  • For an mp3 of ~6 mins, the call takes almost 10 seconds on an iPhone 5.

Since Juce supplies CoreAudio with its own read callbacks, it was easy to log the access pattern:

getSizeCallback() [call 1]
getSizeCallback() [call 2]
readCallback(4608, 4) [call 1]
getSizeCallback() [call 3]
getSizeCallback() [call 4]
readCallback(5235, 1027) [call 2]
getSizeCallback() [call 5]
readCallback(5235, 4) [call 3]
getSizeCallback() [call 6]
getSizeCallback() [call 7]
readCallback(5862, 1027) [call 4]
getSizeCallback() [call 8]
readCallback(5862, 4) [call 5]
...

getSizeCallback() gets called a lot. I notice that it goes to the file system to obtain the file size each time, but it’s reasonable to suppose the size will stay constant while the file is open. By reading the file size once and then returning the cached value in getSizeCallback() I was able to halve the time from ~10s to ~5s. One disclaimer: my (modified) version of Juce does a bit more work than the vanilla version (which I won’t go into), but I think this change would still make a substantial difference to the unmodified version.
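
To illustrate the caching idea, here is a minimal standalone sketch. It is not the Juce code: SlowSizeSource and CachedSize are hypothetical names, and the sketch assumes the file does not change size while it is open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a stream whose length query goes to the file system.
struct SlowSizeSource
{
    std::int64_t sizeInBytes = 0;
    int sizeQueries = 0;   // how many times the "file system" was asked

    std::int64_t queryTotalLength()
    {
        ++sizeQueries;
        return sizeInBytes;
    }
};

// Asks the source for its length once, then serves the remembered value.
// Assumption: the underlying file is not modified while it is being read.
class CachedSize
{
public:
    explicit CachedSize (SlowSizeSource& s) : source (s) {}

    std::int64_t get()
    {
        if (cached < 0)
        {
            cached = source.queryTotalLength();
            assert (cached >= 0);
        }

        return cached;
    }

private:
    SlowSizeSource& source;
    std::int64_t cached = -1;   // -1 means "not queried yet"
};
```

A size callback would then return CachedSize::get() instead of querying the stream directly, so the thousands of size requests seen in the log collapse into a single file system query.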

Another significant problem is that readCallback() is repeatedly called with a requested size of 1027 bytes at 627-byte intervals, which effectively means it’s reading the entire file’s contents about 1.6 times (1027 / 627 ≈ 1.64). This seems like a bug in Apple’s code to me: interleaved with those reads is a small 4-byte read, which I suspect is the mp3 frame header and already contains all the info needed for this operation.
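
As a sanity check on the read overhead, the logged pattern can be replayed on paper. The sketch below assumes the pattern simply repeats for the whole file (one 4-byte read and one 1027-byte read per 627-byte step); replayScan and amplification are hypothetical helper names, and nothing here touches CoreAudio.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct ReadTotals
{
    std::int64_t bytesRequested = 0;
    std::int64_t readCalls = 0;
};

// Replays the logged access pattern over a file of fileSize bytes:
// at every stride position, one small read and one large read.
// Reads are clamped at the end of the file.
ReadTotals replayScan (std::int64_t fileSize,
                       std::int64_t stride = 627,
                       std::int64_t smallRead = 4,
                       std::int64_t largeRead = 1027)
{
    assert (fileSize > 0 && stride > 0);

    ReadTotals totals;

    for (std::int64_t pos = 0; pos < fileSize; pos += stride)
    {
        totals.bytesRequested += std::min (smallRead, fileSize - pos);
        totals.bytesRequested += std::min (largeRead, fileSize - pos);
        totals.readCalls += 2;
    }

    return totals;
}

// Ratio of bytes requested to bytes in the file.
double amplification (std::int64_t fileSize)
{
    return double (replayScan (fileSize).bytesRequested) / double (fileSize);
}
```

Under those assumptions the ratio works out to roughly (1027 + 4) / 627, i.e. a little over 1.6 times the file size, spread over two read calls per 627 bytes.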

By hacking a BufferedInputStream into AudioFormatManager::createReaderFor(), the scan time was reduced from ~5s to ~200ms on first access and ~80ms on second access (OS file cache warmed?) on an iPhone 5. A 32K buffer seemed a reasonable sweet spot.
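
For anyone wondering why buffering helps so much with this access pattern, here is a simplified standalone sketch of a read-ahead buffer. It is not Juce's BufferedInputStream implementation: CountingSource and BufferedReader are hypothetical names, and the only thing taken from above is the 32K buffer size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the slow underlying stream; counts its reads.
struct CountingSource
{
    std::vector<unsigned char> data;
    int readCalls = 0;

    std::size_t read (std::int64_t pos, void* dest, std::size_t numBytes)
    {
        ++readCalls;

        if (pos < 0 || pos >= (std::int64_t) data.size())
            return 0;

        const std::size_t n = std::min (numBytes, data.size() - (std::size_t) pos);
        std::memcpy (dest, data.data() + pos, n);
        return n;
    }
};

// Serves positioned reads from one large buffer, refilling it from the
// source only when a request falls outside the buffered window.
class BufferedReader
{
public:
    explicit BufferedReader (CountingSource& s, std::size_t bufferSize = 32768)
        : source (s), buffer (bufferSize)
    {
        assert (bufferSize > 0);
    }

    std::size_t read (std::int64_t pos, void* dest, std::size_t numBytes)
    {
        auto* out = static_cast<unsigned char*> (dest);
        std::size_t done = 0;

        while (done < numBytes)
        {
            const std::int64_t p = pos + (std::int64_t) done;

            if (p < bufferStart || p >= bufferStart + (std::int64_t) bufferedBytes)
            {
                bufferStart = p;
                bufferedBytes = source.read (p, buffer.data(), buffer.size());

                if (bufferedBytes == 0)
                    break;   // end of data
            }

            const std::size_t offset = (std::size_t) (p - bufferStart);
            const std::size_t n = std::min (numBytes - done, bufferedBytes - offset);
            std::memcpy (out + done, buffer.data() + offset, n);
            done += n;
        }

        return done;
    }

private:
    CountingSource& source;
    std::vector<unsigned char> buffer;
    std::int64_t bufferStart = 0;
    std::size_t bufferedBytes = 0;
};
```

Replaying the 4-byte / 1027-byte pattern through a reader like this turns two underlying reads per 627 bytes into a couple of reads per 32K window, since the small backward steps mostly land inside data that is already buffered.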

What I don’t like about this hack is that the audio reader will continue to use this buffered stream during ‘normal’ use (i.e. reading the samples), which may be inefficient for the app’s intended usage. I think it’d be better for CoreAudioReader to wrap the stream in a temporary buffered stream just for the duration of the ‘get number of frames’ call.

This is all great and everything, but burning ~80ms of heavy file access before opening the stream is still not much cop for real-time audio multitracking. I could read this value on first access and then cache it in a dated file alongside the mp3, but it would be much better to be able to obtain this value in constant time.

MP3s are a black box to me, so before I dive in, does anyone know if it’s possible to quickly calculate the number of audio frames in a CBR mp3 just from its header? Surely this must be a lot easier than the VBR case …
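
For reference, a rough sketch of the kind of constant-time estimate being asked about. It assumes MPEG-1 Layer III (1152 samples per mp3 frame, frame size of 144 * bitrate / sample rate bytes plus an optional padding byte) and that the bitrate, sample rate and size of the audio data are already known from the first frame header and the file size. The function names are hypothetical, and the result is only an estimate: it ignores ID3 tags, any info header frame and encoder delay, so it may not match what ExtAudioFile reports.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one MPEG-1 Layer III frame at a constant bitrate.
// The division is truncated; the padding bit adds one byte when set.
std::int64_t mpeg1Layer3FrameBytes (int bitrateBitsPerSec, int sampleRateHz, bool padded)
{
    assert (bitrateBitsPerSec > 0 && sampleRateHz > 0);
    return (144LL * bitrateBitsPerSec) / sampleRateHz + (padded ? 1 : 0);
}

// Estimated number of PCM sample frames in audioBytes of CBR audio:
// seconds = audioBytes * 8 / bitrate, frames = seconds * sampleRate.
std::int64_t estimateCbrSampleFrames (std::int64_t audioBytes,
                                      int bitrateBitsPerSec,
                                      int sampleRateHz)
{
    assert (audioBytes >= 0 && bitrateBitsPerSec > 0 && sampleRateHz > 0);
    return (audioBytes * 8 * sampleRateHz) / bitrateBitsPerSec;
}
```

As an aside, a 192 kbps, 44.1 kHz file would have 626- or 627-byte frames by this formula, which would be consistent with the 627-byte stride in the log above, though that is only a guess about the test file.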