I’ve learnt a bit more about mp3 files than I wanted to, and here’s what I cooked up. Sorry, the code’s not quite Juce, but it should be fairly trivial to adapt and incorporate if you wish. For CBR files, it now comes up with the same answer as the CoreAudio reader, but does it in 5ms rather than the 10s I started with (https://forum.juce.com/t/coreaudioreader-long-delay-before-streaming-mp3/18358/2).
A quick explanation is that mp3 files contain chunks called ‘frames’. Each mp3 frame begins with a 4 byte header followed by the encoded data for a fixed number of audio frames (1152 for MPEG-1 Layer III). Because the sample count per mp3 frame is fixed, the size of an mp3 frame in bytes depends upon its bit rate and sample rate. CBR files use the same bit rate for every mp3 frame, but VBR files will most likely have a different bit rate from one mp3 frame to the next, so their frames vary in size. It’s sometimes necessary to scan a VBR file entirely and examine every header to get an accurate count, but if the encoder has been a good citizen then it should have added an extra ‘Xing’ header with a frame count and TOC to make this unnecessary. CBR files should be simpler, but frames sometimes contain an extra padding slot so that the average bit rate comes out exact when the nominal frame size isn’t a whole number of bytes. Again, this means there’s no choice but to scan the whole file for a completely accurate count.
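To make the header layout a bit more concrete, here's a minimal sketch of pulling the bit rate, sample rate and padding bit out of a 4 byte header and working out the frame size from them. It only handles MPEG-1 Layer III, and `Mp3FrameInfo` and `parseMpeg1Layer3Header` are names made up for this illustration; they aren't part of Juce.

```cpp
#include <cstdint>

// Hypothetical helper type for this sketch (not part of Juce).
struct Mp3FrameInfo
{
    int bitrateKbps;      // looked up from the header's bit rate index
    int sampleRateHz;     // looked up from the header's sample rate index
    int padding;          // 1 if this frame carries one extra padding byte
    int samplesPerFrame;  // fixed at 1152 for MPEG-1 Layer III
    int frameBytes;       // whole frame size, including the 4 byte header
};

// Parses an MPEG-1 Layer III header only; returns false for anything else.
inline bool parseMpeg1Layer3Header (std::uint32_t header, Mp3FrameInfo& out)
{
    static const int bitratesKbps[16] = { 0, 32, 40, 48, 56, 64, 80, 96,
                                          112, 128, 160, 192, 224, 256, 320, 0 };
    static const int sampleRatesHz[4] = { 44100, 48000, 32000, 0 };

    if ((header & 0xffe00000u) != 0xffe00000u) return false;  // 11-bit sync word
    if (((header >> 19) & 3u) != 3u) return false;             // version bits: 3 = MPEG-1
    if (((header >> 17) & 3u) != 1u) return false;             // layer bits: 1 = Layer III

    const unsigned bitrateIndex = (header >> 12) & 15u;
    const unsigned rateIndex    = (header >> 10) & 3u;

    // Index 0 is 'free format' (not handled here), 15 and 3 are invalid/reserved
    if (bitrateIndex == 0 || bitrateIndex == 15 || rateIndex == 3) return false;

    out.bitrateKbps     = bitratesKbps[bitrateIndex];
    out.sampleRateHz    = sampleRatesHz[rateIndex];
    out.padding         = (int) ((header >> 9) & 1u);
    out.samplesPerFrame = 1152;

    // 1152 samples / 8 bits per byte = 144; the result is truncated, which is why
    // some frames need a padding byte to keep the average bit rate exact.
    out.frameBytes = 144 * out.bitrateKbps * 1000 / out.sampleRateHz + out.padding;
    return true;
}
```

For a 128 kbps, 44.1 kHz frame the nominal size is 417.96 bytes, so frames come out as 417 bytes without padding and 418 with it.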
Note that the implementation below doesn’t use the TOC for VBR files, as I intend to use CoreAudio for those.
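For anyone who does want to handle VBR files, here's a rough sketch of reading the frame count from a Xing (or ‘Info’) tag in the first mp3 frame. It's not part of my implementation. `readXingFrameCount` is a hypothetical name, and the sketch assumes the first frame has already been read into memory, is Layer III and has no CRC bytes after its header.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

inline std::uint32_t readBigEndian32 (const std::uint8_t* p)
{
    return ((std::uint32_t) p[0] << 24) | ((std::uint32_t) p[1] << 16)
         | ((std::uint32_t) p[2] << 8)  |  (std::uint32_t) p[3];
}

// Returns the total frame count stored in a Xing/Info tag, or -1 if there isn't one.
// 'frame' must point at the first mp3 frame, starting with its 4 byte header.
inline long long readXingFrameCount (const std::uint8_t* frame, std::size_t size)
{
    if (size < 4) return -1;

    const bool mpeg1 = ((frame[1] >> 3) & 3) == 3;  // version bits: 3 = MPEG-1
    const bool mono  = ((frame[3] >> 6) & 3) == 3;  // channel mode: 3 = mono

    // The tag sits after the header and the Layer III side info, whose size
    // depends on the MPEG version and the channel count.
    const std::size_t sideInfoBytes = mpeg1 ? (mono ? 17 : 32) : (mono ? 9 : 17);
    const std::size_t tagPos = 4 + sideInfoBytes;

    // Need the 4 byte tag, 4 bytes of flags and the 4 byte frame count
    if (size < tagPos + 12) return -1;

    if (std::memcmp (frame + tagPos, "Xing", 4) != 0
         && std::memcmp (frame + tagPos, "Info", 4) != 0)
        return -1;

    const std::uint32_t flags = readBigEndian32 (frame + tagPos + 4);

    // Bit 0 of the flags says whether the frame count field is present
    if ((flags & 1u) == 0) return -1;

    return (long long) readBigEndian32 (frame + tagPos + 8);
}
```

Multiplying the returned count by the samples per mp3 frame would give the total without scanning, provided the encoder wrote the tag correctly.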
Added to MP3Decoder::MP3Stream:
// Calculates the total number of sample frames in the entire MP3 file. Each MP3 frame within the file may (or may not) contain a padding 'slot', so
// the only way to obtain a fully accurate count is to scan the entire file. However, it is possible to only partially scan the first part of the
// file and then infer the rest based upon the ratio of MP3 frames discovered with padding vs. frames without. This method will be more accurate for
// CBR (constant bit rate) files rather than VBR.
// Note: This function may change the current stream position.
// zAccuracy - If 0.0f, only the minimum number of MP3 frames will be scanned. If 1.0f, the entire file is always scanned.
s64 CalculateNumSampleFrames(f32 zAccuracy = 0.5f)
{
    // Initial plausibility check on stream size - we'll need at least one mp3 frame
    const s64 StreamSize = stream.getTotalLength();
    if (StreamSize < 64)
        return 0;

    // zAccuracy determines the fraction of the stream we physically scan, and then we'll infer the rest based upon the statistics obtained
    const s64 ScanEndPos = (s64)(StreamSize * zAccuracy);

    // Always scan a minimum number of mp3 frames so we can obtain reasonable statistics - this is still fairly quick and means we're more likely to be accurate for small files.
    const s32 MinFramesToScan = 64;

    // Locate the first frame (from the current stream position)
    u32 FrameHeader = seekNextFrame(nullptr);
    if (FrameHeader == 0)
        return 0;

    // Location of the first valid frame (any data before this shouldn't be included in the calculation)
    const s64 FirstFrameStreamPos = stream.getPosition();
    s32 NumFrames = 0;

    // For each mp3 frame
    do
    {
        // Decode the header so we can obtain the frame's data size
        // (assumes MP3Frame::decodeHeader(), as in Juce's MP3 decoder)
        MP3Frame Frame;
        Frame.decodeHeader(FrameHeader);

        // Add to our frame count
        NumFrames++;

        // Skip past the frame's audio data
        stream.setPosition(stream.getPosition() + Frame.frameSize);

        // Search for the next frame matching the properties of the current frame
        FrameHeader = seekNextFrame(&Frame);
    }
    while ((FrameHeader != 0) && ((NumFrames < MinFramesToScan) || (stream.getPosition() < ScanEndPos)));

    // Found some valid frames?
    if (NumFrames <= 0)
        return 0;

    // Did we stop before we got to the end of the stream?
    s64 StreamPos = stream.getPosition();
    s64 NumBytesRemaining = StreamSize - StreamPos;
    if (NumBytesRemaining > 0)
    {
        // Estimate the number of frames remaining based upon the frames we scanned
        f32 MeanFramesPerByte = (f32)NumFrames / (StreamPos - FirstFrameStreamPos);
        s32 EstimatedRemainingFrames = RoundToNearestInt(NumBytesRemaining * MeanFramesPerByte);

        // Add the estimated part
        NumFrames += EstimatedRemainingFrames;
    }

    // 1152 sample frames per mp3 frame holds for MPEG-1 Layer III (MPEG-2/2.5 Layer III frames hold 576).
    // Widen before multiplying so very long files can't overflow 32 bits.
    return (s64)NumFrames * 1152;
}
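To show the arithmetic behind the extrapolation step on its own, here's a small standalone sketch with a worked example. `estimateTotalFrames` and `durationSeconds` are hypothetical names for this illustration only, and it uses `double` rather than `f32` purely to keep the example simple.

```cpp
#include <cmath>
#include <cstdint>

// Extrapolates a total mp3 frame count from a partial scan.
// framesScanned  - mp3 frames actually found during the scan
// bytesScanned   - bytes covered by those frames
// bytesRemaining - bytes between the end of the scan and the end of the stream
inline std::int64_t estimateTotalFrames (std::int64_t framesScanned,
                                         std::int64_t bytesScanned,
                                         std::int64_t bytesRemaining)
{
    if (framesScanned <= 0 || bytesScanned <= 0)
        return 0;

    if (bytesRemaining <= 0)
        return framesScanned;  // the whole stream was scanned, nothing to infer

    const double meanFramesPerByte = (double) framesScanned / (double) bytesScanned;
    return framesScanned + std::llround ((double) bytesRemaining * meanFramesPerByte);
}

// Converts an mp3 frame count into a duration in seconds.
inline double durationSeconds (std::int64_t totalFrames, int samplesPerFrame, int sampleRateHz)
{
    return (double) totalFrames * samplesPerFrame / sampleRateHz;
}
```

For example, if 64 frames were found in the first 26,752 bytes (418 bytes each) and 4,180,000 bytes remain, the estimate is 64 + 10,000 = 10,064 mp3 frames. At 1152 sample frames each that's 11,593,728 sample frames, or roughly 263 seconds at 44.1 kHz. The estimate is only as good as the assumption that the scanned part is representative, which is why it suits CBR much better than VBR.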
// Helper for CalculateNumSampleFrames(). Based upon scanForNextFrameHeader(), but doesn't change any decoder state.
// Seeks forward from the current stream position to find the start of the next mp3 header.
// If zpLastFrame is nullptr, the function returns the first header found. Otherwise, it will return
// the first header found which matches the properties of the specified header.
// Returns a frame header value, or zero if the next frame could not be found. Valid frame headers are always non-zero.
u32 seekNextFrame(const MP3Frame* zpLastFrame) noexcept
{
    const s64 streamStartPos = stream.getPosition();
    int offset = -3;
    u32 header = 0;

    for (;;)
    {
        // Give up at the end of the stream, or if nothing turns up within 32KB
        if (stream.isExhausted() || stream.getPosition() > streamStartPos + 32768)
            return 0;

        // Shift the next byte into the 32-bit candidate header
        header = (header << 8) | (u8) stream.readByte();

        if (offset >= 0 && isValidHeader (header, (zpLastFrame == nullptr) ? -1 : zpLastFrame->layer))
        {
            // No previous frame to compare against, so the first valid header will do
            if (zpLastFrame == nullptr)
                return header;

            const bool mpeg25 = (header & (1 << 20)) == 0;
            const u32 lsf = mpeg25 ? 1 : ((header & (1 << 19)) ? 0 : 1);
            const u32 sampleRateIndex = mpeg25 ? (6 + ((header >> 10) & 3)) : (((header >> 10) & 3) + (lsf * 3));
            const u32 mode = (header >> 6) & 3;
            const u32 numChannels = (mode == 3) ? 1 : 2;

            // Only accept a header whose properties match the previous frame
            if (numChannels == (u32)zpLastFrame->numChannels &&
                lsf == (u32)zpLastFrame->lsf &&
                mpeg25 == zpLastFrame->mpeg25 &&
                sampleRateIndex == (u32)zpLastFrame->sampleRateIndex)
                return header;
        }

        ++offset;
    }
}
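The byte-by-byte search is perhaps easier to follow on a plain memory buffer, so here's a minimal sketch of the same sliding 32-bit window idea. `looksLikeFrameHeader` is a simplified stand-in I've made up for the example; it's an assumption, and the real `isValidHeader()` may check more than this.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified plausibility test for a candidate header (hypothetical, for illustration).
inline bool looksLikeFrameHeader (std::uint32_t h)
{
    return (h & 0xffe00000u) == 0xffe00000u   // 11-bit sync word
        && ((h >> 19) & 3u) != 1u             // version bits 01 are reserved
        && ((h >> 17) & 3u) != 0u             // layer bits 00 are reserved
        && ((h >> 12) & 15u) != 15u           // bit rate index 15 is invalid
        && ((h >> 10) & 3u) != 3u;            // sample rate index 3 is reserved
}

// Returns the byte offset of the first plausible header in 'data', or -1 if none is found.
inline long findFirstHeader (const std::uint8_t* data, std::size_t size)
{
    std::uint32_t window = 0;

    for (std::size_t i = 0; i < size; ++i)
    {
        // Slide the window along by one byte
        window = (window << 8) | data[i];

        // The window only holds four real bytes once we've read at least four
        if (i >= 3 && looksLikeFrameHeader (window))
            return (long) (i - 3);
    }

    return -1;
}
```

A sync pattern can also turn up by chance inside tag or audio data, which is why the real function additionally compares each candidate against the previous frame's properties.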
… used in MP3Reader constructor like this:
// Calculate the number of sample frames
const s64 streamPos = stream.stream.getPosition();
s64 NumSampleFrames = stream.CalculateNumSampleFrames(0.0f);

// Restore the original stream position
stream.stream.setPosition(streamPos);

if ((NumSampleFrames > 0) && readNextBlock())
{
    mFormat = ABF_F32;
    mSampleRate = stream.frame.getFrequency();
    mNumChannels = (unsigned int) stream.frame.numChannels;
}