Dealing with MP3 Decoders inserting "silence"


#1

It seems that MP3 decoders have a habit of inserting silence at the beginning and end of audio.

AFAICT, on OS X the JUCE MP3Reader inserts a constant number of samples at the beginning of the decoded audio (around 1100?). The CoreAudioReader MP3 decoder seems to insert a variable number of samples at the beginning. This number seems to be approximately inversely proportional to the duration of the encoded audio, with short files getting more silence than longer ones. I see around 500 samples inserted for an 8000-sample file and almost zero for a 2M-sample file.

So some questions:

  • For the MP3Reader class, is the amount of lead-in added guaranteed to be a fixed number of samples?
  • For the CoreAudioReader class, does anyone know a reliable way to calculate the lead-in given the source file length?

The same questions apply to the samples added at the end…


#2

I’ve had to do this before, but without any knowledge of the codec. You can find the sample delay by computing the cross-correlation of the original and decoded signals and taking the sample index of its maximum; that index is the delay. Here’s a thread on computing cross-correlation with an FFT. I’ve found that a 2048- or 4096-point FFT is sufficient for most cases.
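
To make that concrete, here is a minimal sketch of the FFT-based cross-correlation in plain Python. The names `fft` and `estimate_delay` are made up for illustration, and the small recursive FFT is only there to keep the example self-contained; a real project would use an optimised FFT. It assumes both signals are mono, at the same sample rate, and that the delay is no more than half the FFT size.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised inverse). len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def estimate_delay(original, decoded, fft_size=4096):
    """Estimate how many samples `decoded` lags behind `original`.

    Only lags in 0..fft_size/2 are searched.
    """
    half = fft_size // 2
    # Take half a window of the original and a full window of the decoded
    # signal, zero-padded to fft_size. With this layout the circular
    # correlation equals the linear one for lags 0..half.
    a = [float(v) for v in original[:half]]
    b = [float(v) for v in decoded[:fft_size]]
    a += [0.0] * (fft_size - len(a))
    b += [0.0] * (fft_size - len(b))
    fa = fft(a)
    fb = fft(b)
    # Cross-correlation theorem: corr = IFFT(FFT(decoded) * conj(FFT(original))).
    cross = [y * x.conjugate() for x, y in zip(fa, fb)]
    corr = fft(cross, inverse=True)  # scaling does not change the peak position
    return max(range(half + 1), key=lambda k: corr[k].real)
```

Calling `estimate_delay(original, decoded)` with the first few thousand samples of each buffer returns the estimated lead-in, which can then be skipped when reading the decoded audio.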

To find the samples added at the end, do the same thing, but reverse both signals in time. If you only care about the start/end times, I’d just take an FFT at the start and at the end and call it a day.
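
A rough sketch of the time-reversal idea, under the same assumptions (mono, same sample rate). To keep it short it uses a direct time-domain correlation instead of an FFT, which is slower but gives the same peak. `best_lag` and `estimate_trailing_padding` are hypothetical names, and the window and lag limits are arbitrary defaults.

```python
def best_lag(reference, delayed, max_lag):
    """Return the lag in 0..max_lag that maximises the correlation
    sum(reference[n] * delayed[n + lag])."""
    def score(lag):
        return sum(r * d for r, d in zip(reference, delayed[lag:]))
    return max(range(max_lag + 1), key=score)

def estimate_trailing_padding(original, decoded, window=2048, max_lag=2048):
    """Estimate how many extra samples the decoder appended at the end."""
    # Reverse both signals so that trailing padding becomes a leading delay,
    # then look only at the (reversed) start of each buffer.
    ref = list(original)[::-1][:window]
    dec = list(decoded)[::-1][:window + max_lag]
    return best_lag(ref, dec, max_lag)
```

Running `best_lag` on the un-reversed buffers gives the lead-in, so the same helper covers both ends.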


#3

Hi @Holy_City, thanks for this.

If I understand correctly, in order to use this method I need the “original signal”, i.e. the uncompressed audio buffer from which the MP3 was encoded in the first place?

If so, do I just need the first N samples of the original signal, where N is greater than the number of samples of silence inserted by the decoder?


#4

Yes, you need the original signal. As for the FFT size, I’ve found in the past that you should use at least 2x the expected delay, and in some cases you need much larger buffers. Like I said, try 2048- to 4096-point FFTs and see where that gets you.
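
As a small illustration of that rule of thumb, the helper below picks the smallest power-of-two FFT size that is at least twice the expected delay, with 2048 as a floor. `choose_fft_size` is a hypothetical name, and treating 2048 as the minimum is an assumption taken from the sizes suggested above.

```python
def choose_fft_size(expected_delay, minimum=2048):
    """Smallest power-of-two size >= max(minimum, 2 * expected_delay).

    `minimum` is assumed to be a power of two itself.
    """
    size = minimum
    while size < 2 * expected_delay:
        size *= 2
    return size
```

For the roughly 1100-sample lead-in mentioned in the first post this gives 4096, and for a 500-sample lead-in it stays at 2048.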