[Solved] Plug FFMpeg Audio Stream into Juce


#1

Hi guys,
I'm trying to build a kind of video player with JUCE; it just doesn't render the video to a screen but to a building facade, but that's a different topic.

I'm reading the video files with FFmpeg, and I want to pass the audio stream data to the AudioIODevice.
So I basically created a class called AudioManager which inherits from AudioIODeviceCallback and contains an AudioDeviceManager, an AudioSourcePlayer and some *Source classes.
I didn't have any problems playing back audio files that way.

So I thought that when I want to pass the FFmpeg audio samples to JUCE, I simply need to do this in my audioDeviceIOCallback function.

The audioDeviceIOCallback of the AudioManager basically delegates to this one:

void VideoManagerAudioDecoder::audioDeviceIOCallback(const float** inputChannelData, int numInputChannels, 
                                                     float** outputChannelData, int numOutputChannels, 
                                                     int numSamples)
{
    int bytesRequired = numSamples * m_channelsIn * m_bytesPerSampleIn;
    if (decode(bytesRequired))
    {
        int16_t* samples16Ptr = (int16_t*)m_audioBuffer;
    
        for (int i = 0; i < numSamples; ++i)
        {
            for (int j = 0; j < numOutputChannels; ++j)
            {
                outputChannelData[j][i] = (int16_t)*samples16Ptr++;
            }
        }
    }
}
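To give a concrete number: with, say, a 512-sample buffer, 2 channels and 2 bytes per sample, decode() is asked for 512 * 2 * 2 = 2048 bytes per callback (the 512 is just an example; the device decides the real buffer size).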

These are the instance variables whose types you might want to know:

        int m_audioBufferSize;            ///< The size of the samples buffer.
        boost::uint8_t* m_audioBuffer;    ///< The buffer the samples are fetched into.
        boost::uint8_t* m_audioBufferPtr; ///< The pointer that points to the beginning of the sample block.

        AVPacket* m_lastPacket;    ///< The last packet we got from the queue.
        AVPacket* m_currentPacket; ///< The packet fed to the avcodec_decode_audio3.
        int m_bytesLeft;           ///< The number of bytes decoded, but not yet consumed.

So, to give you some more information:

I know that the sampling rate of the audio device equals the sampling rate of the audio stream from the video.
I know that the audio stream samples are stored as int16 (although I have to read the buffer as int8).
I know that JUCE's AudioIODevice::getCurrentBitDepth() reports a 2-byte bit depth. So both streams have the same bit depth.
I know that the number of input channels is 2, and that the number of output channels of the AudioIODevice is 2 as well.
I know that the audio samples in the input buffer are stored alternating between channels. So when I decode the audio stream, I get back a uint8_t structure with the following byte layout: left left right right left left … (see the sketch below).
I know that the problem is not caused by a wrong endianness in the audio stream.
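
Just to make that layout explicit, here is a little sketch (the sample values are made up, and I'm assuming little-endian byte order):

    #include <stdint.h>

    // Interleaved 16-bit stereo, seen at byte granularity:
    // L-lo L-hi R-lo R-hi L-lo L-hi R-lo R-hi ...
    uint8_t rawBytes[8] = { 0x34, 0x12,    // left  sample of frame 0 (0x1234)
                            0x78, 0x56,    // right sample of frame 0 (0x5678)
                            0x00, 0x80,    // left  sample of frame 1 (-32768)
                            0xFF, 0x7F };  // right sample of frame 1 ( 32767)

    // Reading the same buffer as int16_t gives one sample per element,
    // with the channels alternating within each frame.
    const int16_t* samples = reinterpret_cast<const int16_t*> (rawBytes);
    int16_t left0  = samples[0];  // frame 0, channel 0
    int16_t right0 = samples[1];  // frame 0, channel 1
    int16_t left1  = samples[2];  // frame 1, channel 0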

So here's what I experience: I can actually play back the audio, but it's overlaid with lots of noise. I can hear that it plays back at approximately the right speed (unless I change the sampling rate, of course), but it doesn't sound good at all.

I've played around a bit with how I assign the samples to the outputChannelData, and the code above is the best solution so far, but still with the aforementioned noise.
Is this a problem with the cast from int16_t to float (which is 4 bytes on my system)? How should I copy the data to the outputChannelData?

For those who are interested in the decoding of the FFmpeg packets:

bool VideoManagerAudioDecoder::decode(int bytesRequired)
{
    if (m_videoState->fetched)
    {
        ScopedLock lock(m_videoState->access);
        if (m_videoState->audioPacketQueue.empty())
        {
            return false;
        }
    }

    // Reset pointers & others
    uint8_t* bufferPtr = m_audioBuffer;
    int bytesInBuffer  = m_audioBufferSize;
    bool empty = false;

    // I assume that for multichannel sound the samples for each channel are alternating.
    // This may be totally wrong, but FFMpeg's documentation sucks pretty hard as we all know.
    while (bytesRequired > 0 && !empty)
    {
        // Leftover from the last call: already decoded, but not used yet. So let's use it now.
        if (m_bytesLeft > 0)
        {
            // copy rest to front.
            memcpy(bufferPtr, m_audioBufferPtr, m_bytesLeft);
            // reset pointers
            m_audioBufferPtr = m_audioBuffer + m_bytesLeft;
            bufferPtr = m_audioBufferPtr;
            // update bytes required
            bytesRequired -= m_bytesLeft;

            // seems like we can't use the whole buffer this time either.
            if (bytesRequired <= 0)
            {
                m_bytesLeft = abs(bytesRequired);  // get unused bytes.
                m_audioBufferPtr += bytesRequired; // set to last used position.
                continue;
            }
        }

        // If we have no packet atm, fetch a new one.
        if (m_lastPacket == 0) 
        {
            ScopedLock lock(m_videoState->access);

            if (m_videoState->audioPacketQueue.empty()) // we don't have time for this here.
            {
                return false;
            }

            m_lastPacket = m_videoState->audioPacketQueue.front();
            m_videoState->audioPacketQSize -= m_lastPacket->size;
            m_videoState->audioPacketQueue.pop();

            // Create a shallow copy of the last packet.
            memcpy(m_currentPacket, m_lastPacket, sizeof(AVPacket));

            empty = m_videoState->fetched && m_videoState->audioPacketQueue.empty();
        }

        int bytesRead, bytesWritten = bytesInBuffer;
        if ( (bytesRead = avcodec_decode_audio3(m_videoState->audioCodecHeader, (boost::int16_t*)bufferPtr, &bytesWritten, m_currentPacket)) >= 0 && bytesWritten > 0)
        {
            m_currentPacket->data += bytesRead;
            m_currentPacket->size -= bytesRead;

            m_audioBufferPtr += bytesWritten;
            bufferPtr += bytesWritten;
            bytesRequired -= bytesWritten;
            bytesInBuffer -= bytesWritten;

            // If we read more bytes than we actually need, remember the beginning of the unused buffer.
            m_bytesLeft = 0;
            if (bytesRequired < 0)
            {
                m_bytesLeft = abs(bytesRequired);  // get unused bytes.
                m_audioBufferPtr += bytesRequired; // set to last used position.
            }

            // If the whole packet was read, just delete it.
            if (m_currentPacket->size == 0)
            {
                av_free_packet(m_lastPacket);
                deleteAndZero(m_lastPacket);
            }
        }
        else 
        {
            return false;
        }
    }
    return true;
}

I've already written the decoded samples into a file and used that file in an SDL application, where I just fed the samples via the audio callback into SDL, and it worked fine. So it's NOT a problem with the decoding!

BTW: The code above is of course not ready to go, since it assumes that the sampling rate and bit depth of the audio stream equal those of the audio device, which is the case in my test environment, but not necessarily always.
Furthermore, I don't sync audio with video at the moment.
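
For completeness, this is roughly how I check those assumptions in my test build (just a sketch; m_deviceManager and m_videoState are my own members, and audioCodecHeader is the stream's AVCodecContext):

    AudioIODevice* device = m_deviceManager.getCurrentAudioDevice();
    AVCodecContext* codec = m_videoState->audioCodecHeader;

    jassert (device != 0);
    // The device format must match the stream format for the naive copy to work.
    jassert ((int) device->getCurrentSampleRate() == codec->sample_rate);
    jassert (device->getActiveOutputChannels().countNumberOfSetBits() == codec->channels);
    jassert (device->getCurrentBitDepth() == 16); // the stream delivers int16 samples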

Thanks for your help (hopefully)!

LG,
Myke.


#2

Hi guys, again,
I think I have to streamline my question a bit:

I now have an example file where the audio stream is NOT encoded, has 2 channels and comes as 2-byte signed integers.
So instead of having to decode the packets, I can just use the data in the AVPacket struct directly. I've tried to write the data into a file, read it in an SDL application, and provide it there via the audio callback. That worked fine!

So what else do I know:
The AudioIODevice I get from AudioDeviceManager::getCurrentAudioDevice() has 2 channels (as does the audio stream; getActiveOutputChannels().countNumberOfSetBits()) and a 2-byte bit depth (as does the audio stream; getCurrentBitDepth() >> 3).
The sampling rate of the audio stream and the AudioIODevice are the same as well.
And again: the data in the audio stream is stored like this: L L R R L L R R L L R R (or the other way round, which doesn't make much difference here).

Currently I assign the audio stream to the outputChannelData like this.
The resulting output lets you recognize the audio stream, but with lots of noise added to it.

        int16_t* samples16Ptr = (int16_t*)m_audioBuffer; // audioBuffer is a uint8_t* array that contains the samples
    
        for (int i = 0; i < numSamples; ++i)
        {
            for (int j = 0; j < numOutputChannels; ++j)
            {
                outputChannelData[j][i] = *samples16Ptr++;
            }
        }

So now that I know that I have the sample data in a uint8_t* array, how would I pass it to JUCE in the audioDeviceIOCallback?

BTW: We are not using the most recent version of JUCE at our company, and switching to the newest one might cause a lot of trouble with our old software… We use v1.46, and we develop on Windows.


#3

I don’t understand - you don’t seem to be converting your int16 values to floating point…?


#4

Well, that's because I thought the samples don't need to be float. The JUCE AudioIODevice tells me that it has a bit depth of 16, so the audio device definitely doesn't use float values, which are 4 bytes.

So I thought there was no need to convert the int16 into float, because the audio device interprets it as int16 anyway. But the way you ask the question, I assume I have to convert my int16 into float, and JUCE converts it back into 2-byte whatever internally.

I'll check what comes out when I use the audio_resample functionality of FFmpeg to convert the int16 into float values…


#5

The samples are always floats - that's why the array is a float**! It looks like you're just letting the compiler implicitly convert them, which will be way out of range: an int16 sample of 32767 comes out as 32767.0f, while the callback expects values between -1.0 and 1.0.


#6

Alright, thanks so far.

Hard-coded solution:

[code]
int16_t* samples16Ptr = (int16_t*) m_audioBuffer;

for (int i = 0; i < numSamples; ++i)
{
    for (int j = 0; j < numOutputChannels; ++j)
    {
        outputChannelData[j][i] = (*samples16Ptr++) * (1.0f / (1 << 15));
    }
}
[/code]
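(For the record: (1 << 15) is 32768, so this maps the full int16 range [-32768, 32767] onto [-1.0, 1.0).)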

So I don't have to care about AudioIODevice::getCurrentBitDepth() in the callback, and JUCE internally converts the float back to whatever is needed by the device?


#7

Yeah, it's just provided for information - you don't need to worry about actually converting anything yourself.
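
BTW, if your (rather old) version already includes the AudioDataConverters class, something along these lines could do the scaling and de-interleaving for you (an untested sketch, assuming little-endian int16 source data):

    // srcBytesPerSample is the byte stride between consecutive samples of one
    // channel, i.e. numOutputChannels * 2 for interleaved int16 data.
    const uint8_t* src = m_audioBuffer;

    for (int j = 0; j < numOutputChannels; ++j)
        AudioDataConverters::convertInt16LEToFloat (src + j * sizeof (int16_t),
                                                    outputChannelData[j],
                                                    numSamples,
                                                    (int) (numOutputChannels * sizeof (int16_t)));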