Compressing/Decompressing an Audio Stream

Hi, I’ve created an app to stream audio from one computer to another, and it works pretty well with the uncompressed audio stream presented by audioDeviceIOCallback. Now I want to compress the stream to reduce the network load. I’ve written a pair of functions to do this using the GZip streams, but I keep getting crashes in memory allocations during calls to MemoryBlock::ensureSize().

Am I going about this the right way or is there a simpler method? I know the decompression is pretty inelegant but I couldn’t think of a better way to decompress a stream if the original size isn’t known. Any pointers would be much appreciated.

[code]void compressMemoryBlock (const MemoryBlock& sourceBlock, MemoryBlock& destinationBlock)
{
    MemoryOutputStream memoryStream (destinationBlock, true);
    GZIPCompressorOutputStream zipStream (&memoryStream);

    // push the whole source block through the compressor, then flush so the
    // compressed data ends up in the destination block
    zipStream.write (sourceBlock.getData(), sourceBlock.getSize());
    zipStream.flush();

    DBG ("Original: " << (int) sourceBlock.getSize()
          << " - Comp: " << (int) destinationBlock.getSize()
          << " - Ratio: " << (destinationBlock.getSize() / (float) sourceBlock.getSize()));
}[/code]

[code]void decompressMemory (const void* sourceBlock, size_t sourceBlockSize, MemoryBlock& destinationBlock, const int bufferSize = 8192)
{
    MemoryInputStream memoryStream (sourceBlock, sourceBlockSize, false);
    GZIPDecompressorInputStream zipStream (memoryStream);

    int64 totalSize = 0;

    while (! zipStream.isExhausted())
    {
        // make room for the next chunk - this may reallocate the block, so
        // re-derive the write pointer from getData() rather than keeping a
        // raw pointer across the call
        destinationBlock.ensureSize ((size_t) totalSize + (size_t) bufferSize, false);

        const int bytesRead = zipStream.read (addBytesToPointer (destinationBlock.getData(), (size_t) totalSize), bufferSize);

        if (bytesRead <= 0)
            break;

        totalSize += bytesRead;
    }

    // trim the block down to the number of bytes actually decompressed
    destinationBlock.setSize ((size_t) totalSize, false);

    DBG ("Comp: " << (int) sourceBlockSize
          << " - Original: " << (int) destinationBlock.getSize()
          << " - Ratio: " << (destinationBlock.getSize() / (float) sourceBlockSize) << "\n");
}[/code]

Surely you should be using an audio codec that’s actually designed for streaming data?

…but if you think the crash may be in my code, and can give me a code snippet that reproduces it in a repeatable way, I’ll look into it.

[quote]crashes in allocations[/quote]

Details, please!

“Crashes in allocation” almost certainly means “the program went nuts and ate all your memory”. That in turn almost certainly means an infinite loop in your code somewhere - the fact that it crashes in JUCE probably means only that the memory allocation happens to be done in JUCE.

Let me tell you, I’ve done this more than once. :smiley:

[quote]Surely you should be using an audio codec that’s actually designed for streaming data?[/quote]Probably. This just seemed the easiest method to begin with.

Yeh, I don’t think this is actually a JUCE problem - the crash happens within the call to realloc inside the HeapBlock contained within the destination MemoryBlock. I don’t know all the details of how memory allocation works, but obviously if that fails the program is going to crash.

I’m currently working on a method to embed the original size within the data packet so that the whole decompressed stream can be allocated in one go. This should be much safer and faster than continuously reallocating a memory block. I will let you know how it goes.

Sending the uncompressed size and then using this to initialise the stream seems to work perfectly.
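
For reference, the idea in sketch form (simplified, not my exact code - the helper names are just illustrative, and it assumes JUCE’s writeInt64/readInt64 for the size prefix):

[code]void compressWithSizeHeader (const MemoryBlock& source, MemoryBlock& dest)
{
    MemoryOutputStream out (dest, false);
    out.writeInt64 ((int64) source.getSize());       // prepend the uncompressed size

    GZIPCompressorOutputStream zip (&out);
    zip.write (source.getData(), source.getSize());
    zip.flush();                                      // push any pending compressed data out

    // note: the block may be over-allocated, so use out.getDataSize()
    // for the number of valid bytes when sending over the network
}

void decompressWithSizeHeader (const void* data, size_t dataSize, MemoryBlock& dest)
{
    MemoryInputStream in (data, dataSize, false);
    const int64 originalSize = in.readInt64();        // read the prepended size

    dest.setSize ((size_t) originalSize, false);      // allocate the whole block in one go

    GZIPDecompressorInputStream zip (in);
    zip.read (dest.getData(), (int) originalSize);    // decompress straight into it
}[/code]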

I did find out that I was referencing a MemoryBlock that had gone out of scope, which was probably causing the crashes before. However, if I revert to the method I posted first I get crashes in GZip::inflate. This is irrelevant really, as I now have a working method.

Thanks for all the pointers.

Since you’re doing non-lossy compression of the audio data yourself, let me suggest compressing the first derivative of each audio channel instead of the channel itself - the results should be better.

Sorry, I’m not sure I understand. Isn’t the first derivative of a sampled signal such as this the gradient of the line between two points? That means finding the change in sample value, dy, and the change in time, dx (always 1 sample), so the first derivative of the signal would simply be dy. Wouldn’t this just mean doing n more subtraction operations each time a packet is sent? Sorry if I’m missing something here - I haven’t touched any differentiation for 6 years.

By “the result should better” do you mean this derivative will compress better than the original signal?

At the moment, using the lowest compression setting for GZip I get a compression ratio of about 0.9 while audio is streaming. I know this isn’t much, and even with the highest compression setting it only improves to about 0.8-0.85. The main advantage of this method at the moment is that when there is no audio playing, the compressed packet drops to about 30 bytes, or a ratio of 0.003. This is particularly useful for an ‘always on’ service like this because it means resources are minimal when it’s not actually in use.

Yes.

I was referring to the discrete derivative: the difference between subsequent samples in the same channel. You would have to transmit the starting value for the differences to make sense - or possibly the starting value per packet (to handle loss).

Interesting, I might try that out in the week. I’ll let you know how it compares if I do. Cheers.

Hmm…also let me point out that my comment refers to outputting integer sample values. You would have to quantize the floats and emit them as 16-bit signed integers. The results won’t be very impressive if you use float.
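
To make that concrete, here’s a rough sketch of the quantise-and-difference step (illustrative only - the function name and the use of std::vector are just for the example):

[code]#include <vector>
#include <cstdint>

// Quantise floats in [-1, 1] to signed 16-bit and store the difference between
// consecutive samples. The first element ends up being the starting value itself.
std::vector<int16_t> quantiseAndDifferentiate (const float* samples, int numSamples)
{
    std::vector<int16_t> out ((size_t) numSamples);
    int16_t previous = 0;

    for (int i = 0; i < numSamples; ++i)
    {
        const int16_t q = (int16_t) (samples[i] * 32767.0f);
        out[(size_t) i] = (int16_t) (q - previous);   // wraps modulo 2^16; the decoder's running sum undoes this
        previous = q;
    }

    return out;   // feed this to the compressor; decoding is just a running sum
}[/code]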

If the blocks are big enough, you might be better off just using the FLAC format to compress it.

If the audio stream corresponds to a song, and you have calculated the beats per minute and you know the starting sample number of the first downbeat, in theory you could get extremely high (relative to other nonlossy) compression by first collating the samples by their position in each measure. In other words, output the first sample of every measure, then output the second sample of every measure, etc… and then take the first derivative of the whole thing and compress that (with bzip or zlib).

This wouldn’t work for streaming data (since you need the whole song in memory) but still, it could be interesting.
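
If anyone wants to experiment with it, the collation step would look something like this (a sketch only - it assumes 16-bit samples, the whole song in memory, and that samplesPerMeasure has already been worked out from the BPM and the first downbeat):

[code]#include <vector>
#include <cstdint>

// Reorder the song so that all the first-samples-of-a-measure come first,
// then all the second samples, and so on, before differencing and compressing.
std::vector<int16_t> collateByMeasurePosition (const std::vector<int16_t>& song, int samplesPerMeasure)
{
    std::vector<int16_t> collated;
    collated.reserve (song.size());

    for (int pos = 0; pos < samplesPerMeasure; ++pos)
        for (size_t i = (size_t) pos; i < song.size(); i += (size_t) samplesPerMeasure)
            collated.push_back (song[i]);

    return collated;
}[/code]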

Have you tried bzip? It gets pretty good results compared to zip:

http://www.bzip.org/

Don’t reinvent the (broken) wheel.
If what you are looking for is reducing network load, use Vorbis / Speex / AAC / whatever codec to divide your bandwidth by at least 20 with no audible difference.
If you are looking for low latency AND best quality, your best option is http://www.celt-codec.org/
If you absolutely need NO LOSS in audio quality, you’re doomed to use FLAC, but don’t expect more than a 0.5x bandwidth saving (I’m not sure it’s worth the trouble).

GZip/Bzip are not made to compress random data (or the derivative of random data, or the derivative of time-correlated random data, or whatever). Entropy coding should happen at the very last step of an audio processing chain - that is, after the signal has been approximated by a model and the error encoded - and what you’ll need to entropy-encode is the model parameters and the error, not the original data anyway.

Point taken.

I’ve actually implemented the methods discussed previously, but as you mention, using a specialised codec is probably a much better idea. For posterity, here are the compression ratios that resulted:

GZip on float samples: 0.92
Convert to int16: 0.5 (obviously)
Convert to int16 then GZip: 0.48
Convert to int16, differentiate, then GZip: 0.4

An initial test with Ogg at 128kbps gives a ratio of 0.35. This will be much easier for the user to control, as they can simply select a bit rate with a slider.

Which brings me onto another question (maybe I should start a new thread for this?): is the built-in OggVorbisAudioFormat suitable for compressing blocks of audio? At the moment I’m writing the samples to a memory block via a stream, which seems to work OK, but when I decompress them (using OggVorbisAudioFormat::createReaderFor()) the program just hangs in readSamples. It seems that the numAvailable to the OggReader is always 0.
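
The write side looks roughly like this (a simplified sketch rather than my actual code - the sample rate and quality index here are just placeholders):

[code]void compressBlockToOgg (const AudioSampleBuffer& buffer, double sampleRate, MemoryBlock& dest)
{
    OggVorbisAudioFormat oggFormat;
    MemoryOutputStream* out = new MemoryOutputStream (dest, false);

    // the writer takes ownership of the stream if it's created successfully
    ScopedPointer<AudioFormatWriter> writer (oggFormat.createWriterFor (out, sampleRate,
                                                                        buffer.getNumChannels(),
                                                                        16, StringPairArray(), 5));
    if (writer != nullptr)
        writer->writeFromAudioSampleBuffer (buffer, 0, buffer.getNumSamples());
    else
        delete out;   // creation failed, so the stream is still ours to delete
}[/code]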

I will look into this further if this method should be possible; I just wanted to check that I’m using the OggVorbisAudioFormat in the intended manner before spending even more time debugging it. Any pointers welcome.

None of the audio format classes are designed for use on open-ended streams (because obviously things like wav, aiff couldn’t possibly do that). So you’d probably need to roll your own solution for this.

I just assumed from the original question that there was a particular reason for not using an audio-specific compressor.