Crash when creating zip files with large input files

Hi Jules!

When I use ZipFile::Builder to create a zip file containing large input files (e.g. WAV files of more than 500 MB), I get an access violation in MemoryOutputStream::write.

I am using the latest JUCE (3.0.5, updated today) on Windows 7 64-bit.
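
For reference, this is roughly how I trigger it (the paths are placeholders for my actual files):

#include "JuceHeader.h"

void createLargeZip()
{
    juce::ZipFile::Builder builder;

    // a WAV file of more than 500 MB
    builder.addFile (juce::File ("D:\\audio\\large.wav"), 9);

    juce::FileOutputStream out (juce::File ("D:\\audio\\large.zip"));

    if (out.openedOk())
        builder.writeToStream (out, nullptr);   // access violation in MemoryOutputStream::write
}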

I found that MemoryOutputStream::prepareToWrite(...) returns an invalid writePointer when ensureSize() fails, because blockToUse->getData() then becomes null. In this case it should return nullptr instead, so that the caller can detect the failure.
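
Sketched against the 3.0.5 source (abbreviated from memory, so not the exact code), the check I have in mind looks like this:

char* MemoryOutputStream::prepareToWrite (size_t numBytes)
{
    const size_t storageNeeded = position + numBytes;

    if (storageNeeded >= blockToUse->getSize())
        blockToUse->ensureSize ((storageNeeded + jmin (storageNeeded / 2, (size_t) (1024 * 1024)) + 32) & ~31u);

    // ensureSize() may have failed, leaving the block's data null:
    // bail out here instead of handing the caller a bogus pointer.
    if (blockToUse->getData() == nullptr)
        return nullptr;

    char* const writePointer = static_cast<char*> (blockToUse->getData()) + position;
    position += numBytes;
    size = jmax (size, position);
    return writePointer;
}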

With large input files, the MemoryOutputStream has to be enlarged very often, because

blockToUse->ensureSize ((storageNeeded + jmin (storageNeeded / 2, (size_t) (1024 * 1024)) + 32) & ~31u);

caps the extra headroom at 1 MB, so once the stream is larger than 2 MB, each growth step only adds about 1 MB. After a lot of reallocs, it fails when the stream becomes larger than about 500 MB. I guess this has something to do with memory fragmentation.
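
To put a number on it, here is a small standalone simulation of that growth formula (plain C++, no JUCE required), assuming data arrives in 64 KB writes (the chunk size is an arbitrary choice):

#include <algorithm>
#include <cstddef>
#include <cstdio>

int main()
{
    const size_t oneMB  = 1024 * 1024;
    const size_t target = 500 * oneMB;   // final stream size
    const size_t chunk  = 64 * 1024;     // bytes per write() call

    size_t position = 0, allocated = 0;
    int growths = 0;

    while (position < target)
    {
        const size_t storageNeeded = position + chunk;

        if (storageNeeded >= allocated)
        {
            // the growth formula from MemoryOutputStream::prepareToWrite()
            allocated = (storageNeeded + std::min (storageNeeded / 2, oneMB) + 32) & ~(size_t) 31;
            ++growths;
        }

        position = storageNeeded;
    }

    // prints roughly 500: once the 1 MB cap kicks in (above a 2 MB stream),
    // every growth step adds only about 1 MB of headroom
    std::printf ("%d reallocations to reach %d MB\n", growths, (int) (target / oneMB));
    return 0;
}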


I found two possible solutions, both of which allowed me to create the zip files without problems. Both reduce the number of reallocs by using larger buffers.

1)
In MemoryOutputStream::prepareToWrite(...), I replaced

blockToUse->ensureSize ((storageNeeded + jmin (storageNeeded / 2, (size_t) (1024 * 1024)) + 32) & ~31u);

with

blockToUse->ensureSize ((storageNeeded + (storageNeeded / 2) + 32) & ~31u);

This worked in my test case, but always growing by 50% may waste a lot of RAM in other cases (a stream that has just grown past 400 MB would reserve about 600 MB).


2)
In ZipFile::Builder::Item::writeData(...), I replaced

MemoryOutputStream compressedData;

with

MemoryOutputStream compressedData(file.getSize());

The compressed stream should not become larger than the original file, so this should avoid any reallocation. It may fail if there is not enough RAM for the initial allocation, but in that case creating the zip file would fail anyway.
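
To show the effect of preallocation in isolation (a standalone sketch, not the actual ZipFile code; copyToMemory is a made-up helper name):

#include "JuceHeader.h"

// Hypothetical helper, only to illustrate the preallocating constructor:
// sizing the stream up-front means the single initial allocation is the
// only point that can fail, and no reallocation happens while writing.
void copyToMemory (const juce::File& file)
{
    juce::MemoryOutputStream stream ((size_t) file.getSize());
    juce::FileInputStream in (file);

    if (in.openedOk())
        stream.writeFromInputStream (in, -1);   // -1 reads until the input is exhausted
}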


What do you think?

Best Regards,
Gregor


Presumably this is in a 32-bit app, because if it were 64-bit, I wouldn't expect malloc to ever fail.

TBH I had been meaning for some time to re-engineer MemoryOutputStream so that rather than reallocating a single block, it would keep a list of blocks and only merge them together when you flush it or request the resulting memory block. That'd be more complicated internally, but should be much faster for large streams. I might actually make that change now, since you've raised the subject...
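
Just to sketch the shape I mean (untested, and a real version would coalesce small writes into the tail block rather than allocating one block per write):

class BlockListOutputStream   // hypothetical name
{
public:
    BlockListOutputStream() : totalSize (0) {}

    void write (const void* data, size_t numBytes)
    {
        // append a new block instead of reallocating one huge one
        blocks.add (new MemoryBlock (data, numBytes));
        totalSize += numBytes;
    }

    // the single big allocation + copy only happens when the result is requested
    MemoryBlock getMemoryBlock() const
    {
        MemoryBlock result;
        result.setSize (totalSize);

        size_t offset = 0;

        for (int i = 0; i < blocks.size(); ++i)
        {
            const MemoryBlock* const b = blocks.getUnchecked (i);
            result.copyFrom (b->getData(), (int) offset, b->getSize());
            offset += b->getSize();
        }

        return result;
    }

private:
    OwnedArray<MemoryBlock> blocks;
    size_t totalSize;
};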