GZIPCompressorOutputStream changes


#1

Did something change in that class ? It sopped working for me, i have a very simple function that saves a compressed XML document like so:

MemoryOutputStream panelBinData;
ScopedPointer <XmlElement> panelXml (panel->getPanelTree().createXml());
GZIPCompressorOutputStream gzipOutputStream (&panelBinData);
panelXml->writeToStream (gzipOutputStream,"");
gzipOutputStream.flush();

if (fileToSave.hasWriteAccess())
{
	fileToSave.replaceWithData(panelBinData.getData(), panelBinData.getDataSize());
}

that worked for me but now the files can’t be opened after the save, i also checked using linux zcat command and it says the data is not in gzipped format. Is there something wrong with what i’m doing ?


#2

No, I don’t think I’ve changed anything, and I’ve been using the class recently for creating zip files, so am pretty sure it works… Have you tried reading back the result with a GZIPDecompressorInputStream?


#3

A simple test scenario, copy/paste should work:

void testGzip()
{
	File compressedFile("c:\\File.gz");
	ValueTree test("myTree");
	Array <File> tempFiles;
	MemoryOutputStream testBinData;
	GZIPCompressorOutputStream gzipOutputStream(&testBinData);

	File tempDir = File::getSpecialLocation(File::tempDirectory);
	tempDir.findChildFiles (tempFiles, File::findFilesAndDirectories, true, "*.*");

	for (int i=0; i<tempFiles.size(); i++)
	{
		ValueTree child("file");
		child.setProperty ("path", tempFiles[i].getFullPathName(), 0);
		child.setProperty ("hash", tempFiles[i].hashCode64(), 0);
		test.addChild (child, -1, 0);
	}

	ScopedPointer <XmlElement> testXml (test.createXml());

	if (testXml)
	{
		Logger::writeToLog ("testXml is valid, tree size: " + String(testXml->createDocument("").length()));
		testXml->writeToStream(gzipOutputStream,"");
	}

	gzipOutputStream.flush();

	Logger::writeToLog ("data in memory output stream="+String(testBinData.getDataSize()));

	compressedFile.replaceWithData (testBinData.getData(), testBinData.getDataSize());

	Logger::writeToLog ("file size after write="+String(compressedFile.getSize()));

	/* read the data */

	ScopedPointer <FileInputStream> fz(compressedFile.createInputStream());
	if (fz)
	{
		GZIPDecompressorInputStream gzInput(fz, false);
		MemoryBlock decompressedData(8192, true);
		MemoryBlock buf(8192);

		while (!gzInput.isExhausted())
		{
			const int data = gzInput.read (buf.getData(), 8192);
			Logger::writeToLog ("read "+String(data) + " bytes from compressed stream");
			decompressedData.ensureSize (decompressedData.getSize()+data, false);
			decompressedData.append(buf.getData(), data);
		}

		String xmlData = decompressedData.toString();
		Logger::writeToLog ("decompressed data size="+String(decompressedData.getSize()));
		Logger::writeToLog ("decompressed document length="+String(xmlData.length()));
	}
}

the output is:

testXml is valid, tree size: 51154
data in memory output stream=7776
file size after write=7776
read 8192 bytes from compressed stream
read 8192 bytes from compressed stream
read 8192 bytes from compressed stream
read 8192 bytes from compressed stream
read 8192 bytes from compressed stream
read 6819 bytes from compressed stream
decompressed data size=103750
decompressed document length=0

#4

Well, I’ve no idea what you’re doing with all that memoryblock stuff, but when I changed it to this:

[code] ScopedPointer fz(compressedFile.createInputStream());
if (fz)
{
GZIPDecompressorInputStream gzInput (fz, false);

   DBG (gzInput.readEntireStreamAsString());

}
[/code]

…it printed out some pretty sensible-looking xml.


#5

you are correct it looks sensible, but not sensible enough, in the test i pasted have a look at the size difference:

testXml is valid, xml size: 655
<?xml version="1.0" encoding="UTF-8"?>

<myTree>
  <file path="C:\Users\atom\AppData\Local\Temp\72ae9928f86fcffb12abc4560b045360.modlist.txt"
        hash="-5445633642097156142"/>
  <file path="C:\Users\atom\AppData\Local\Temp\dd_clwireg.txt" hash="-3161997345197306734"/>
  <file path="C:\Users\atom\AppData\Local\Temp\dd_NDP40-KB2468871-v2-x64_decompression_log.txt"
        hash="-5371747960322628592"/>
  <file path="C:\Users\atom\AppData\Local\Temp\FXSAPIDebugLogFile.txt"
        hash="-3164131687747995014"/>
  <file path="C:\Users\atom\AppData\Local\Temp\Setup Log 2011-09-06 #001.txt"
        hash="-2067086739884436322"/>
</myTree>

data in memory output stream=313
file size after write=313
decompressed xml length=619
<?xml version="1.0" encoding="UTF-8"?>

<myTree>
  <file path="C:\Users\atom\AppData\Local\Temp\72ae9928f86fcffb12abc4560b045360.modlist.txt" hash="-5445633642097156142"/>
  <file path="C:\Users\atom\AppData\Local\Temp\dd_clwireg.txt" hash="-3161997345197306734"/>
  <file path="C:\Users\atom\AppData\Local\Temp\dd_NDP40-KB2468871-v2-x64_decompression_log.txt" hash="-5371747960322628592"/>
  <file path="C:\Users\atom\AppData\Local\Temp\FXSAPIDebugLogFile.txt" hash="-3164131687747995014"/>
  <file path="C:\Users\atom\AppData\Local\Temp\Setup Log 2011-09-06 #001.txt" hash="-2067086739884436322"/>
</myTree>

it’s 655 before compression and 619 after, and even looking at the debug output it’s different (i think tabs are removed), but remember no XML parsing is done after decompression so the strings should be identical (i think).

Also i noticed why my actual code is not working, after decompression my XML documents look ok, however they are missing the “<” in the “<?xml” (the first character is gone after decompression), and the XML parser fails.


#6

Ok, look, I’ve been meaning to write a unit test for this anyway, so I did one:

[code]class GZIPTests : public UnitTest
{
public:
GZIPTests() : UnitTest (“GZIP”) {}

void runTest()
{
    beginTest ("GZIP");
    Random rng;

    for (int i = 100; --i >= 0;)
    {
        MemoryOutputStream original, compressed, uncompressed;

        {
            GZIPCompressorOutputStream zipper (&compressed, rng.nextInt (10), false);
            
            for (int j = rng.nextInt (100); --j >= 0;)
            {
                MemoryBlock data (rng.nextInt (2000) + 1);
                
                for (int k = data.getSize(); --k >= 0;)
                    data[k] = (char) rng.nextInt (255);

                original.write (data.getData(), data.getSize());
                zipper  .write (data.getData(), data.getSize());
            }
        }

        {
            MemoryInputStream compressedInput (compressed.getData(), compressed.getDataSize(), false);
            GZIPDecompressorInputStream unzipper (compressedInput);

            uncompressed.writeFromInputStream (unzipper, -1);
        }
        
        expectEquals ((int) uncompressed.getDataSize(),
                      (int) original.getDataSize());
        
        if (original.getDataSize() == uncompressed.getDataSize())
            expect (memcmp (uncompressed.getData(),
                            original.getData(),
                            original.getDataSize()) == 0);
    }
}

};
[/code]

…and it passes.

I’m not sure what you’re doing wrong, your code’s a bit spaghetti-ish and I can’t really see what’s going on, but there must be something silly happening in there. Try cleaning it up and restructuring it, and my guess is that you’ll find a mistake.


#7

Just to explain myself (everytime i hit a problem like that i feel like a complete asshole for crying about it, sorry for that)

Anyway i used the memoryBlock stuff to display some progress for large files.

I fixed my problem, it was due to scope issues, i never though about putting all those streams in special scopes “{}”, and i was getting incomplete results.
Thank you for showing that to me.


#8

Well, it made me write a unit-test, and you got a lesson in better scoping, so wasn’t a disaster!

Good scoping is really important: always try to structure your code so that the absolute minimum number of objects are in scope at any given time.


#9

Actually, while doing some other work today I spotted a bug in the gzip class that might actually be the real cause of what you’re seeing there.

And it’s a really nasty one to fix, because there’s no perfect solution… Basically, in the current code, when you call flush() on a gzip stream, it finishes the zip operation, which performs some special magic and closes the stream - that means that any subsequent attempts to write to the stream will fail. The bug was that if this happened, the code wasn’t asserting, so it would go unnoticed.

My first reaction was that I should just make flush() do nothing, and close the stream in the destructor instead. But then, the finished result is invalid until the stream object is actually deleted - and I bet that a lot of people will have code that calls flush() and then tries to use the result before they’ve bothered to delete the stream object.

So, I guess the best solution here is to leave flush() as it currently works, but make sure there’s an assertion if you try to write to it afterwards (despite this being a reasonable thing to want to do). Annoying that there isn’t a fix for this that works in all cases!


#10

but then it should be better named finalize() or something similar, and mark flush() as deprecated …


#11

flush() is a pure virtual method in the OutputStream base class, so can’t be avoided.