[The Juce Archive Format Blog]


#1

EDIT: This is now “haydxn’s Juce Archive Format Blog” - tracking the development of the ZipFile based archive classes, which should provide a neat all-purpose method of storing any amount of any kind of data.

[16-12-07] Changed format to the altogether-more-flexible ‘ZipOutputStream’. Download ZipOutputStream.h

EXCITING UPDATE! w00t!

Compression now works!

While Juce has good support for reading zip files, it’s been noted several times that you can’t make them yourself with it. So, I knocked together a simple class to do just that.

Now we have a means of creating a file with multiple data types embedded [e.g. XML document data + images, all in one package]. I’d like to expand on it, making a wrapper for such ‘combination’ file formats.

Here it is: Download ZipFileMaker.zip - this archive created using ZipFileMaker.

It’s really straightforward to use:

  • Create a ZipFileMaker (on the stack will do)
  • Call addFile() for any files you wish to add. If you want a file to be nested in subfolders in the zip, simply use a path in the storedFilename parameter
    [e.g. “images/image1.jpg”]. The compressionLevel parameter lets you specify how much to squash the file: 0 stores it uncompressed, and 1-9 run from least to most compressed.
  • Call createOutputFile() when you’re done.

And that’s it! To prove it, here is the code that I used to create the zip file you just downloaded… (of course I only used literal paths for quick testing purposes!)

// Add each source file at the highest compression level (9).
ZipFileMaker zipper;
zipper.addFile (File (T("C:/ZipFileMaker.h")), 9);
zipper.addFile (File (T("C:/ZipFileMaker.cpp")), 9);
zipper.addFile (File (T("C:/ZipOutputStream.h")), 9);
zipper.addFile (File (T("C:/ZipOutputStream.cpp")), 9);
// Write everything out as a single zip file.
zipper.createOutputFile (File (T("C:/ZipFileMaker.zip")), true);

[note] This will have problems if trying to write to a volume that doesn’t allow seeking in the output stream. I’ve not had any experience with that yet, but perhaps USB (or networked) drives may have such limitations. It’s possible to write the header data in a different way to get around this, but that’s really beyond what I could be bothered with! Feel free to add support for this if you’re willing (you’d need to read up on the zip spec - it involves placing additional data AFTER the entry content; there should at least be a clue in the .cpp file).

Alternatively of course (if it is a problem for you), you could just create the output zip file in a temp location on a normal drive, and copy it to the troublesome volume when done.
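
For anyone tempted to add that support: as far as the zip spec goes, the "additional data AFTER the entry content" is the data descriptor. You set bit 3 of the general-purpose flags, leave the CRC and sizes as zero in the local header, and append the real values once the content has been written. Here is a minimal sketch of building that record; the helper names are made up for illustration and are not part of ZipFileMaker.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a 32-bit value in little-endian order, as the zip format requires.
inline void appendLE32 (std::vector<uint8_t>& out, uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        out.push_back (static_cast<uint8_t> ((v >> (8 * i)) & 0xff));
}

// Bit 3 of the general-purpose flags means "CRC and sizes follow the data".
constexpr uint16_t kDataDescriptorFlag = 1u << 3;

// Hypothetical helper: builds the record written AFTER an entry's content
// when the output can't seek back to patch the local header.
inline std::vector<uint8_t> makeDataDescriptor (uint32_t crc,
                                                uint32_t compressedSize,
                                                uint32_t uncompressedSize)
{
    std::vector<uint8_t> out;
    appendLE32 (out, 0x08074b50u);      // optional but commonly written signature
    appendLE32 (out, crc);              // CRC-32 of the uncompressed data
    appendLE32 (out, compressedSize);
    appendLE32 (out, uncompressedSize);
    assert (out.size() == 16);          // signature + three 32-bit fields
    return out;
}
```

The central directory at the end of the file still carries the real CRC and sizes, so only the local headers are affected.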


#2

What is missing to actually compress the data?


#3

Well if you were to write an output stream (similar to the GZIP compressor output stream) that could compress in the correct manner, then it would be a simple case of using that in the places indicated by the comments in the above code. [HINT: it’s all in writeLocalEntryToStream ()]

I thought the GZIP one would work initially, but I couldn’t get the files to uncompress. I tried all the flag options [when making the entry headers you set flags to indicate the type of compression used] but the zip readers all got confused when attempting to read them back out, no matter which I chose. From reading more on the spec of the zip format, it seems that gzip isn’t really one of the compression types supported (there’s stuff about how it has some different requirements in the output file, but tbh I didn’t look too much into what that meant). For my needs, I just wanted to be able to package files up into a single archived format, so didn’t go any further.

I’m not sure how easy writing such a stream would be, but if it follows the same interface it should just slot right in.


#4

Would zlib’s deflate algorithm work?

I’m a noob with zip files. I don’t know which algorithm is used.
From what I’ve read, it looks like zlib’s deflate/inflate algorithm should do, but I don’t get why the GZIP stream class doesn’t work here.

Maybe rewriting the GZIP stream class with the basic options could do?


#5

I’ve just been investigating further, and have found a tweak to GZIPCompressorOutputStream that makes compression work!

The only thing is, I’ve got no idea whether or not changing it could be harmful to existing Juce code, so I guess jules might want to check it out.

The tweak is… setting line 65 to read ‘nowrap = true’ instead of false.

Go figure… I’m just tweaking the zipfilemaker code to make use of compression; if the change can’t be applied to Juce, it could at least use a modified copy of the class for its own compression.
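
A likely explanation, assuming ‘nowrap’ means here what it means in other zlib wrappers (emit raw deflate data with no zlib or gzip header): zip’s deflate method (method 8) stores raw deflate data, so any wrapper bytes in front of it confuse zip readers. The sketch below is a hypothetical helper, not Juce code, that peeks at the first two bytes of a compressed buffer to see which wrapper is present. It can only say that no wrapper was detected; raw deflate data has no signature of its own, so a false positive is possible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Wrapper { gzip, zlib, noneDetected };

// Hypothetical helper: guesses which wrapper (if any) precedes the deflate data.
inline Wrapper detectWrapper (const uint8_t* data, size_t size)
{
    assert (data != nullptr || size == 0);

    if (size < 2)
        return Wrapper::noneDetected;

    // gzip streams always begin with the magic bytes 0x1f 0x8b.
    if (data[0] == 0x1f && data[1] == 0x8b)
        return Wrapper::gzip;

    // zlib streams begin with a two-byte header: the low nibble of the first
    // byte is 8 (deflate), the high nibble (window size) is at most 7, and
    // the two bytes read as a big-endian number are a multiple of 31.
    const unsigned header = (static_cast<unsigned> (data[0]) << 8) | data[1];

    if ((data[0] & 0x0f) == 8 && (data[0] >> 4) <= 7 && header % 31 == 0)
        return Wrapper::zlib;

    return Wrapper::noneDetected;   // possibly raw deflate, which is what zip wants
}
```

Running this over the output of the unmodified and the tweaked stream would be a quick way to confirm what the flag actually changes.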


#6

Ta-Daaaah!

Here you go: DOWNLOAD ZipFileMaker.zip

When adding a file, if the compression parameter is 0, it will just be stored uncompressed. A value of 1-9 will cause a ZipOutputStream to be used to compress the data. [ZipOutputStream is just a copy of GZIPCompressorOutputStream, with the necessary tweak; should Juce get a modification in favour of this, we can change line 162 of ZipFileMaker.cpp to instantiate one of those instead, and bin the two new files].

Incidentally, as proof of whether or not this works, ZipFileMaker.zip was created using ZipFileMaker.


#7

I’ve just made some good progress on a wrapper ‘Archive’ class tonight too, making it easier to use a zip file as a ‘document’.

The idea is to be able to have a file ‘open’, and add files to it, or retrieve files from it, for the purposes of the application. It makes use of a temp folder to unpack files as they’re needed, and will have helper functions to, e.g. retrieve images.

I guess the first obvious test app to make with it will be a winzip-style app, then I want to make something more demonstrative of its potential as a ‘user data + binary data’ combined format. This is what I need for the classroom quiz (teaching aid) application I’m building - for storing images in a question bank file.

I’ll keep you posted, and present the classes when they’re ready for consumption! Til then, have fun with the ZipFileMaker!


#8

Nice one haydxn!

Anyway, a cool feature could be the ability to add a stream to the zip file (besides a file itself), so one can stream blocks from memory without the need to create any files on the disk (think of a sampler which stores every sample that it has in memory).

You rock as usual!!

:smiley:


#9

Yes, that is definitely something that I’ve considered [and I consider it important]; when ‘saving’ an archive file, it basically needs to be reconstructed and replaced. Currently that means unpacking all the files to a temp folder, when all I really want to do is use the streams from the original zip.

It’s on today’s to-do list :slight_smile: Looks like ZipFileMaker is going to get more complex than I could be arsed with! :’( :wink:


#10

I think logically you can convert the ZipFileMaker class to be the Juce Archive class, letting you load a zipped archive, add external files to it, extract some of them to disk, create an archive from the union of two other archives, create streams for reading files and then save it (and back and forth again and again).

Afterwards we could ask Julian to add it to the trunk :slight_smile:
Very useful!!


#11

I certainly think that Archive is to be a class that can wrap up creation and reading duties into a friendly interface. The one I knocked together last night has already proved itself, but it needs work.

From all the research I’ve done on the subject, and my new found experience, I think it’s probably safer though to make sure Archive is kept logically as a wrapper to the other classes; reading a Zip file and creating a zip file are two very separate tasks. I’ve proven that they can be done together, but I believe that if I didn’t keep the ‘writing’ side of things in its own dedicated class, things could get very messy and confusing. I reckon it deserves less of a FisherPrice name though - perhaps ZipFileWriter would be more appropriate!

I’m currently expanding it to make it more flexible; separating the steps of the writing procedure to make it easier to write in different ways from different sources. Ultimately though, the ‘save to zip’ operation must be performed in one sequence, and it has to remember pretty much everything about the process as it’s going in order to successfully write the directory at the end of the file. That’s one of the main reasons why it should be its own boss - it’s got enough on its plate without being responsible for other duties!

But yes, definitely - the end result will be an Archive class that we can use to cover all aspects; there should be no need to use either ZipFile or ZipFileWriter directly.

It’s all going smoothly, the main design hurdle ahead is coming up with a neat method for managing the archive structure; I was going to go for a ‘tree’ structure of all the entries in it, but that might be a bit messy. I basically don’t like the idea of having to specify paths for the entries - I’d like to be able to navigate to a logical folder, and add an entry to that. Perhaps that will mean expanding my ‘ArchivedFile’ class (currently just a ticket containing a String with the full entry path) to behave a lot like the File class (e.g. getSiblingFile, etc…).
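
To make the File-like idea concrete, here is a rough sketch of a path ticket with a few navigation helpers. It uses std::string rather than Juce’s String so it stands alone, and every method name is a guess modelled on the File class, not the actual ArchivedFile interface.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical sketch: a ticket holding the full entry path (always with '/'
// separators, as zip entry names use), plus File-style navigation.
class ArchivedFile
{
public:
    explicit ArchivedFile (std::string fullPath) : path (std::move (fullPath)) {}

    const std::string& getFullPath() const   { return path; }

    // The last path component, e.g. "image1.jpg" for "images/image1.jpg".
    std::string getFileName() const
    {
        const auto slash = path.rfind ('/');
        return slash == std::string::npos ? path : path.substr (slash + 1);
    }

    // The containing logical folder; the archive root is the empty path.
    ArchivedFile getParentFolder() const
    {
        const auto slash = path.rfind ('/');
        return ArchivedFile (slash == std::string::npos ? std::string()
                                                        : path.substr (0, slash));
    }

    // An entry inside this one, treating this entry as a folder.
    ArchivedFile getChildFile (const std::string& name) const
    {
        assert (! name.empty());
        return ArchivedFile (path.empty() ? name : path + "/" + name);
    }

    // An entry alongside this one, in the same logical folder.
    ArchivedFile getSiblingFile (const std::string& name) const
    {
        return getParentFolder().getChildFile (name);
    }

private:
    std::string path;
};
```

With something like this, "navigate to a logical folder and add an entry" becomes root.getChildFile ("images").getChildFile ("image1.jpg"), and the full entry path only appears at the point where the entry is written.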

Please feel free to share any ideas. For now, I continue enhancing the ZipFileWriter. [I may even address the issue of non-seekable output volumes, though I may save that for another day]


#12

I guess I’m going to treat this thread as ‘The Archive Classes Blog’. I’ll just keep posting the development of it, so if anyone’s interested they can read and offer suggestions.

I’ve made some more progress. ZipFileMaker previously would store up information on the files to be added, and then do a bulk dump of the zip file in one go when you hit the fat red button. I might leave ZipFileMaker as a wrapper for the new (potentially scarier) ZipFileWriter, for those who like it.

I guess a ZipFileWriter is kind of like a stream. It takes an output stream (or a File), and on construction is ready to have entries written to. When you write each entry to it, they’re written to the output immediately. This is made possible by also writing the necessary central directory information to a memoryblock at the same time. When you’re done (call finishWriting()), the memoryblock is written to the output, and then the final directory tail is written before closing the output.
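
The write-as-you-go scheme described above can be sketched in plain C++. This is not the ZipFileWriter source: the class and function names are invented, it only handles uncompressed (stored) entries, and a std::vector stands in for both the output stream and the memoryblock. It does show the key point, though: each entry goes to the output immediately while its directory record is collected on the side, and the output is only ever appended to.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Every multi-byte field in a zip file is little-endian.
inline void put16 (Bytes& b, uint32_t v) { for (int i = 0; i < 2; ++i) b.push_back (static_cast<uint8_t> ((v >> (8 * i)) & 0xff)); }
inline void put32 (Bytes& b, uint32_t v) { for (int i = 0; i < 4; ++i) b.push_back (static_cast<uint8_t> ((v >> (8 * i)) & 0xff)); }

// Standard CRC-32 (the checksum zip stores per entry), bit-by-bit variant.
inline uint32_t zipCrc32 (const Bytes& data)
{
    uint32_t crc = 0xffffffffu;
    for (uint8_t byte : data)
    {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Hypothetical writer: 'output' stands in for the real output stream.
class TinyZipWriter
{
public:
    void writeStoredEntry (const std::string& name, const Bytes& content)
    {
        assert (! finished);
        const uint32_t headerOffset = static_cast<uint32_t> (output.size());
        const uint32_t crc  = zipCrc32 (content);
        const uint32_t size = static_cast<uint32_t> (content.size());

        put32 (output, 0x04034b50u);            // local file header signature
        putSharedFields (output, name, crc, size);
        output.insert (output.end(), name.begin(), name.end());
        output.insert (output.end(), content.begin(), content.end());

        put32 (directory, 0x02014b50u);         // central directory signature
        put16 (directory, 20);                  // version made by
        putSharedFields (directory, name, crc, size);
        put16 (directory, 0);                   // entry comment length
        put16 (directory, 0);                   // disk number start
        put16 (directory, 0);                   // internal attributes
        put32 (directory, 0);                   // external attributes
        put32 (directory, headerOffset);        // where the local header lives
        directory.insert (directory.end(), name.begin(), name.end());
        ++numEntries;
    }

    const Bytes& finish()
    {
        assert (! finished);
        const uint32_t dirOffset = static_cast<uint32_t> (output.size());
        output.insert (output.end(), directory.begin(), directory.end());

        put32 (output, 0x06054b50u);            // end-of-central-directory signature
        put16 (output, 0);                      // this disk
        put16 (output, 0);                      // disk holding the directory
        put16 (output, numEntries);             // entries on this disk
        put16 (output, numEntries);             // entries in total
        put32 (output, static_cast<uint32_t> (directory.size()));
        put32 (output, dirOffset);
        put16 (output, 0);                      // zip comment length
        finished = true;
        return output;
    }

private:
    // Fields shared by the local header and the directory record.
    static void putSharedFields (Bytes& b, const std::string& name, uint32_t crc, uint32_t size)
    {
        put16 (b, 20);                          // version needed to extract
        put16 (b, 0);                           // general-purpose flags
        put16 (b, 0);                           // method 0 = stored
        put16 (b, 0);                           // DOS time 00:00:00
        put16 (b, 0x0021);                      // DOS date 1980-01-01
        put32 (b, crc);
        put32 (b, size);                        // compressed size
        put32 (b, size);                        // uncompressed size
        put16 (b, static_cast<uint32_t> (name.size()));
        put16 (b, 0);                           // extra field length
    }

    Bytes output, directory;
    uint32_t numEntries = 0;
    bool finished = false;
};
```

Note that this sketch knows the size and CRC before writing the local header because the whole content is handed over in one go; writing from a stream of unknown length is where seeking back (or a data descriptor) comes in.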

It may not sound like a major change (and perhaps to some it may sound like an unwelcome one) but it allows for more flexibility - including the use of input streams as sources [so you can control their lifetimes as necessary].

A few other tweaks open the possibility of writing the entries to memoryblocks too, to be written to the output file when ready. This would get around the potential failings of ‘non seekable output volumes’, but I’m not yet sure if it’s a good idea to do this; does anyone have an opinion on this? My worry is that writing a very large source file would obviously use a lot of memory. Perhaps a maximum limit could be put in to force direct file output instead. I’ll try it anyway - so an option when writing an entry could be ‘createInMemoryFirst’.

I’m going to add some more functions for writing various things to the zip; e.g. copyEntry (ZipFile& source, int entryIndex)

I’ve not yet tested writing streams as entries; obviously memory content written from a stream wouldn’t necessarily work as a ‘file’ in a zip (so exploring it in winzip could unpack garbage files) - but I guess it’d be recoverable nicely from the archive within your program. Ooh, it’s quite neat really, isn’t it? When it’s done we should have lots of new possibilities - a nice structured general purpose archive format where we can store data of any kind in an easily retrievable manner [plus, if we’re using it to store normal files, it’s in a format that can be read universally - like how XML is a very friendly storage, happy to oblige inquisitive eyes].


#13

You’ve made it.

This means there is finally a way to save documents in a single file without requiring external software like PicoStorage and such…

It just happens at the right time for my own projects, so thank you very much.


#14

Here you go - chuck away ‘ZipFileMaker’ - this is MUCH better…

ZipOutputStream.zip

[!!!]NOTE: needs the latest SVN juce - the GZIPCompressorOutputStream has been updated slightly to be compatible with it[!!!]

This archive has two code files in, and one new class to use - ZipOutputStream.

It’s fully commented, so I shouldn’t need to explain it much here, but I’ll give a quick overview…

  • It’s an OutputStream, so you can write to it easily.

Just start an entry (described with a ZipFile::ZipEntry object) and then write to the stream. You’ll hit an assertion if you’ve done your numbers wrong and try to close the entry without writing the correct amount of data, so it should be fairly reliable.

There are two helper functions to make it even easier - one lets you add a File to the zip, and the other lets you copy an entry directly from an existing ZipFile.

Call finish() when you’re done writing your entries. Or don’t - it’ll get called automatically in the destructor - but you should really, just like you should wash the dishes before the kitchen gets unpleasant.

  • Calling setCompressionLevel(int) will set the compression level for all subsequent entries.
  • Calling setZipFileComment() lets you add a comment to the zip file.

That’s it! I’m pretty sure that’s about as complete a dedicated Zip-writing class as we could hope for [except of course encryption support - but I decided against looking into that because I don’t think ZipFile actually supports encrypted zips anyway]! Hopefully you think it’s as great as I do :wink: .

Next up… the Archive class…


#15

Yeah haydxn, that is the kind of class we need :slight_smile: Good job!


#16

Updated slightly, same download link - added two nice helpers for starting entries - now you don’t need to make a ZipEntry at all; just:

startEntry (name, size)
or
startEntry (name, size, time)
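
For a feel of how the declared size and the assertion fit together, here is a small stand-in class. It is only a sketch of the bookkeeping, not ZipOutputStream itself: it writes no zip headers, and details such as startEntry() closing the previous entry automatically are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in that only tracks how much of each entry was written.
class EntryCheckingStream
{
public:
    // Declares a new entry and how many bytes will be written to it.
    void startEntry (const std::string& name, size_t size)
    {
        closeEntry();                   // assumption: starting a new entry ends the last
        entryNames.push_back (name);
        bytesExpected = size;
        bytesWritten = 0;
        entryOpen = true;
    }

    void write (const void* data, size_t numBytes)
    {
        assert (entryOpen);                                 // must start an entry first
        assert (bytesWritten + numBytes <= bytesExpected);  // wrote more than declared
        const auto* bytes = static_cast<const uint8_t*> (data);
        output.insert (output.end(), bytes, bytes + numBytes);
        bytesWritten += numBytes;
    }

    void closeEntry()
    {
        if (! entryOpen)
            return;

        assert (bytesWritten == bytesExpected);             // wrote less than declared
        entryOpen = false;
    }

    size_t getBytesRemaining() const   { return entryOpen ? bytesExpected - bytesWritten : 0; }
    size_t getNumEntries() const       { return entryNames.size(); }
    size_t getTotalBytes() const       { return output.size(); }

private:
    std::vector<std::string> entryNames;
    std::vector<uint8_t> output;
    size_t bytesExpected = 0, bytesWritten = 0;
    bool entryOpen = false;
};
```

Declaring the size up front is what lets the header go out before the data; the assertion is the price for that, since a wrong number would leave a header that disagrees with the content.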

There are still a few things that might want addressing; the attributes of entries are currently disregarded [i.e. the file attribs for when they are unpacked as files]. For its use as a means of archiving data though, that’s not important [and it doesn’t make a massive difference even for files, but some zip apps pop up a warning when the file attribs are different on extraction].

Anyway, bedtime! Hopefully I’ll make some progress on an archive class tomorrow. If you have any ideas or suggestions for things that would be desirable/practical, then feel free to chip in!


#17

haydxn, have you looked at the PicoStorage lib?

It’s nicely written (à la Juce) and supports compression (though I’m not sure which compressor is used). It’s made for archives with thousands of small files, like XML archives + resources, and doesn’t consume too much memory.

I’ve contacted the author, and he is OK with releasing it into the public domain.

Anyway, it’s already a good step towards a truly independent archive for Juce-based software.


#18

I have now! It looks like it’s a nice lib, but I’m rather enjoying piecing together something new :slight_smile: plus it’s a little different to what I had in mind, and I like the idea of using Zip files specifically, because they’re so obvious and straightforward [if you overlook the fact that it’s taken this long for us to be able to write our own from juce! ;)] and they can be opened by anyone. Some people might not think of that as a good thing but I certainly do.
I think quite a few important features/workings of the PicoStorage lib probably aren’t really compatible with the way zip files need to be written, but it’s nice to see anyway.

So far the Archive class is working like this:

Items in an archive are of a base ‘Archivable’ type (based on InputStream). Subclasses override a couple of functions to prepare a stream for reading, and most other stuff is done automatically. Just give it an output name and a timestamp and you can add it to the archive.
For example, ArchivableFile is a wrapper for a File object, so you can just create one and add it.

Archive is a FileBasedDocument; when you load an archive, it creates a ZipFile of the loaded file (whose ZipEntries are each wrapped as an ArchivableZipEntry, and automatically added to the Archive). You can add files/data by simply adding Archivable items, and you can remove them too - when you save the Archive, all the current Archivables are written out to the target file.

Basically, anything can be made into an archivable; there are some basic types, e.g. ArchivableFile, ArchivableText; I’m going to do some other helper ones too, I think - like an ArchivableImage, which can be given a Juce::Image and archive it as a JPEG or PNG file. Or you can just bung any old data into items in the archive, and it’s all retrievable by name.
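
A stripped-down illustration of that design, in standard C++ so it stands alone. It is a guess at the shape rather than the real classes: the actual Archivable is described as being based on InputStream, which this sketch replaces with a single readAll() function, and "saving" here just collects name/data pairs instead of writing a zip.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base type: anything with an entry name that can supply its bytes.
class Archivable
{
public:
    explicit Archivable (std::string entryName) : name (std::move (entryName)) {}
    virtual ~Archivable() = default;

    const std::string& getName() const   { return name; }
    virtual std::vector<char> readAll() const = 0;

private:
    std::string name;
};

// One of the "basic types": an item whose content is a piece of text.
class ArchivableText : public Archivable
{
public:
    ArchivableText (std::string entryName, std::string content)
        : Archivable (std::move (entryName)), text (std::move (content)) {}

    std::vector<char> readAll() const override
    {
        return std::vector<char> (text.begin(), text.end());
    }

private:
    std::string text;
};

class Archive
{
public:
    void addItem (std::unique_ptr<Archivable> item)
    {
        assert (item != nullptr);
        items.push_back (std::move (item));
    }

    // Items are retrievable by name; returns nullptr if there is no match.
    const Archivable* findItem (const std::string& name) const
    {
        for (const auto& item : items)
            if (item->getName() == name)
                return item.get();

        return nullptr;
    }

    bool removeItem (const std::string& name)
    {
        for (auto it = items.begin(); it != items.end(); ++it)
        {
            if ((*it)->getName() == name)
            {
                items.erase (it);
                return true;
            }
        }

        return false;
    }

    // Stand-in for saving: every current item is read out in order.
    std::vector<std::pair<std::string, std::vector<char>>> save() const
    {
        std::vector<std::pair<std::string, std::vector<char>>> written;

        for (const auto& item : items)
            written.emplace_back (item->getName(), item->readAll());

        return written;
    }

private:
    std::vector<std::unique_ptr<Archivable>> items;
};
```

The point of the base type is that Archive never needs to know where an item’s bytes come from; a file, a zip entry from the loaded archive, or an image encoder would each just be another subclass.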

Pretty nice! I’ll post the code once I’ve had some fresh time with it tomorrow :slight_smile:


#19

I personally would prefer not to force the developer to subclass Archive; it’s better to keep the class instantiable, so you can add objects or memory blocks by name (optionally organized in folders and subfolders).


#20

No, you don’t have to subclass Archive :slight_smile: - you can subclass Archivable to define different types of item within the archive; of course you may not need to do even that, as there will be a bunch of different Archivable types supplied.

So, basically it’s like this…

class Archive
{
    // Every item in the archive, whatever its concrete Archivable type.
    Array<Archivable*> items;
};