Interesting potential race condition in ZipFile::Builder


#1

Hello.

I have a fascinating issue where I zip up a whole directory but intermittently a single file does not make it into the zip archive - even though it seems identical in perms to all the rest, and even though I can “see” the call to ZipFile::Builder::addFile going off!

I realized that in fact that file is being written just before I create the archive, and in the very same thread - so there can’t be more than a few microseconds between its creation and when it’s added to the archive. The file is successfully seen by my DirectoryIterator, ZipFile::Builder::addFile is successfully called on it, and yet that file, only, is not included in the .zip archive - it’s as if whatever builds the zip file hasn’t “caught up” with the actual state of the disk yet…!

If I, for example, write the file, sleep the thread for 100ms, and then create the archive, it works perfectly well every time.

I’m actually perfectly satisfied with that solution, because it’s an operation that’s only called once, and that rarely. However, this seems like a potential issue, and I thought I should report it.

My system stats:

juce_version: "JUCE v2.0.18"
operating_system_type: 4096
operating_system_name: "Mac OSX 10.6"
is_operating_system_64_bit: true
osx_minor_version_number: 6
logon_name: "tom"
full_user_name: "Tom Ritchford"
computer_name: "hofmann"
num_cpus: 8
cpu_speed_in_megahertz: 2800
cpu_vendor: "GenuineIntel"
has_mmx: true
has_sse: true
has_sse2: true
has_3dnow: false
memory_size_in_megabytes: 14336

which amusingly enough happen to be the contents of the file that refuses to save!


#2

Not sure if this is your problem, but starting with OS X 10.6 the file system changed quite a bit. See the note on file system efficiency here:

http://developer.apple.com/library/mac/#releasenotes/MacOSX/WhatsNewInOSX/Articles/MacOSX10_6.html#//apple_ref/doc/uid/TP40008898-SW7

POSIX calls are no longer the lowest level, but are translated into the underlying file system. I haven’t tested it myself, but it is possible that close() is no longer truly synchronous. Certainly a lot of people started reporting trouble trying to use aio reads and writes with 10.6. Another possibility is that you are using one of the functions that Jules resolves to a File Manager (FSxxx) call. Those are generally async.

Instead of just waiting 100 ms, which is kind of arbitrary, you could try something like opening the last file written and interrogating its size, then sleeping and retrying until you get the expected result. Jules does something similar in his save data overwrite for the File class.


#3

Read your reference - I’ll bet you that that’s the issue.

100ms? Arbitrary? :smiley: I’ll probably later put in your size-checking idea - but this is for the “Request Support” option so it’s rarely called and it wouldn’t be a catastrophe if that specific file didn’t show up (I could just ask them about their system if it came down to it…)

Excellent, very valuable link!