A little confused by File Input and Output streams


#1

There’s no way to create a stream that is both readable and writable.

So, if I want to open a file for random-access reading and writing, I must create two objects, a FileInputStream and a FileOutputStream. But this means there are two separate file descriptors.

The output stream is buffered, so if I write a little bit to the file and later try to read it with the matching input stream, it will not see those bytes! I would have to flush after every write.
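To make the effect concrete, here is a minimal sketch. It uses two independent standard C++ streams (std::ofstream and std::ifstream) on the same path as stand-ins for JUCE's classes, and the function and file names are hypothetical. Whether the reader sees anything before the flush depends on the library's buffering, so only the post-flush result should be relied on:

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>

// Returns everything `reader` can currently see, starting from offset 0.
std::string readVisible(std::ifstream& reader) {
    reader.clear();   // reset EOF/fail bits left over from an earlier read
    reader.seekg(0);  // rewind to the start; the next read refills from the file
    return std::string(std::istreambuf_iterator<char>(reader),
                       std::istreambuf_iterator<char>());
}

// Hypothetical demo: writes `text` through one stream and reads through a
// second, independent stream on the same path, before and after a flush.
// Returns {bytes visible before flush, bytes visible after flush}.
std::pair<std::size_t, std::size_t> twoStreamDemo(const char* path,
                                                  const std::string& text) {
    std::ofstream writer(path, std::ios::binary | std::ios::trunc);
    std::ifstream reader(path, std::ios::binary);

    writer << text;  // may still be sitting in the writer's own buffer
    const std::size_t before = readVisible(reader).size();

    writer.flush();  // hand the buffered bytes to the OS
    const std::size_t after = readVisible(reader).size();

    writer.close();
    reader.close();
    std::remove(path);
    return {before, after};
}
```

With a default-buffered std::ofstream the first number is typically 0 for a short string, but that is an implementation detail; after the flush the reader sees every byte.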

Jules, how could you not have noticed this?!


#2

TBH in all my years of hacking, I’ve never needed a dual input/output stream. Seems to me that almost every file operation tends to be either reading or writing, and that when you find yourself needing to do both at once then you’re probably writing code whose behaviour relies on shaky assumptions about how the file-system will behave.

Certainly I’d have thought that if your algorithm is being broken by the writer using an internal buffer, then it could also be affected by the OS doing something similar (I’ve no idea if OSes actually do buffer data before flushing it to disk, but it wouldn’t be surprising if they did).
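On POSIX systems, at least, OS-level caching does not cause this kind of inconsistency: POSIX specifies that once write() on a regular file has returned, a subsequent read() of those bytes returns the new data, regardless of which descriptor is used. fsync() concerns durability on disk, not visibility between handles. A POSIX-only sketch (the function name is hypothetical) that writes through one descriptor and immediately reads through another:

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>
#include <string>

// POSIX-only sketch: returns what a second descriptor reads immediately
// after write() on the first descriptor has returned (no fsync() involved).
std::string readThroughSecondDescriptor(const char* path,
                                        const std::string& text) {
    const int fdWrite = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    const int fdRead = open(path, O_RDONLY);
    if (fdWrite < 0 || fdRead < 0) {
        if (fdWrite >= 0) close(fdWrite);
        if (fdRead >= 0) close(fdRead);
        return {};
    }

    // write() has no user-space buffer: the bytes go straight to the kernel.
    const ssize_t written = write(fdWrite, text.data(), text.size());

    // Read through the *other* descriptor at an explicit offset.
    std::string seen(text.size(), '\0');
    const ssize_t got = pread(fdRead, &seen[0], seen.size(), 0);

    close(fdWrite);
    close(fdRead);
    unlink(path);

    if (written < 0 || got < 0) return {};
    seen.resize(static_cast<std::size_t>(got));
    return seen;
}
```

The stale-data problem in the first post comes from user-space buffers inside each stream object, not from the OS.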


#3

[quote=“jules”]TBH in all my years of hacking, I’ve never needed a dual input/output stream. Seems to me that almost every file operation tends to be either reading or writing, and that when you find yourself needing to do both at once then you’re probably writing code whose behaviour relies on shaky assumptions about how the file-system will behave.

Certainly I’d have thought that if your algorithm is being broken by the writer using an internal buffer, then it could also be affected by the OS doing something similar (I’ve no idea if OSes actually do buffer data before flushing it to disk, but it wouldn’t be surprising if they did).[/quote]

Completely false. Consider what happens if you were to use FILE objects via fopen, fread, and fwrite. As long as you go through just the one FILE object, you will get a consistent view no matter how much reading and writing you do. The OS can definitely buffer data but when you read through the same file handle, it goes through the buffer and everything is consistent.

But with FileInputStream and FileOutputStream, there are two separate file handles, each with its own buffer. So data written through one will not be seen when reading through the other.
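A minimal sketch of the single-handle case, in standard C++ via `<cstdio>`. One detail worth knowing: the C standard requires an fflush() or a positioning call (fseek, fsetpos, rewind) between a write and a following read on the same update stream, so the sketch seeks before reading back. std::tmpfile() is used only to avoid choosing a path, and the function name is hypothetical:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Writes `text` and reads it back through ONE FILE* opened for update.
// Returns what was read back, or an empty string on failure.
std::string singleHandleRoundTrip(const std::string& text) {
    std::FILE* f = std::tmpfile();  // temporary file opened in "wb+" mode
    if (f == nullptr) return {};

    // This may only fill the FILE's internal buffer; nothing forces it to disk.
    std::fwrite(text.data(), 1, text.size(), f);

    // Switching from writing to reading on an update stream requires fflush()
    // or a positioning call; seeking to the start satisfies that and rewinds.
    std::fseek(f, 0, SEEK_SET);

    std::string back(text.size(), '\0');
    const std::size_t got = std::fread(&back[0], 1, back.size(), f);
    back.resize(got);

    std::fclose(f);  // a tmpfile() file is deleted automatically when closed
    return back;
}
```

Because both directions share one FILE and therefore one buffer, the read sees the written bytes even though they may never have left that buffer.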

The use case for dual input/output should be obvious: RandomAccessFile. Currently in JUCE there is no way to reproduce the behavior of a buffered I/O file opened for reading and writing, e.g.:

FILE* f = fopen (path, "r+b");

Obviously we would want this if we are implementing a database.

Jules, you should move the platform-specific native bits currently in FileInputStream and FileOutputStream into a new class, RandomAccessFile, then change FileInputStream and FileOutputStream to use RandomAccessFile.

Then add createInputStream and createOutputStream to RandomAccessFile. This way you can have two streams that use the same underlying RandomAccessFile. You would probably want to remove the buffering from the streams. If you want buffering, it should be done in RandomAccessFile so that both views stay consistent.

You can tweak File::createInputStream and File::createOutputStream to just first create a RandomAccessFile and then call the corresponding function on it.

FileInputStream and FileOutputStream will probably need an OptionalScopedPointer to hold the underlying RandomAccessFile.
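Roughly how the proposed split might look, as a standalone sketch. RandomAccessFile, InputView and OutputView here are hypothetical names built on `<cstdio>`, not JUCE classes, and the views hold a plain pointer where the post suggests an OptionalScopedPointer. The point is that both views go through one FILE* and therefore one buffer, while each keeps its own position:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical sketch (not a JUCE class): owns the single OS handle and its
// single stdio buffer; the stream "views" only remember their own position.
class RandomAccessFile {
public:
    explicit RandomAccessFile(const char* path)
        : file(std::fopen(path, "w+b")) {}  // create or truncate, read + write
    ~RandomAccessFile() {
        if (file != nullptr) std::fclose(file);
    }
    RandomAccessFile(const RandomAccessFile&) = delete;
    RandomAccessFile& operator=(const RandomAccessFile&) = delete;

    bool isOpen() const { return file != nullptr; }

    // Every access seeks first. That also satisfies C's rule that reads and
    // writes on one update stream must be separated by a positioning call.
    std::size_t readAt(long pos, void* dest, std::size_t n) {
        if (std::fseek(file, pos, SEEK_SET) != 0) return 0;
        return std::fread(dest, 1, n, file);
    }
    std::size_t writeAt(long pos, const void* src, std::size_t n) {
        if (std::fseek(file, pos, SEEK_SET) != 0) return 0;
        return std::fwrite(src, 1, n, file);
    }

    class InputView;
    class OutputView;
    InputView createInputStream();
    OutputView createOutputStream();

private:
    std::FILE* file;
};

// Non-owning view: the RandomAccessFile must outlive it.
class RandomAccessFile::InputView {
public:
    explicit InputView(RandomAccessFile& f) : owner(&f) {}
    std::size_t read(void* dest, std::size_t n) {
        const std::size_t got = owner->readAt(position, dest, n);
        position += static_cast<long>(got);
        return got;
    }
private:
    RandomAccessFile* owner;
    long position = 0;
};

// Non-owning view: the RandomAccessFile must outlive it.
class RandomAccessFile::OutputView {
public:
    explicit OutputView(RandomAccessFile& f) : owner(&f) {}
    std::size_t write(const void* src, std::size_t n) {
        const std::size_t put = owner->writeAt(position, src, n);
        position += static_cast<long>(put);
        return put;
    }
private:
    RandomAccessFile* owner;
    long position = 0;
};

RandomAccessFile::InputView RandomAccessFile::createInputStream() {
    return InputView(*this);
}
RandomAccessFile::OutputView RandomAccessFile::createOutputStream() {
    return OutputView(*this);
}
```

A write through the output view is immediately visible to the input view without any explicit flush, because there is only one buffer to be inconsistent with.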


#4

I agree that that’d be a good design, but don’t really agree with the argument behind it.

As for needing this to implement a database, I doubt it! If you’re writing a database, you certainly wouldn’t want to give the responsibility for all your locking/flushing to the low-level file handle object; you’d need higher-level constructs to do that on a transactional basis. And I’ve found the same thing to be the case in every real-world situation I’ve worked on which involved interleaved reads/writes. You always need higher-level locking, and that’s generally where you’d do a flush or have some other layer on top of the file itself.
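A sketch of that higher-level approach: separate writer and reader FILE* handles, kept consistent by a mutex and a flush at commit time rather than by the handles themselves. CommittedLog, commit() and readNew() are hypothetical names, and the sketch assumes an append-only file so that bytes are never rewritten:

```cpp
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <string>

// Hypothetical sketch: consistency comes from this higher-level object, not
// from the file handles. A mutex and a flush inside commit() decide what a
// reader is allowed to see. Assumes an append-only file.
class CommittedLog {
public:
    explicit CommittedLog(const char* path)
        : writer(std::fopen(path, "wb")),  // initialized first (member order)
          reader(std::fopen(path, "rb")) {}
    ~CommittedLog() {
        if (writer != nullptr) std::fclose(writer);
        if (reader != nullptr) std::fclose(reader);
    }
    CommittedLog(const CommittedLog&) = delete;
    CommittedLog& operator=(const CommittedLog&) = delete;

    bool isOpen() const { return writer != nullptr && reader != nullptr; }

    // One "transaction": write, flush, then advance the committed size,
    // all while holding the lock.
    bool commit(const std::string& record) {
        std::lock_guard<std::mutex> lock(mutex);
        if (std::fwrite(record.data(), 1, record.size(), writer) != record.size())
            return false;
        if (std::fflush(writer) != 0) return false;  // bytes reach the OS here
        committed += record.size();
        return true;
    }

    // Returns the bytes committed since the previous call.
    std::string readNew() {
        std::lock_guard<std::mutex> lock(mutex);
        std::string out(committed - consumed, '\0');
        std::clearerr(reader);  // in case an earlier read hit end-of-file
        const std::size_t got = std::fread(&out[0], 1, out.size(), reader);
        out.resize(got);
        consumed += got;
        return out;
    }

private:
    std::FILE* writer;
    std::FILE* reader;
    std::mutex mutex;
    std::size_t committed = 0;
    std::size_t consumed = 0;
};
```

Here the two handles never need to agree on their own; the lock and the flush inside commit() are what make the reader's view correct.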

But I do think a RandomAccessFile class would be a good thing! However, I’m drowning in Tracktion stuff right now. Keep nagging me about it!


#5

Sure, I can agree with that. But at the end of the day, the platform-specific file I/O should not be split between FileInputStream and FileOutputStream; it should be part of RandomAccessFile. Whatever else we build on top of that, like buffering, is an extra.
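One way to keep the platform-specific core minimal, sketched for POSIX only: pread() and pwrite() take an explicit offset and leave the descriptor's file position alone, so several reader and writer views could share one descriptor, with any buffering added as a separate layer. PosixRandomAccessFile is a hypothetical name, other platforms would need their own positional calls, and production code should loop on short reads and writes:

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstddef>

// POSIX-only, hypothetical sketch of an unbuffered RandomAccessFile core.
// pread()/pwrite() take an explicit offset and do not move the descriptor's
// file position, so independent reader/writer views can share one descriptor.
class PosixRandomAccessFile {
public:
    explicit PosixRandomAccessFile(const char* path)
        : fd(open(path, O_RDWR | O_CREAT, 0600)) {}
    ~PosixRandomAccessFile() {
        if (fd >= 0) close(fd);
    }
    PosixRandomAccessFile(const PosixRandomAccessFile&) = delete;
    PosixRandomAccessFile& operator=(const PosixRandomAccessFile&) = delete;

    bool isOpen() const { return fd >= 0; }

    // Bytes written, or -1 on error. May be short; real code should loop.
    ssize_t writeAt(off_t offset, const void* src, std::size_t n) {
        return pwrite(fd, src, n, offset);
    }

    // Bytes read (0 at end of file), or -1 on error. May also be short.
    ssize_t readAt(off_t offset, void* dest, std::size_t n) {
        return pread(fd, dest, n, offset);
    }

    // Current file size as the OS sees it, or -1 on error.
    off_t size() const {
        struct stat info;
        return fstat(fd, &info) == 0 ? info.st_size : static_cast<off_t>(-1);
    }

private:
    int fd;
};
```

Since nothing is buffered in user space, every view sees the same bytes; a buffering layer, if added, would then be the one place responsible for consistency.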


#6

I’m going to need it a lot sooner than you probably will be able to provide it.

Can I fork JUCE and work something up? I can get almost all of it done and then pass it off to you as a branch in my fork for final polish and testing on the platforms I don’t work under (Android, iOS).


#7

Sure, get forking!


#8

Brother, it's time to delete the ‘modules’ branch (lol)


#9

Yes, I should…!


#10

hi,

I just tracked a bug in our code back to the way InputStream and OutputStream interact in JUCE, which is similar to this topic.

In earlier JUCE versions (sorry, I didn't keep track of which version), I could have an InputStream and an OutputStream open on the same file at the same time, and after explicitly flushing the OutputStream, the InputStream was able to read the updated content exactly.

Now I need to close and recreate the InputStream in order to get an updated view of the file. It seems as though InputStream now has an internal read cache, or doesn't update its file size. Sorry, I didn't have time to verify those hypotheses.

My use case is quite simple: write content to a file, then read it back to compute an MD5 and update the file header with that MD5.

Keeping InputStream and OutputStream open simultaneously in my case is an optimisation more than a necessity.

So, now that I have a workaround, I'd like to know: was the former behaviour offering more than advertised, and was I wrong to assume it would keep working in subsequent JUCE updates? Or is the new behaviour somehow broken?
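For the MD5 use case above, one way to sidestep the two-stream question entirely is to do the whole round trip through a single read/write handle, which is what this fopen-based sketch assumes. The 8-byte header layout, the function names and the FNV-1a hash (standing in for MD5 only to keep the example short) are all hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for MD5 (64-bit FNV-1a), used only to keep the sketch short.
std::uint64_t fnv1a(const std::string& data) {
    std::uint64_t h = 14695981039346656037ull;
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

constexpr std::size_t kHeaderSize = 8;  // hypothetical: checksum, then body

// Writes a placeholder header and the body, reads the body back, then patches
// the header, all through one FILE* opened for update ("w+b").
bool writeWithChecksumHeader(const char* path, const std::string& body) {
    std::FILE* f = std::fopen(path, "w+b");
    if (f == nullptr) return false;

    unsigned char header[kHeaderSize] = {};  // zeroed placeholder
    bool ok = std::fwrite(header, 1, kHeaderSize, f) == kHeaderSize &&
              std::fwrite(body.data(), 1, body.size(), f) == body.size();

    // Write -> read on the same stream: a positioning call is required.
    std::string readBack(body.size(), '\0');
    ok = ok && std::fseek(f, static_cast<long>(kHeaderSize), SEEK_SET) == 0 &&
         std::fread(&readBack[0], 1, readBack.size(), f) == readBack.size();

    // Read -> write again: another positioning call, back to the header.
    if (ok) {
        const std::uint64_t hash = fnv1a(readBack);
        for (std::size_t i = 0; i < kHeaderSize; ++i)
            header[i] = static_cast<unsigned char>(hash >> (8 * i));  // little-endian
        ok = std::fseek(f, 0, SEEK_SET) == 0 &&
             std::fwrite(header, 1, kHeaderSize, f) == kHeaderSize;
    }
    return std::fclose(f) == 0 && ok;
}

// Reads the stored checksum back from the header (0 if unreadable).
std::uint64_t readStoredChecksum(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return 0;
    unsigned char header[kHeaderSize] = {};
    const std::size_t got = std::fread(header, 1, kHeaderSize, f);
    std::fclose(f);
    if (got != kHeaderSize) return 0;
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < kHeaderSize; ++i)
        h |= static_cast<std::uint64_t>(header[i]) << (8 * i);
    return h;
}
```

With only one handle there is no second buffer that could hold stale data, so the result does not depend on how any particular stream class chooses to buffer.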


#11

It's entirely up to each stream class how it chooses to buffer and flush itself. Certainly some of the file classes will use internal buffers; the only thing that's certain is that you shouldn't make any assumptions about it.


#12

Random file access is not as exotic as you might think.

One use case for RandomAccessFile is a stream that is appended to by one thread and read from by another (within the same process). Incoming live data (audio, video, stock prices) is written to disk, and one or more other threads consume it (playback, display). Basically all video and audio surveillance apps would work this way.

As with a PVR, you will want to rewind, fast-forward, etc., while recording continues.
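A sketch of the append-while-reading pattern described above, assuming two FILE* handles in one process: the writer thread flushes each record and only then publishes the new length through a std::atomic, and the reader never reads past the published length. appendWhileReading() and the record format are hypothetical:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <string>
#include <thread>

// Hypothetical sketch of "record while playing back": one thread appends,
// another reads, each through its own FILE*. The writer flushes a record and
// only then publishes the new length; the reader never reads past it.
std::string appendWhileReading(const char* path, int records) {
    std::FILE* out = std::fopen(path, "wb");
    std::FILE* in = (out != nullptr) ? std::fopen(path, "rb") : nullptr;
    if (in == nullptr) {
        if (out != nullptr) std::fclose(out);
        return {};
    }

    std::atomic<std::size_t> published{0};
    std::atomic<bool> done{false};
    std::string consumed;  // touched only by the reader thread until join()

    std::thread writer([&] {
        for (int i = 0; i < records; ++i) {
            const std::string rec = "rec" + std::to_string(i) + "\n";
            std::fwrite(rec.data(), 1, rec.size(), out);
            std::fflush(out);  // the bytes reach the OS first...
            published.fetch_add(rec.size(), std::memory_order_release);  // ...then are announced
        }
        done.store(true, std::memory_order_release);
    });

    std::thread reader([&] {
        std::size_t readSoFar = 0;
        char buf[256];
        for (;;) {
            // Load `done` before `published`: if done is already true, the
            // length loaded next is the final one.
            const bool finished = done.load(std::memory_order_acquire);
            const std::size_t available = published.load(std::memory_order_acquire);
            while (readSoFar < available) {
                std::size_t chunk = available - readSoFar;
                if (chunk > sizeof buf) chunk = sizeof buf;
                const std::size_t got = std::fread(buf, 1, chunk, in);
                if (got == 0) {
                    std::clearerr(in);  // defensive; retry on the next pass
                    break;
                }
                consumed.append(buf, got);
                readSoFar += got;
            }
            if (finished && readSoFar >= available) break;
            std::this_thread::yield();
        }
    });

    writer.join();
    reader.join();
    std::fclose(out);
    std::fclose(in);
    std::remove(path);
    return consumed;
}
```

Seeking back for rewind would need more care than this sketch shows, since a reader's own buffer may still hold earlier bytes; the append-only assumption is what keeps those bytes valid here.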

For data that is cached over long periods of time (hours, days, weeks), using disk storage is inevitable.

Hierarchy-wise, a RandomAccessStream would be a subclass of FileOutputStream, which in turn would be a subclass of FileInputStream. That way all behaviour could be implemented consistently and without redundancy.

Has any progress been reported yet? I would be highly interested.


#13

Nobody else ever asked about this, it's not a priority for us!

BTW inheriting FileOutputStream from FileInputStream would be a really bad design IMHO. A new class could inherit both InputStream and OutputStream, but you wouldn't attempt to build it out of the existing classes.
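A sketch of that alternative: one class that implements both an input and an output interface over a single FILE*, so reads and writes share one buffer and one position. InputStreamBase and OutputStreamBase are simplified, hypothetical stand-ins, not the real juce::InputStream and juce::OutputStream declarations:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Simplified, hypothetical stand-ins for the two abstract bases; the real
// juce::InputStream and juce::OutputStream declare more than this.
struct InputStreamBase {
    virtual ~InputStreamBase() = default;
    virtual int read(void* dest, int numBytes) = 0;
    virtual std::int64_t getPosition() = 0;
    virtual bool setPosition(std::int64_t pos) = 0;
};

struct OutputStreamBase {
    virtual ~OutputStreamBase() = default;
    virtual bool write(const void* src, std::size_t numBytes) = 0;
    virtual void flush() = 0;
    virtual std::int64_t getPosition() = 0;
    virtual bool setPosition(std::int64_t pos) = 0;
};

// One FILE*, one buffer, one position, implementing both interfaces directly
// rather than deriving from the existing file stream classes.
class FileReadWriteStream : public InputStreamBase, public OutputStreamBase {
public:
    explicit FileReadWriteStream(const char* path)
        : file(std::fopen(path, "w+b")) {}
    ~FileReadWriteStream() override {
        if (file != nullptr) std::fclose(file);
    }
    FileReadWriteStream(const FileReadWriteStream&) = delete;
    FileReadWriteStream& operator=(const FileReadWriteStream&) = delete;

    bool isOpen() const { return file != nullptr; }

    int read(void* dest, int numBytes) override {
        if (numBytes <= 0) return 0;
        switchTo(Mode::reading);
        return static_cast<int>(
            std::fread(dest, 1, static_cast<std::size_t>(numBytes), file));
    }

    bool write(const void* src, std::size_t numBytes) override {
        switchTo(Mode::writing);
        return std::fwrite(src, 1, numBytes, file) == numBytes;
    }

    void flush() override { std::fflush(file); }

    // Each of these overrides the identically declared virtual in BOTH bases.
    std::int64_t getPosition() override { return std::ftell(file); }
    bool setPosition(std::int64_t pos) override {
        lastMode = Mode::none;  // a seek is itself a positioning call
        return std::fseek(file, static_cast<long>(pos), SEEK_SET) == 0;
    }

private:
    enum class Mode { none, reading, writing };

    // C requires a positioning call when an update stream switches between
    // reading and writing; seeking by 0 from the current position is enough.
    void switchTo(Mode next) {
        if (lastMode != Mode::none && lastMode != next)
            std::fseek(file, 0, SEEK_CUR);
        lastMode = next;
    }

    std::FILE* file;
    Mode lastMode = Mode::none;
};
```

Because both bases declare getPosition() and setPosition() with identical signatures, the single override in the derived class serves both, which keeps the read and write positions from drifting apart.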


#14

OK, I see. I just wondered how a DAW could do without random-access files.

As for the class hierarchy, that's rather a philosophical question ;-) I haven't seen any write-only storage yet, so if you can write to a file, you can also read from it.

I've been using a class named ReadAppendStream a lot and wonder how best to port it to JUCE.