Advice needed for de-serialisation/serialisation of complex type


#1

Hi Guys,

I’m looking for a little advice in regards to serialising and de-serialising an object which contains a complex type.

So let’s say I have a class that looks something like the one below (heavily simplified).

template<typename T>
class AudioDataSet
{

public:
	AudioDataSet();
	~AudioDataSet();

private:    	
	int bufferSize;
	int numStftFrames;
	int delayedBuffersCount;

	arma::Mat<T> data;
	arma::Row<unsigned int> labels;

};

I want to be able to save and load an instance of my AudioDataSet class. I could use xml or overload stream operators.

My issue is serialising and de-serialising the data and labels members.

These members are matrix and row/vector types from the Armadillo library. The data member is a matrix type consisting of columns and rows. Armadillo provides methods to load a matrix from a .csv, but when I load it I also need to be able to determine the buffer size and number of STFT frames that were used to create the matrix.

The matrix data consists of columns representing sounds/instances and rows representing various audio features for an instance.

My initial idea was to create an XML structure with nodes for bufferSize, numStftFrames and delayedBuffersCount, and then to create a node for the matrix/data member whose value would be a delimited / comma separated string that I can manually deserialize into an arma::Mat type upon load. This seems messy though.

My other idea was to use two separate files. The first file would be the XML doc with the AudioDataSet information such as buffer size and number of STFT frames. However, the data node representing the matrix would instead be a file path to a separate .csv which simply contains the raw matrix data and can be loaded in a second operation using the Armadillo routines. Again…messy.

Does anyone have any words of advice on this one? I don’t really want to get into using boost or anything similar, and ideally I need to find a clean way to load/save an AudioDataSet.

Cheers Josh


#2

In my opinion, there is a certain point from where it becomes vastly more complex and ‘not simple’ to use something like XML for serialization… Splitting up the files creates association issues between serialized instances. Can you not store the data in some array structure in the XML? The comma solution seems OK to me if it can be natively loaded by your lib, as you can always change it without breaking anything and it appears to be the path of least resistance.

Also, making a templated base class (de)serializable seems weird: what happens if the T type in the serialized data doesn’t match your own type?

But wait… Are you basically trying to store/load audio data (ie. huge amounts of data)?


#3

Hey @Mayae

So the data isn’t raw audio data but will be floats/doubles. An AudioDataSet only holds single or double precision floating point values and represents a training set for various machine learning tasks.

So let’s say the data matrix member has 5 columns representing individual sounds/instances and 4 rows, each representing a temporal/spectral attribute or feature value obtained during some feature extraction stage.

Say: (table image not preserved)

Armadillo provides a method to load an instance of its matrix type from a .csv. See: Save/Load.

However, this is only half the battle, as I need the complementary data such as the size of the analysis window that was used to create the feature vectors initially.

I agree that going with XML and storing the raw matrix data in some array structure could work. It just seems like it’s going to get pretty messy manually de-serialising those comma separated fields from the XML node. I need to experiment. Ideally I guess I need to create some stream from the saved data/array structure that can be used by Armadillo. The library can read from various file types.
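For what it’s worth, the “create some stream from the saved text” idea can be sketched with just the standard library. This is only an illustration under assumptions: Matrix is a stand-in for arma::Mat<double>, and parseCsv / matrixFromNodeText are hypothetical helper names, not part of Armadillo or JUCE. As far as I know Armadillo’s load() also has an overload that takes a std::istream, so the hand-written parser could probably be swapped for something like data.load (stream, arma::csv_ascii), but check that against the Armadillo docs.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for arma::Mat<double>: a vector of rows (assumption for this sketch).
using Matrix = std::vector<std::vector<double>>;

// Parses comma separated text from a stream, one matrix row per line.
Matrix parseCsv (std::istream& in)
{
    Matrix rows;
    std::string line;

    while (std::getline (in, line))
    {
        if (line.empty())
            continue;

        std::vector<double> row;
        std::istringstream cells (line);
        std::string cell;

        while (std::getline (cells, cell, ','))
            row.push_back (std::stod (cell));

        // Every row is expected to have the same number of columns.
        assert (rows.empty() || row.size() == rows.front().size());
        rows.push_back (row);
    }

    return rows;
}

// Wraps the text taken from the XML node in a stream and parses it.
Matrix matrixFromNodeText (const std::string& nodeText)
{
    std::istringstream stream (nodeText);
    return parseCsv (stream);
}
```

The point is that the XML node’s text never has to touch the disk as a separate .csv: it is wrapped in a std::istringstream and handed to whatever parser accepts a stream.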

The data is relatively small. I’m talking a max of 10 - 20k table cells (floats or doubles). So < 1 MB?

Ideally I’m trying to come up with something that doesn’t take me a week to work out. Might be ambitious.


#4

Create a createValueTree()/setValueTree() pair of routines, and load/save a binary stream from the tree.

Pros:
the format is extensible, and stays compatible
it’s very small

The problem with text-based storage of float values is that it’s not binary compatible.
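One reading of that point, as a small standard-library sketch (roundTripText and roundTripBinary are hypothetical names used only here): a double written as text with the stream’s default precision of 6 significant digits does not come back as the same value, whereas copying the raw bytes always does. Text can be made to round-trip by writing std::numeric_limits<double>::max_digits10 digits, at the cost of longer strings.

```cpp
#include <cassert>
#include <cstring>
#include <iomanip>
#include <limits>
#include <sstream>

// Writes a value as text with the given number of significant digits,
// then parses it back.
double roundTripText (double value, int digits)
{
    std::ostringstream out;
    out << std::setprecision (digits) << value;

    std::istringstream in (out.str());
    double result = 0.0;
    in >> result;
    return result;
}

// Copies a value out to raw bytes and back again.
double roundTripBinary (double value)
{
    unsigned char bytes[sizeof (double)];
    std::memcpy (bytes, &value, sizeof (double));

    double result = 0.0;
    std::memcpy (&result, bytes, sizeof (double));
    return result;
}
```

So if the features go into XML as comma separated text, it is worth setting the precision explicitly rather than relying on the default.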


#5

Hey @chkn

Sorry, could you elaborate a little ?

How exactly would you go about using a value tree to replicate this class structure:

template<typename T>
class AudioDataSet
{

public:
	AudioDataSet();
	~AudioDataSet();

private:    	
	int bufferSize;
	int numStftFrames;
	int delayedBuffersCount;

	arma::Mat<T> data;
	arma::Row<unsigned int> labels;

};

Where data is a matrix type similar to the table visualisation in my previous post ?

I’m a little lost as to how I would create the value for the data member. Are you talking about turning the matrix/table itself into a ValueTree object for storage ? I’m a little lost as to how I do this without identifiers for the cells.

Are you saying create a “data” node in the tree which is just some raw binary data representing the matrix contents and then also have the other info (buffer size etc) as properties of the tree ?
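To make that question concrete, here is a minimal standard-library sketch of the layout I have in mind: the scalar settings first, then the matrix dimensions, then the raw matrix values as one binary blob. It is only an illustration under assumptions: DataSetBlob is a hypothetical stand-in for AudioDataSet<double> (no Armadillo or ValueTree involved), saveDataSet / loadDataSet are made-up names, and the bytes are written in the machine’s native layout, so the file is not portable across architectures. It also stores the element size up front so that loading with the wrong T can be rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for AudioDataSet<double>.
struct DataSetBlob
{
    std::int32_t bufferSize = 0;
    std::int32_t numStftFrames = 0;
    std::int32_t delayedBuffersCount = 0;
    std::uint32_t rows = 0;
    std::uint32_t cols = 0;
    std::vector<double> data;   // rows * cols values
};

template <typename Value>
void writeRaw (std::ostream& out, const Value& v)
{
    out.write (reinterpret_cast<const char*> (&v), sizeof (Value));
}

template <typename Value>
void readRaw (std::istream& in, Value& v)
{
    in.read (reinterpret_cast<char*> (&v), sizeof (Value));
}

void saveDataSet (std::ostream& out, const DataSetBlob& s)
{
    assert (s.data.size() == std::size_t (s.rows) * s.cols);

    writeRaw (out, std::uint32_t (sizeof (double)));   // element size tag
    writeRaw (out, s.bufferSize);
    writeRaw (out, s.numStftFrames);
    writeRaw (out, s.delayedBuffersCount);
    writeRaw (out, s.rows);
    writeRaw (out, s.cols);
    out.write (reinterpret_cast<const char*> (s.data.data()),
               std::streamsize (s.data.size() * sizeof (double)));
}

// Returns false if the stream is truncated or was saved with another element type.
bool loadDataSet (std::istream& in, DataSetBlob& s)
{
    std::uint32_t elementSize = 0;
    readRaw (in, elementSize);

    if (! in || elementSize != sizeof (double))
        return false;

    readRaw (in, s.bufferSize);
    readRaw (in, s.numStftFrames);
    readRaw (in, s.delayedBuffersCount);
    readRaw (in, s.rows);
    readRaw (in, s.cols);

    if (! in)
        return false;

    s.data.resize (std::size_t (s.rows) * s.cols);
    in.read (reinterpret_cast<char*> (s.data.data()),
             std::streamsize (s.data.size() * sizeof (double)));
    return static_cast<bool> (in);
}
```

With a ValueTree, the scalar fields would presumably become properties and the blob a single binary property, which is what I’m asking about.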


#6

I usually store a zip file with a value tree as an XML file in the zip root, plus binaries for the data (with no compression in case the data is already compressed) stored in subfolders, with references to them in the XML. It’s pretty convenient, fast and space saving, and it’s easy to inspect the result and even modify the data externally.


#7

Hey @kraken

So an XML doc with a node/element holding references/paths to separate binary files ?

This seems like the easiest way to manage things to be honest.


#8

Well, there are thousands of ways of doing it.

Just something like this (untested pseudo code):

// Member functions of AudioDataSet. The *ID names are juce::Identifier constants;
// serialiseToMemoryBlock() and deserialiseInto() are placeholder helpers that you
// would write yourself (to a String, MemoryBlock or whatever suits).
ValueTree createValueTree() const
{
    ValueTree vt ("AudioDataSet");   // a ValueTree needs a type name to be valid

    vt.setProperty (bufferSizeID, bufferSize, nullptr);
    vt.setProperty (numStftFramesID, numStftFrames, nullptr);
    vt.setProperty (delayedBuffersCountID, delayedBuffersCount, nullptr);
    vt.setProperty (dataID, serialiseToMemoryBlock (data), nullptr);
    vt.setProperty (labelsID, serialiseToMemoryBlock (labels), nullptr);

    return vt;
}

void setValueTree (const ValueTree& vt)
{
    bufferSize = vt.getProperty (bufferSizeID, 0 /* default value */);
    numStftFrames = vt.getProperty (numStftFramesID, 0 /* default value */);
    delayedBuffersCount = vt.getProperty (delayedBuffersCountID, 0 /* default value */);

    deserialiseInto (data, vt.getProperty (dataID));
    deserialiseInto (labels, vt.getProperty (labelsID));
}

MemoryBlock toMemoryBlock() const
{
    MemoryOutputStream mos;
    createValueTree().writeToStream (mos);   // compact binary form of the tree
    return mos.getMemoryBlock();
}

void fromMemoryBlock (const MemoryBlock& mb)
{
    MemoryInputStream mis (mb, false);
    setValueTree (ValueTree::readFromStream (mis));
}

You can then save the MemoryBlock to a file and load it back.


#9

Hi @chkn

Yeah this is what I have at the moment and it works. Thanks for the tips.

Maybe I’ll just stick with this for now.

Cheers guys.


#10

It’s not the JUCE way, but it’s supposed to be a very efficient way of doing serialisation/deserialisation:

or on github:

would love to see serialisation for this or for BSON directly from ValueTree or DynamicObject…


#11

You can handle everything with JUCE’s ZipFile (for entry storage), ValueTree (to/from XML) and Memory(Input/Output)Stream (and eventually AudioFormatManager for loading and saving losslessly compressed sample data if it’s big, using FLAC for example).