Parsing a text file... how hard could it be?

Not JUCE related, but DAW related - just a simple text parsing task…

Pro Tools has this wonderful ‘Export Session Info as Text’ function that gives a handy readout of your timeline and markers. Very useful if you want to diff two sessions, actually! I have a little script that parses these text files and I was trying to refine it to handle some trickier cases involving tabs and line breaks. Here’s a sample of the text file format:

T R A C K  L I S T I N G
TRACK NAME:	Audio track
COMMENTS:	Track comment.
USER DELAY:	0 Samples
STATE: 
PLUG-INS: 
CHANNEL 	EVENT   	CLIP NAME                     	START TIME    	END TIME      	DURATION      	STATE
1       	1       	Region group 1 name.          	        401408	        770048	        368640	Unmuted
1       	2       	Region group 2 name.          	        770048	       1138688	        368640	Unmuted
1       	3       	Region group 3 name.          	       1138688	       1507328	        368640	Unmuted

It’s kind of a tab-and-space separated format, but with no quoting or escaping of tabs and spaces within the ‘columns’. So you can knock the columns out of alignment quite quickly:

M A R K E R S  L I S T I N G
#   	LOCATION     	TIME REFERENCE    	UNITS    	NAME                             	COMMENTS
1   	00:00:09:02  	401408            	Samples  	Marker 1 name. Line breaks >>>
<<< allowed & tabs >>>	<<< allowed	Marker 1 comment. Line breaks >>>
    	             	                  	           	                                 	<<< allowed & tabs >>>	<<< allowed. Line breaks >>>
    	             	                  	           	                                 	<<< allowed & tabs >>>	<<< allowed. Line breaks >>>
    	             	                  	           	                                 	<<< allowed & tabs >>>	<<< allowed.

So, you can’t simply split the lines using line breaks or tab characters.

I have a couple of ideas for how to parse it, maybe reading backwards or figuring out the width of the ‘columns’. But I thought I’d post it up here and see if anyone has any other ideas as there might be a few Pro Tools users around.

So, the idea would be to parse each ‘line’ into either a region group or a marker struct like these:

struct ProToolsRegionGroup
{
    int channel;
    int event;
    String clipName;
    String start;
    String end;
    String duration;
    String state;
};

struct ProToolsMarker
{
    int id;
    String location;
    int timeRef;
    String units;
    String name;
    String comments;
};

Thanks in advance for any thoughts. Example text file attached:
tabs and line breaks.txt (3.0 KB)

use regex

^(\d+)\s+(\d+)\s+([\w\s]+\.)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\w\s]+)$

This assumes the clip name always ends with a dot.

Is that dot guaranteed?

If not, you know the first two columns are numbers, the last columns are numbers too. So you know where these columns start and end. The bit in between would be the clip name.
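A rough sketch of that idea, not tested against real Pro Tools exports: split the record on tabs, take the first two fields from the front and the last four from the back, and treat whatever is left in the middle as the clip name. `RegionRow`, `trim`, `splitTabs` and `parseRegionRow` are made-up names, and `std::string` stands in for the JUCE `String` so the snippet compiles on its own. It assumes exactly four columns follow the clip name and that the STATE column never contains a tab.

```cpp
#include <cassert>    // only for quick checks when trying this out
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ProToolsRegionGroup, using std::string.
struct RegionRow {
    int channel = 0;
    int event = 0;
    std::string clipName, start, end, duration, state;
};

// Strip the space padding (and stray line breaks) around a field.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return {};
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split on every tab, keeping empty fields.
static std::vector<std::string> splitTabs(const std::string& s) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    for (;;) {
        const auto next = s.find('\t', pos);
        if (next == std::string::npos) { out.push_back(s.substr(pos)); break; }
        out.push_back(s.substr(pos, next - pos));
        pos = next + 1;
    }
    return out;
}

// Returns nullopt for headers and anything that does not look like a row.
std::optional<RegionRow> parseRegionRow(const std::string& record) {
    const auto f = splitTabs(record);
    const std::size_t n = f.size();
    if (n < 7) return std::nullopt;           // 2 leading + name + 4 trailing

    RegionRow r;
    try {
        r.channel = std::stoi(trim(f[0]));
        r.event   = std::stoi(trim(f[1]));
    } catch (const std::exception&) {
        return std::nullopt;                  // first two columns must be numbers
    }

    // Count the trailing columns from the back. They stay as strings, so
    // samples, timecode or bars/beats are all accepted unchanged.
    r.state    = trim(f[n - 1]);
    r.duration = trim(f[n - 2]);
    r.end      = trim(f[n - 3]);
    r.start    = trim(f[n - 4]);

    // Everything in between is the clip name; re-join any tabs it contained.
    std::string name;
    for (std::size_t i = 2; i + 4 < n; ++i) {
        if (i > 2) name += '\t';
        name += f[i];
    }
    r.clipName = trim(name);
    return r;
}
```

Because the trailing columns are counted from the end rather than matched as digits, a tab typed into the clip name just adds fields to the middle and ends up back in `clipName`.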

I put the dot in, whoops!

The starting and ending numbers are a good route. To make things more difficult, those numbers could potentially be timecode or bars and beats as well.

Since I can’t rely on line breaks to separate the lines, I was thinking of separating records with a regex matching ‘numbers-space-tab-numbers’. A clip name could contain that same pattern and trigger a false match, but it’s unlikely.
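Something like this is what I have in mind for the track listing, as a sketch only. A physical line starts a new record when it begins with digits, optional spaces, a tab, then digits, optional spaces and another tab; any other line is glued onto the previous record with its line break restored. `splitRecords` is a made-up name, and the pattern is my guess at what ‘numbers-space-tab-numbers’ should look like for the CHANNEL and EVENT columns; the markers table would need a different start pattern.

```cpp
#include <cassert>    // only for quick checks when trying this out
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Group the physical lines of a table body into logical records.
std::vector<std::string> splitRecords(const std::string& tableBody) {
    // Assumed record start: "<digits><spaces>\t<digits><spaces>\t".
    static const std::regex recordStart(R"(^\d+ *\t\d+ *\t)");

    std::vector<std::string> records;
    std::istringstream in(tableBody);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();                    // tolerate CRLF files

        if (std::regex_search(line, recordStart))
            records.push_back(line);            // a new record begins here
        else if (!records.empty())
            records.back() += '\n' + line;      // continuation of a field
        // Lines before the first record (the column header) are dropped.
    }
    return records;
}
```

The weak spot is the one already mentioned: a continuation line inside a clip name that happens to start with the same pattern would be split off as a new record.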

Then I just have to hope no-one types a tab into the marker name…

Meh. Export to OMF, they’re far easier to parse.

Ha ha true!