Get Sysex from Midifile

Hi all,

thought the below code would extract Sysex messages from a Midi file but it doesn’t :frowning:

Any idea?


OwnedArray <MidiMessageSequence> trackSeqs;
OwnedArray <MidiMessage> messages;
			
if (!file.isDirectory())
{
	FileInputStream fileInput(file);
			
	MidiFile midiFile;
	midiFile.readFrom(fileInput);

	for (int i = 0; i < midiFile.getNumTracks(); i++)
	{
		trackSeqs.add(midiFile.getTrack(i));
		for (int j = 0; j < trackSeqs[i]->getNumEvents(); j++)
		{
			MidiMessageSequence::MidiEventHolder* eventHolder = trackSeqs[i]->getEventPointer(j);
			messages.add(&eventHolder->message);
		}
	}
}

Good grief, does that even compile?? An OwnedArray will delete any objects that you add to it!

Of course it deletes nested objects but only if it leaves the scope which is
not the case here. I am using OwnedArray a lot in my code to keep objects w/o problems.

Reading standard Midi messages like “Note On” this way works fine
but Sysex always returns size = 1 which is odd.

I don’t care about timestamps etc, just want to have all Sysex messages
in an array.

BTW: It does compile flawessly :slight_smile:

Thanks,
Joerg

Honestly, your example code is massively, horrifically broken. If you write code like that for real, you’ll create some terrible memory corruption bugs.

As for the MidiFile stuff, I can’t remember the details, I wrote it years ago - have you tried stepping through to see what’s going on in there?

Hi Jules,

Apart from the Mistake with OwnedArray´s I’ve made (Thank you very much for the hint!),
I think I could isolate the “corupt” part of code in:

MidiMessage:MidiMessage (const void *data, int maxBytesToUse, int &numBytesUsed, uint8 lastStatusByte, double timeStamp=0)

In a Sysex message, taken from a Midifile, the status byte will be followed by variable length bytes to tell the file reader the lenght of the data block!

That seems to not be supported…

Joerg

I wasn’t aware of that… Odd, because that’s MidiFile reader code has been used for years by lots of people - maybe some midi files don’t have length bytes, but just use the normal format with the terminating byte instead?

I’m a bit busy in other code areas at the moment, but if you wanted to suggest some changes that will get it parsing your files correctly, I’d be happy to sanity-check it…

Jules, I just took a quick peak at the Midi File spec, and it certainly does say that Normal Sysex events start with 0xF0 and then have a ‘variable length value that specifies length of the sysex data’. As well, there are two other types of sysex events specified ‘Divided’ and ‘Authorization’.

I was using this as my reference: http://www.sonicspot.com/guide/midifiles.html

Ok, I think this is just a little bit of incorrect logic in the message parser - try this modification to MidiMessage::MidiMessage(), around line 182:

[code] if (byte >= 0x80)
{
if (byte == 0xf0)
{
const uint8* d = src;
bool haveReadAllLengthBytes = false;

        while (d < src + sz)
        {
            if (*d >= 0x80) // stop if we hit a status byte, and don't include it in this message
            {
                if (*d == 0xf7)
                {
                    ++d;  // include the trailing 0xf7 when we hit it
                    break;
                }

                if (haveReadAllLengthBytes) // if we see a 0x80 bit set after the initial data length bytes
                    break;

                ++d;
                continue;
            }

            haveReadAllLengthBytes = true;
            ++d;
        }[/code]

I looks like it was just mistakenly thinking those length bytes were a break in the message. It’d have worked fine for messages shorter than 128 bytes, which may be why I never heard about this before.

Jules, your above correction of the code doesn’t work cause it does not remove those
lenght bytes off the message.

The below change I did to your code does also handle divide sysex messages.
I’ve used a memory block instead as it makes it easy to put all devided message
blocks together. You may will find some things to optimize :slight_smile:

MidiMessage::MidiMessage (const void* src_, int sz, int& numBytesUsed, const uint8 lastStatusByte, double t)
	: timeStamp (t),
	  data (static_cast<uint8*> (preallocatedData.asBytes))
{
	const uint8* src = static_cast <const uint8*> (src_);
	unsigned int byte = (unsigned int) *src;
	int varLenghtBytesUsed = 0;
	int numLenghtBytesToAdd = 0;

	if (byte < 0x80)
	{
		byte = (unsigned int) (uint8) lastStatusByte;
		numBytesUsed = -1;
	}
	else
	{
		numBytesUsed = 0;
		--sz;
		++src;
	}

	if (byte >= 0x80)
    {
        if (byte == 0xf0)
        {
                        const uint8* d = src;
			size_t blockLenght = readVariableLengthVal (d, varLenghtBytesUsed);
			d += varLenghtBytesUsed + blockLenght;
			numLenghtBytesToAdd = varLenghtBytesUsed;

			MemoryBlock mem(1);
			mem[0] = (uint8) byte;

			mem.append(src + varLenghtBytesUsed, blockLenght);

			//multi-packet message
			while (*(d - 1) != 0xf7)
			{
				//read send delay; we only need the number of used bytes
				readVariableLengthVal(d, varLenghtBytesUsed);
				d += varLenghtBytesUsed + 1; //(varLenghtBytesUsed + 1) includes F7 which 
											 //is the status byte in a divided Sysex message
				
				numLenghtBytesToAdd += varLenghtBytesUsed + 1;

				blockLenght = readVariableLengthVal (d, varLenghtBytesUsed);
				mem.append(d + varLenghtBytesUsed, blockLenght);
				d += blockLenght + varLenghtBytesUsed;
				numLenghtBytesToAdd += varLenghtBytesUsed;
			}

			data = new uint8[mem.getSize()];
			mem.copyTo(data, 0, mem.getSize());
			size = mem.getSize();
		}
		else if (byte == 0xff)
		{
			int n;
			const int bytesLeft = readVariableLengthVal (src + 1, n);
			size = jmin (sz + 1, n + 2 + bytesLeft);

			data = new uint8 [size];
			*data = (uint8) byte;
			memcpy (data + 1, src, size - 1);
		}
		else
		{
			preallocatedData.asInt32 = 0;
			size = getMessageLengthFromFirstByte ((uint8) byte);
			data[0] = (uint8) byte;

			if (size > 1)
			{
				data[1] = src[0];

				if (size > 2)
					data[2] = src[1];
			}
		}

		numBytesUsed += size + numLenghtBytesToAdd;
	}
	else
	{
		preallocatedData.asInt32 = 0;
		size = 0;
	}
}

Good grief, that’ll never work! What if the length bytes are missing or wrong?

I’m not actually convinced that it should remove the length bytes anyway - why can’t they just be dealt with by the receiver of the message? They’re between the start and end marker bytes, so surely they’re technically part of the message?

If the lenght bytes are missing or corrupt, then the Midi file is also corrupt.
Those lenght bytes are actually not a part of the message cause the’ve got injected by the
Midi file writer. Since this methode is meant to read Midi events from a file or stream, I
actually can’t imagine why they could be missing.

Anyway, juce is yours and you have to decide the way to go. I can also deal with lenght bytes
in my receiver engine…

Ok, try this instead:

[code] {
const uint8* d = src;
bool haveReadAllLengthBytes = false;
int numLengthBytesToRemove = 0;

        while (d < src + sz)
        {
            if (*d >= 0x80)
            {
                if (*d == 0xf7)
                {
                    ++d;  // include the trailing 0xf7 when we hit it
                    break;
                }

                if (haveReadAllLengthBytes) // if we see a 0x80 bit set after the initial data length
                    break;                  // bytes, assume it's the end of the sysex

                ++numLengthBytesToRemove;
                ++d;
                continue;
            }

            if (! haveReadAllLengthBytes)
            {
                ++numLengthBytesToRemove;
                haveReadAllLengthBytes = true;
            }

            ++d;
        }

        size = 1 + (int) (d - src) - numLengthBytesToRemove;

        data = new uint8 [size];
        *data = (uint8) byte;
        memcpy (data + 1, src + numLengthBytesToRemove, size - 1);
    }[/code]

works if you do:

size = 1 + (int) (d - src);

instead of:

size = 1 + (int) (d - src) - numLengthBytesToRemove;

But it does not support divided Sysex messages! Don’t know if you want to support that ???
It was the reason why I’ve used Memoryblock to collect all the divided parts…