Sysex message allocated on read

Since the modernization of Midi in JUCE, we iterate over MIDI events like this:

juce::MidiBuffer buffer;

for (auto m: buffer)
{
    doSomething(m.getMessage());
}

With regular midi messages, like notes or MIDI CC, copying the message is fine, as it’s a small stack object.

However, with larger Sysex messages, that copy will allocate and call malloc internally inside MidiMessage.

Because getMessage() only allows to read the message by copy, I’m not sure there’s a way to get around that?

I’d love to know if you have any solution.

I guess its more a question of what the problem is? SYSEX is rarely ever huge, and if you need to read a copy, you need to read a copy. Its very rare the case that you’ll modify that copy, I guess, and send it back - but even in that case, a copy is appropriate.

I think you misunderstood - you don’t have access to that message, even not to read it, without a copy that would allocate.

For example, Logic sends out a Sysex message automatically when starting playback, and our plugin will allocate when iterating even though we have no interest in that message - we just need to check if it’s a note on or off, and that alone will allocate.

Ah, I suppose I did misunderstand. So whats wrong with using:

if (!m.isSysEx()) doSomething(m.getMessage())

… or, stranger still, ignoring any messages that aren’t for the specific channel of interest? This should work because SYSEX messages use MIDI Channel ‘0’ …

EDIT: thinking about it, I suppose the instance of ‘m’ has already done the allocation you’re trying to avoid by this point, so I’m afraid I’m out of answers … :wink:

Iterating over MidiBuffer gives you an object of type “MidiMessageMetadata” by copy.

That object itself is fine to copy, because it only holds three members:

struct MidiMessageMetadata final
{
    //Some member functions I removed....
    const uint8* data = nullptr;
    int numBytes = 0;
    int samplePosition = 0;
};

However, the only way to read that message, even just to know the type, is to call:

m.getMessage()

Which, if it’s a Sysex message, will call malloc. You have no way of checking the type of message before doing that. You could check on the number of bytes, but most MIDI code I’ve seen, including deep in the JUCE framework like Synthesiser, will not check it, and only branch to check the type, allocating in the process.

Okay, but what about ignoring any messages that have MIDI Channel set to 0, as that is reserved for SYSEX? Can you not just filter them out (i.e. not receive them intentionally) prior to the “for (auto m: buffer)”?

use numBytes to filter sysex ? (or more generally message which allocates)

I could do that in my code, for example filter out all the sysex message by getting rid of every message with more than 3 bytes before any additional processing is done.

But - first of all that means that every JUCE plugin that uses any kind of MIDI, or Synthesiser, MPESynth, etc, is allocating without knowing, so basically every single JUCE user would need to implement it.

Also, that won’t scale so well to code that actually needs Sysex messages, which is how I ran into that problem.

For example, my code could look like:

void process(MidiBuffer& midi)
{
    processMidiNotes(midi);
    processSysex(midi);
    processSynth(midi);
}

Filtering out the Sysex messages would be very hard in this case, as I would have to carefully create separate buffers for each midi processing function, just to avoid that extra allocation while making sure I still handle the Sysex messages correctly.

And even in those cases, I’d be forced to allocate when reading the Sysex messages.

I’m not sure why you wouldn’t just send SYSEX messages to one MidiBuffer and all other channelized (non channel=0) messages to another MidiBuffer, and only process the MidiBuffers you actually want to process, when you want to process them?

Because it’s not possible currently to iterate over the buffer and copy just the sysex messages without allocating. It’s just not possible in the current API.
Also as I mentioned, the JUCE code like Synthesiser doesn’t do that filtering, and most JUCE code out in the wild probably doesn’t as well, so they’re allocating in the framework and not in their own code which doesn’t even know about sysex.

Sorry, I guess I wasn’t clear - the point is not to iterate over MidiBuffer and weed out the SYSEX messages after the fact, but rather to not put SYSEX messages in your intended MidiBuffer in the first place - in your components MIDI handler method, when a MIDIMessage is received, simply don’t add the message to your MIDIBuffer if the event channel == 0 (i.e. its SYSEX) … or if you need to differentiate, and still receive SYSEX, then have a MIDIBuffer specifically for SYSEX and another for all other channelized data.

All this sounds like trying to work around a rather fundamental bug in JUCE.

One of the first things I learned from this forum:

Thou shalt not allocate on the audio thread.

How on earth are you supposed to adhere to that when the framework itself is doing it when dealing with MIDI?

6 Likes

That would solve the situation when I don’t need Sysex (but again, only in my code) but won’t solve the situation when I do need Sysex, as even iterating over those messages and reading them will allocate.

So in the case you’ve filtered your SYSEX messages to a MIDIBuffer containing MIDIMessages that are only of SYSEX type, you’d want a way to reference the existing SYSEX structures in the MidiMessage without forcing a copy?

Isn’t this what “const uint8* getSysexData()” is for?

Sorry for my dumb input - I’m no expert, but I do foresee that I would want to solve this problem as well, in near-future development, so I’m really only trying to participate in order to formulate a similar strategy, and its a little difficult to understand the actual problem - you’re trying to avoid doing two malloc()'s for each SYSEX message, right - the one that is necessary when the message arrives over the wire, in order to actually store the data, and the second which occurs when you try to access the object by copy, through an iterator somehow?

1 Like

Yes, but the only way to get to that function ATM is by copy.

Here’s how you’d do it:

//The for loop creates copies of a small stack object with pointers, no problem.
for (auto m: buffer)
{
    if (m.numBytes <= 3)
    {
        //Small object, no allocation, no problem
        handleRegularMessage(m.getMessage());
    }
    else
    {
        //Oh oh... allocates
        auto data = m.getMessage().getSysexData();
    }
}

The problem is the constructor for MidiMessage that’s called by m.getMessage() will call malloc if you have a larger message.

If you’ll read into the code of that constructor in MidiMessage, it will call allocateSpace, which will call std::malloc if the size is too large.

2 Likes

Yep there should be a non owning MidiMessage

2 Likes

yea, The framework should provide a ‘view’. i.e. a pointer to the data, and a size. There’s no need to copy the MIDI data.

2 Likes

Unless I’m misunderstanding something here, in the end, m.getMessage().getSysexData() essentially just gets you back to what you could grab (without allocating) from m.data.

I say “essentially” because getSysExData() won’t have the SysEx header and end bytes, so it’s two bytes fewer.

So maybe do this to handle SysEx without allocating?

if (m.numBytes > 3)
{
	auto* data = m.data;
	auto  dataLength = m.numBytes;
}

Or, to make it compatible with SysEx parsing code that’s expecting the result of getSysexData (which has the first and last bytes trimmed off):

if (m.numBytes > 3)
{
	auto* data = m.data + 1;
	auto  dataLength = m.numBytes - 1;
}

I haven’t tested that, but I think that’s the pointer math you’d need.

Well yes, I could use the raw data member of MidiMessageMetaData to parse the contents myself and forget that getMessage() exists.

But then inside that branch I won’t be able to use any of the MidiMessage functions, including even something basic as m.getMessage().isNoteOnOrOff(); of m.getMessage().isSysex(), so I essentially need to rewrite the entire MidiMessage class and every existing function I had that could potentially call the allocating path and modify it so it won’t call it.

You could say one solution is to make sure all JUCE users + all usages of MidiBuffer in the framework would explicitly add similar branching on the size and parse the Sysex data themsevles to avoid the constructor getting called.

However a solution I would prefer like @JeffMcClintock and @otristan suggested is to make sure in such situations MidiMessage itself just won’t allocate without all existing users having to explicitly change their code to be aware of the possible bug.

1 Like

I get that, and I like using those nice MidiMessage methods too – but at the same time, if you siphon off the SysEx messages into their own branch (by examining just MidiMessageMetadata), there’s really nothing useful in the MidiMessage class for parsing SysEx messages. Calling isNoteOnOrOff() or isSysex() has no use if you’ve already determined you’re dealing solely with SysEx messages. So it doesn’t feel like a huge loss to forgo that convenience from a coding standpoint, since when you’re dealing with SysEx you wind up crawling bytes with a raw pointer anyways.

And then in the branch that doesn’t handle SysEx, you can call MidiMessageMetadata::getMessage() and make use of all the MidiMessage functions.

As for what guarantees of non-allocating behavior the framework should make, that’s another question altogether… one I’m glad I don’t have to answer.