I don’t think there’s any encoding requirement in the MIDI spec, and from time to time I hit the ASCII assertion inside the getTextFromTextMetaEvent() call.
Hmm… OK, I guess that since we have to make some kind of blind assumption about the encoding, assuming that it’s UTF-8 seems like a sensible choice. Does anyone know more about the kinds of encodings actually used in MIDI metadata?
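For anyone wanting to try the "assume UTF-8" idea outside the library, here is a minimal sketch in plain C++. It is not how getTextFromTextMetaEvent() works internally. The function names (isValidUtf8, latin1ToUtf8, decodeMetaText) are hypothetical, and the fallback to Latin-1 for bytes that aren't valid UTF-8 is purely my own assumption; nothing in the MIDI spec suggests it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true if `bytes` is well-formed UTF-8. Rejects overlong forms,
// UTF-16 surrogates, and code points above U+10FFFF.
bool isValidUtf8 (const std::string& bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;

    while (i < n)
    {
        const auto b0 = static_cast<std::uint8_t> (bytes[i]);

        if (b0 < 0x80) { ++i; continue; }   // plain ASCII byte

        std::size_t extra = 0;    // number of continuation bytes expected
        std::uint32_t cp = 0;     // code point being assembled
        std::uint32_t minCp = 0;  // smallest code point allowed for this length

        if      ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; minCp = 0x10000; }
        else return false;        // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;   // sequence truncated

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const auto b = static_cast<std::uint8_t> (bytes[i + k]);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3Fu);
        }

        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }

    return true;
}

// Re-encodes ISO-8859-1 (Latin-1) bytes as UTF-8. Each Latin-1 byte maps
// directly to the Unicode code point with the same value.
std::string latin1ToUtf8 (const std::string& bytes)
{
    std::string out;
    out.reserve (bytes.size() * 2);

    for (char c : bytes)
    {
        const auto b = static_cast<std::uint8_t> (c);

        if (b < 0x80)
        {
            out.push_back (c);
        }
        else
        {
            out.push_back (static_cast<char> (0xC0 | (b >> 6)));
            out.push_back (static_cast<char> (0x80 | (b & 0x3F)));
        }
    }

    return out;
}

// ASSUMPTION: treat the raw meta-event text as UTF-8 when it validates,
// otherwise guess Latin-1. The result is always a UTF-8 std::string.
std::string decodeMetaText (const std::string& raw)
{
    return isValidUtf8 (raw) ? raw : latin1ToUtf8 (raw);
}
```

Pure ASCII text passes through unchanged, since ASCII is a subset of UTF-8, so this only behaves differently from an ASCII-only reader when a file contains bytes above 0x7F. The Latin-1 guess will be wrong for files written in other legacy code pages, so treat it as a best-effort fallback.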
In case anyone is wondering about this: I’ve been in touch with Tom White from the MMA (MIDI Manufacturers Association), and have been told that ASCII is not a required encoding format. Quote from my e-mail: [quote]There is no explicit requirement that all text in MIDI is always ASCII… it just happens that the majority of text fields defined in MIDI were defined as ASCII.
[…] If you look at the definitions of meta-events in the SMF spec (and addenda) all (except Lyrics, see above) either say the text encoding is ASCII or else don’t say anything about the text encoding… and in the latter case I think most developers will just assume it is ASCII. (Certainly RP-026 states that before that the “only character code defined for SMF was ASCII”). [/quote]
I was also pointed [here] for additional information on lyric metadata. There seems to be a specific scheme from the MMA for karaoke-related data in different languages.