Encoding issues with getProgramName()


#1

With NI Reaktor 5, I seem to have issues with the strings returned by getProgramName() when my Juce host queries the plugin. The preset in question is “SoundSchool Analog”, which contains a couple of program names with umlauts (e.g. “Sägezahn”). I use the following code to compile a list of available programs:

void MyHostSomething::getProgramListAndSelection (MemoryBlock& destData)
{
    ScopedPointer <XmlElement> xml (new XmlElement ("PROGRAM-LIST"));
    AudioPluginInstance* inst = getPluginInstance();
    if (inst != NULL)
    {
        int current = inst->getCurrentProgram();
        int n = inst->getNumPrograms();
        if (n > 0)
        {
            for (int i = 0; i < n; i++)
            {
                XmlElement* prog = xml->createNewChildElement ("PROGRAM");
                prog->setAttribute ("number", i);
                prog->setAttribute ("name", inst->getProgramName (i));
                if (i == current)
                    prog->setAttribute ("selected", 1);
            }
        }
    }
    MemoryOutputStream stream (destData, false);
    xml->writeToStream (stream, "", false, true, "UTF-8", 80);    
}

The resulting XML delivers

for the “Sägezahn” program, which crashes the process that reads the UTF-8 XML, because 4194186 is way out of the valid range (and has nothing to do with “ä” actually).

What am I missing? Could it be the getProgramName() function is not safe for international characters? Or do I need to do some encoding magic to cope with this?

Any hint is appreciated.
TIA

P.S: I should note that this happens with the VST version only, AudioUnit works fine!
P.P.S: Happens with other VST too when I create a program with umlauts in the name.


#2

The VST SDK docs don’t actually specify what character encoding is used for passing strings, so my code just assumes that it’s UTF-8. I guess that if they use some other local encoding, that could be misinterpreted and end up as your big unicode number.

Can’t really fix the fact that the encoding is undefined, but I’d at least like to avoid it crashing! Where exactly does it blow up?
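
One way a host could at least notice the misinterpretation described above is to check whether the bytes returned by the plugin are well-formed UTF-8 before treating them as such. The following is only a minimal sketch in plain C++ (no JUCE types); the function name looksLikeUtf8 is hypothetical and not part of JUCE or the VST SDK.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// surrogates, and code points above U+10FFFF.
bool looksLikeUtf8 (const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n)
    {
        const unsigned char lead = static_cast<unsigned char> (s[i]);
        std::size_t extra = 0;      // number of continuation bytes expected
        unsigned long cp = 0;       // decoded code point
        unsigned long minimum = 0;  // smallest code point allowed for this length

        if (lead < 0x80)                { ++i; continue; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minimum = 0x10000; }
        else return false;          // stray continuation byte or invalid lead byte

        if (i + extra >= n)
            return false;           // sequence is cut off by the end of the string

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char> (s[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;       // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minimum || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

A single-byte “ä” from a legacy encoding (0xE4 in Latin 1, 0x8A in Mac Roman) followed by ASCII fails this check, so the host would know not to decode the name as UTF-8. A pure-ASCII name passes, which is harmless because ASCII is identical in all the encodings discussed here.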


#3

The crash happens in another process that reads the XML - nothing to do with Juce.

I would guess the encoding is some Windows 8-bit or Latin 1. For a test, I changed this function in juce_VSTPluginFormat.cpp to use createStringFromData() and it works a little better, although still not correct. The umlauts convert to line feeds and other low-value ASCII codes, effectively truncating the program name.

[code]const String VSTPluginInstance::getProgramName (int index)
{
    if (index == getCurrentProgram())
    {
        return getCurrentProgramName();
    }
    else if (effect != nullptr)
    {
        char nm [256] = { 0 };

        if (dispatch (effGetProgramNameIndexed,
                      jlimit (0, getNumPrograms(), index),
                      -1, nm, 0) != 0)
        {
            return String::createStringFromData (nm, strlen (nm)).trim();
        }
    }

    return programNames [index];
}
[/code]

Is there any way to convert from Latin 1?


#4

Incredible, even more so as VST originates from Germany where Umlauts are used everywhere.


#5

Found this after googling a bit:
http://audacity.sourcearchive.com/lines/1.3.10-2/VSTEffect_8cpp-source.html

A code snippet of Audacity indicates that Latin 1 is a good guess (at least less dangerous than assuming UTF-8):

wxString VSTEffect::GetString(int opcode, int index)
{
    char buf[256];

    buf[0] = '\0';
    callDispatcher(opcode, index, 0, buf, 0.0);

    return LAT1CTOWX(buf);
}

Will google a bit more if I can find a simple C++ implementation that converts from Latin 1 to UTF-8.


#6

Ha! Just discovered that NI Reaktor 5 uses Mac Roman encoding on a Mac and probably CP-1252/iso-8859-1/Latin1 on Windows. IIRC, these are (or formerly were) considered the default platform encodings. No idea whether Native Instruments do have more detailed VST SDK docs than us, but them being a market leader, we can safely assume they are not doing this by pure chance.

I can’t seem to find a simple conversion for Mac Roman, but I managed to hack something for Latin1 -> UTF-8:

[code]
const String stringFromLatin1 (const unsigned char* source)
{
    MemoryOutputStream out;
    unsigned char ch;

    while ((ch = *source) != 0)
    {
        if (ch < 0x80)
        {
            out.write (&ch, 1);                     // plain ASCII, copied unchanged
        }
        else if (ch < 0xC0)
        {
            unsigned char temp = 0xC2;              // 0x80..0xBF become C2 xx
            out.write (&temp, 1);
            out.write (&ch, 1);
        }
        else
        {
            unsigned char temp = 0xC3;              // 0xC0..0xFF become C3 (xx - 0x40)
            out.write (&temp, 1);
            temp = (unsigned char) (ch - 0x40);     // second byte is derived from ch, not from the lead byte
            out.write (&temp, 1);
        }
        source++;
    }
    return out.toUTF8();
}[/code]

Or even more simple:

const String stringFromLatin1 (const unsigned char* source)
{
    MemoryOutputStream out;
    unsigned char ch;
    
    while ((ch = *source) != 0)
    {
        if (ch < 0x80) 
        {
            out.write (&ch, 1);    
        } else {
            unsigned char temp = 0xc0 | (ch & 0xc0) >> 6; 
            out.write (&temp, 1);
            temp = 0x80 | (ch & 0x3f);
            out.write (&temp, 1);            
        }
        source++;
    }
    return out.toUTF8();
}

Undoubtedly this could be more nicely integrated with the char pointer classes, but I don’t have the time to wrap my head around this right now.


#7

Ok, here’s a solution for Mac Roman. :smiley: It works fine for the NI plugins mentioned above. I don’t actually like the static buffer size, but it is safe in the VST scope.

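const String stringFromMacRoman (const char* source)
{
    char buffer[256] = { 0 };
    CFStringRef temp = CFStringCreateWithCString (kCFAllocatorDefault, source, kCFStringEncodingMacRoman);

    if (temp == NULL)   // creation can fail, and CFRelease (NULL) would crash
        return String();

    // Returns false if the UTF-8 result does not fit into the buffer
    const bool ok = CFStringGetCString (temp, buffer, sizeof (buffer), kCFStringEncodingUTF8);
    CFRelease (temp);

    return ok ? String (CharPointer_UTF8 (buffer)) : String();
}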

Using that in VSTPluginInstance::getProgramName() and at other places a string is returned by a VST does the trick. Under Windows, one of the above converters can be used.

Should be fairly easy to put both into a member function [b]VSTPluginInstance::convertVstString (char* vstString)[/b] depending on JUCE_MAC etc. I will do some more tests with other plugins. If it proves reliable, is there any chance this will get integrated?

#8

Now I did some tests with other plugins and got mixed results. As the VST SDK does not specify the encoding, it seems that every developer is brewing their own soup, depending on whichever string and cross-platform libraries they use. Most plugins cannot deal with international characters in a predictable way, and some even refuse to accept them in preset names altogether (NI Akoustik Piano does so, which is a good thing considering the mess it would cause otherwise).

The conclusion is that, while the above hacks work fine for some plugins, we cannot come up with a solution for all of them. Instead, we need to avoid damage and minimize confusion should any plugin report names with international characters. It is probably best to drop those characters (leaving a gap) or substitute them with a harmless placeholder (a question mark, space, or dot perhaps).

It is also very unlikely that any plugin uses international characters for parameters and other internal symbols, so this method would only apply to program names.

In any case, VST names cannot be assumed to be UTF-8. The conversion routine should simply strip or substitute all characters above 127. This will also work on all platforms and be very simple.
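
A rough sketch of such a routine, written in plain C++ so it can be tried outside of Juce. The function name replaceNonAscii and the default placeholder are assumptions made for illustration, not existing Juce or VST SDK API.

```cpp
#include <string>

// Hypothetical helper: replaces every byte above 127 with `placeholder`,
// leaving plain ASCII untouched. No assumption is made about the source encoding.
std::string replaceNonAscii (const std::string& raw, char placeholder = '?')
{
    std::string clean (raw);

    for (char& c : clean)
        if (static_cast<unsigned char> (c) > 127)
            c = placeholder;

    return clean;
}
```

Note that this works byte by byte: a plugin that does happen to deliver UTF-8 would get one placeholder per byte of a multi-byte character, so “ä” would show up as two question marks rather than one.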

What do you think?


#9

It is indeed a mess! I guess that just stripping characters above 127 is probably the best that could be done - it’ll produce a few weird looking strings, but will at least not cause any damage.


#10

FYI: Just got an “official” response from Steinberg. While VST 3 is based on UTF-16, there is no unified string encoding in VST 2.x. As we already found out, it depends on platform and code page (Windows: CP-1252, Mac: Central European encoding, probably Mac Roman, although I’m not 100% sure it’s the same).
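
For the Windows side, note that CP-1252 differs from ISO-8859-1 in the range 0x80..0x9F, where it holds printable characters such as the euro sign and typographic quotes. A plain Latin 1 conversion would turn those into control codes. Below is a minimal sketch of a CP-1252 to UTF-8 conversion in plain C++. The function names are hypothetical, the table is the standard Windows-1252 mapping, and replacing the five undefined bytes with a question mark is an assumption of this sketch. The Mac encoding is not covered.

```cpp
#include <string>

// Appends the UTF-8 encoding of a code point in the range U+0000..U+FFFF.
static void appendUtf8 (std::string& out, unsigned int cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char> (cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts a CP-1252 byte string to UTF-8.
std::string cp1252ToUtf8 (const std::string& src)
{
    // Code points for bytes 0x80..0x9F; 0 marks the bytes CP-1252 leaves undefined.
    static const unsigned short high[32] = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
    };

    std::string out;
    out.reserve (src.size() * 2);

    for (char c : src)
    {
        const unsigned char ch = static_cast<unsigned char> (c);
        unsigned int cp = ch;   // outside 0x80..0x9F the byte value is the code point

        if (ch >= 0x80 && ch <= 0x9F)
            cp = (high[ch - 0x80] != 0) ? high[ch - 0x80] : 0x3F;   // 0x3F is '?'

        appendUtf8 (out, cp);
    }
    return out;
}
```

With this, the Latin 1 byte 0xE4 in “Sägezahn” becomes the two bytes C3 A4, the same result the Latin 1 converters earlier in the thread aim for, while a byte such as 0x80 becomes the three-byte euro sign instead of a control code.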