Encoding issues with getProgramName()

ans · May 27, 2011, 2:52pm

With NI Reaktor 5, I seem to have issues with getProgramName() sent to the plugin from my Juce host. The preset in question is “SoundSchool Analog” that contains a couple program names with umlauts (e.g. “Sägezahn”). I use the following code to compile a list of available programs:

void MyHostSomething::getProgramListAndSelection (MemoryBlock& destData)
{
    ScopedPointer <XmlElement> xml (new XmlElement ("PROGRAM-LIST"));
    AudioPluginInstance* inst = getPluginInstance();
    if (inst != NULL)
    {
        int current = inst->getCurrentProgram();
        int n = inst->getNumPrograms();
        if (n > 0)
        {
            for (int i = 0; i < n; i++)
            {
                XmlElement* prog = xml->createNewChildElement ("PROGRAM");
                prog->setAttribute ("number", i);
                prog->setAttribute ("name", inst->getProgramName (i));
                if (i == current)
                    prog->setAttribute ("selected", 1);
            }
        }
    }
    MemoryOutputStream stream (destData, false);
    xml->writeToStream (stream, "", false, true, "UTF-8", 80);    
}

The resulting XML delivers

for the “Sägezahn” program, which crashes the process that reads the UTF-8 XML, because 4194186 is way out of the valid range (and has nothing to do with “ä” actually).

What am I missing? Could it be the getProgramName() function is not safe for international characters? Or do I need to do some encoding magic to cope with this?

Any hint is appreciated.
TIA

P.S: I should note that this happens with the VST version only, AudioUnit works fine!
P.P.S: Happens with other VST too when I create a program with umlauts in the name.

jules · May 27, 2011, 3:14pm

The VST SDK docs don’t actually specify what character encoding is used for passing strings, so my code just assumes that it’s UTF-8. I guess that if they use some other local encoding, that could be misinterpreted and end up as your big unicode number.

Can’t really fix the fact that the encoding is undefined, but I’d at least like to avoid it crashing! Where exactly does it blow up?
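To illustrate why the UTF-8 assumption breaks down for such names, here is a minimal sketch in plain C++ (not JUCE code; the helper name `looksLikeUtf8` is made up for this example). It only checks whether a byte string is structurally valid UTF-8. A Latin-1 "ä" is the single byte 0xE4, which UTF-8 reads as the start of a three-byte sequence, and a Mac Roman "ä" is 0x8A, which UTF-8 reads as a continuation byte with no lead byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if the bytes form structurally valid UTF-8.
// It only checks lead/continuation byte patterns; overlong forms and
// surrogates are not rejected.
bool looksLikeUtf8 (const char* text)
{
    assert (text != nullptr);

    const std::size_t length = std::strlen (text);
    std::size_t i = 0;

    while (i < length)
    {
        const unsigned char c = static_cast<unsigned char> (text[i]);
        std::size_t extra = 0;

        if (c < 0x80)                 extra = 0;    // ASCII
        else if ((c & 0xE0) == 0xC0)  extra = 1;    // 2-byte sequence
        else if ((c & 0xF0) == 0xE0)  extra = 2;    // 3-byte sequence
        else if ((c & 0xF8) == 0xF0)  extra = 3;    // 4-byte sequence
        else                          return false; // stray continuation or invalid lead

        if (i + extra >= length)
            return false; // sequence runs past the end of the string

        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char> (text[i + k]) & 0xC0) != 0x80)
                return false; // expected a continuation byte

        i += extra + 1;
    }

    return true;
}
```

A host could use a check like this to decide whether a name is safe to treat as UTF-8. It is only a heuristic, though: some Latin-1 byte sequences happen to be valid UTF-8 as well.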

ans · May 27, 2011, 3:42pm

The crash happens in another process that reads the XML - nothing to do with Juce.

I would guess the encoding is some Windows 8-bit or Latin 1. For a test, I changed this function in juce_VSTPluginFormat.cpp to use createStringFromData() and it works a little better, although still not correct. The umlauts convert to line feeds and other low-value ASCII codes, effectively truncating the program name.

[code]const String VSTPluginInstance::getProgramName (int index)
{
    if (index == getCurrentProgram())
    {
        return getCurrentProgramName();
    }
    else if (effect != nullptr)
    {
        char nm [256] = { 0 };

        if (dispatch (effGetProgramNameIndexed,
                      jlimit (0, getNumPrograms(), index),
                      -1, nm, 0) != 0)
        {
            return String::createStringFromData (nm, strlen (nm)).trim();
        }
    }

    return programNames [index];
}
[/code]

Is there any way to convert from Latin 1?

ans · May 27, 2011, 3:59pm

Incredible, even more so as VST originates from Germany where Umlauts are used everywhere.

ans · May 27, 2011, 5:30pm

Found this after googling a bit:
http://audacity.sourcearchive.com/lines/1.3.10-2/VSTEffect_8cpp-source.html

A code snippet of Audacity indicates that Latin 1 is a good guess (at least less dangerous than assuming UTF-8):

wxString VSTEffect::GetString(int opcode, int index) { char buf[256]; buf[0] = '\0'; callDispatcher(opcode, index, 0, buf, 0.0); return LAT1CTOWX(buf); }

Will google a bit more if I can find a simple C++ implementation that converts from Latin 1 to UTF-8.

ans · May 27, 2011, 8:23pm

Ha! Just discovered that NI Reaktor 5 uses Mac Roman encoding on a Mac and probably CP-1252/iso-8859-1/Latin1 on Windows. IIRC, these are (or formerly were) considered the default platform encodings. No idea whether Native Instruments do have more detailed VST SDK docs than us, but them being a market leader, we can safely assume they are not doing this by pure chance.

I can’t seem to find a simple conversion for Mac Roman, but I managed to hack something for Latin1 -> UTF-8:

[code]
const String stringFromLatin1 (const unsigned char* source)
{
    MemoryOutputStream out;
    unsigned char ch;

    while ((ch = *source) != 0)
    {
        if (ch < 0x80)
        {
            // Plain ASCII is identical in Latin-1 and UTF-8.
            out.write (&ch, 1);
        }
        else if (ch < 0xC0)
        {
            // 0x80..0xBF becomes 0xC2 followed by the byte itself.
            unsigned char temp = 0xC2;
            out.write (&temp, 1);
            out.write (&ch, 1);
        }
        else
        {
            // 0xC0..0xFF becomes 0xC3 followed by (byte - 0x40).
            unsigned char temp = 0xC3;
            out.write (&temp, 1);
            temp = (unsigned char) (ch - 0x40);
            out.write (&temp, 1);
        }
        source++;
    }
    return out.toUTF8();
}[/code]

Or even more simple:

const String stringFromLatin1 (const unsigned char* source)
{
    MemoryOutputStream out;
    unsigned char ch;
    
    while ((ch = *source) != 0)
    {
        if (ch < 0x80) 
        {
            out.write (&ch, 1);    
        } else {
            unsigned char temp = 0xc0 | ((ch & 0xc0) >> 6);
            out.write (&temp, 1);
            temp = 0x80 | (ch & 0x3f);
            out.write (&temp, 1);            
        }
        source++;
    }
    return out.toUTF8();
}

Undoubtedly this could be more nicely integrated with the char pointer classes, but I don’t have the time to wrap my head around this right now.

ans · May 27, 2011, 10:43pm

Ok, here’s a solution for Mac Roman. It works fine for the NI plugins mentioned above. I don’t actually like the static buffer size, but it is safe in the VST scope.

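const String stringFromMacRoman (const char* source)
{
    char buffer[256] = { 0 };
    CFStringRef temp = CFStringCreateWithCString (kCFAllocatorDefault, source, kCFStringEncodingMacRoman);

    if (temp == nullptr)
        return String();

    // CFStringGetCString fails if the UTF-8 result does not fit into the buffer.
    const bool ok = CFStringGetCString (temp, buffer, sizeof (buffer), kCFStringEncodingUTF8);
    CFRelease (temp);

    if (! ok)
        return String();

    return String (CharPointer_UTF8 (buffer));
}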

Using that in VSTPluginInstance::getProgramName(), and anywhere else a VST returns a string, does the trick. Under Windows, one of the above converters can be used.

It should be fairly easy to put both into a member function [b]VSTPluginInstance::convertVstString (char* vstString)[/b], selected via JUCE_MAC etc. I will do some more tests with other plugins. If it proves reliable, is there any chance this will get integrated?

ans · May 28, 2011, 9:24am

I have now run some tests with other plugins and got mixed results. As the VST SDK does not specify the encoding, it seems that every developer is brewing their own soup, depending on their own string and cross-platform libraries, or something. Most plugins cannot deal with international characters in a predictable way, and some even refuse to accept them in preset names altogether (NI Akoustik Piano does so, which is a good thing considering the mess it would cause otherwise).

The conclusion is that, while the above hacks work fine for some plugins, we cannot come up with a solution for all of them. Rather, we need to avoid damage and minimize confusion should any plugin report names with international characters. It is probably best to drop such characters (leaving a gap) or substitute a harmless placeholder (a question mark, space, or dot perhaps).

It is also very unlikely that any plugin uses international characters for parameters and other internal symbols, so this method would only apply to program names.

In any case, VST names cannot be assumed to be UTF-8. The conversion routine should simply strip/substitute all characters above 127. This will also work on all platforms and be very simple.

What do you think?

jules · May 28, 2011, 9:38am

It is indeed a mess! I guess that just stripping characters above 127 is probably the best that could be done - it’ll produce a few weird looking strings, but will at least not cause any damage.
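The stripping/substituting approach could look roughly like the following. This is only a sketch in plain C++ rather than JUCE code, and both the function name `sanitizeVstString` and the default placeholder are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every byte above 127 with a placeholder,
// so the result is pure ASCII regardless of the plugin's encoding.
std::string sanitizeVstString (const char* source, char placeholder = '?')
{
    assert (source != nullptr);

    std::string result;

    for (const char* p = source; *p != 0; ++p)
    {
        const unsigned char ch = static_cast<unsigned char> (*p);
        result += (ch < 0x80) ? static_cast<char> (ch) : placeholder;
    }

    return result;
}
```

With this, "Sägezahn" becomes "S?gezahn" whether it arrives as Latin-1 or Mac Roman. If a plugin does send UTF-8, each multi-byte character turns into several placeholders, which is ugly but harmless.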

ans · May 30, 2011, 10:05pm

FYI: Just got an “official” response from Steinberg. While VST 3 is based on UTF-16, there is no unified string encoding in VST 2.x. As we already found out, it depends on platform and code page (Windows: CP-1252, Mac: Central European encoding, probably Mac Roman, although I’m not 100% sure it’s the same).
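If a host does want to decode CP-1252 on Windows, note that it differs from Latin-1 in the range 0x80..0x9F, where CP-1252 has printable characters such as the euro sign and curly quotes. The following is a hedged sketch in plain C++ (not JUCE code). The function names are made up, and mapping the five bytes that CP-1252 leaves undefined to U+FFFD is an assumption of this example.

```cpp
#include <cassert>
#include <string>

// Appends the UTF-8 encoding of a code point below U+10000,
// which is all that CP-1252 needs.
static void appendUtf8 (std::string& out, unsigned int cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char> (cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts a null-terminated CP-1252 string to UTF-8.
std::string utf8FromCp1252 (const char* source)
{
    assert (source != nullptr);

    // Unicode code points for CP-1252 bytes 0x80..0x9F. The five bytes
    // CP-1252 leaves undefined are mapped to U+FFFD here (an assumption).
    static const unsigned short high[32] =
    {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
    };

    std::string out;

    for (const char* p = source; *p != 0; ++p)
    {
        const unsigned char ch = static_cast<unsigned char> (*p);

        // 0x80..0x9F use the table; every other byte matches Latin-1.
        appendUtf8 (out, (ch >= 0x80 && ch <= 0x9F) ? high[ch - 0x80] : ch);
    }

    return out;
}
```

For the umlauts discussed in this thread the result is the same as with the Latin-1 converters above; the table only matters for bytes in the 0x80..0x9F range.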
