VSTPluginInstance::getProgramName issue


#1

Hi.

The function returned a corrupted string for the program name “Teccnö Fullrange Arp”.

After removing the CharPointer_UTF8() constructor call it worked OK, but then I got the assertion:
jassert (t == nullptr || CharPointer_ASCII::isValidString (t, std::numeric_limits<int>::max()));

const String VSTPluginInstance::getProgramName (int index)
{
    if (index == getCurrentProgram())
    {
        return getCurrentProgramName();
    }
    else if (effect != nullptr)
    {
        char nm [256] = { 0 };

        if (dispatch (effGetProgramNameIndexed,
                      jlimit (0, getNumPrograms(), index),
                      -1, nm, 0) != 0)
        {
            return String (CharPointer_UTF8 (nm)).trim();
        }
    }

    return programNames [index];
}

So I guess it’s not OK to treat it as ASCII just like that.
How can I tell which conversion is the correct one?

Thanks


#2

AFAIK you can’t - there’s no way to know what encoding the plugin has used for the string. Sensible ones will use utf-8, or stick to ascii and avoid the problem, but if they choose to send some other locale encoding, there’s no way to know about that.


#3

I thought about converting it from UTF-8 first and, if I get a corrupted string, then trying to convert it from ASCII.
Is there a known way of validating strings, even if it’s not 100% reliable?

Thanks


#4

Yes, if you know what encoding has been used. But in this case, we don’t.


#5

There’s so much stuff on this issue:

(http://www.w3.org/International/questions/qa-forms-utf-8)

Firefox also provides a UTF-8 detector that can be used as a standalone library:
http://www-archive.mozilla.org/projects/intl/detectorsrc.html

Edit: “The problem is that UTF-8 without a BOM is all too often indistinguishable from equally valid ANSI encoding. I think most solutions (like the win32 API IsTextUnicode) use various heuristics to give a best guess to the format of the text.”…
http://stackoverflow.com/questions/1231899/check-if-a-char-buffer-contains-utf8-characters
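
On the validation question: a structural check along the lines of the W3C pattern linked above can be sketched in plain C++. This is only a sketch. `isValidUtf8` is a made-up name, not a JUCE function. Passing the check does not prove the plugin meant UTF-8 (pure ASCII passes trivially), but failing it does show the bytes cannot be UTF-8.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8: no overlong forms, no surrogates,
// nothing above U+10FFFF. The byte ranges follow the W3C pattern.
// Hypothetical helper for illustration only.
bool isValidUtf8 (const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n)
    {
        const unsigned b0 = static_cast<unsigned char> (s[i]);
        std::size_t extra = 0;          // continuation bytes expected
        unsigned lo = 0x80, hi = 0xBF;  // allowed range of the first continuation byte

        if (b0 <= 0x7F)                     { ++i; continue; }        // ASCII
        else if (b0 >= 0xC2 && b0 <= 0xDF)  { extra = 1; }
        else if (b0 == 0xE0)                { extra = 2; lo = 0xA0; } // excludes overlongs
        else if (b0 == 0xED)                { extra = 2; hi = 0x9F; } // excludes surrogates
        else if (b0 >= 0xE1 && b0 <= 0xEF)  { extra = 2; }
        else if (b0 == 0xF0)                { extra = 3; lo = 0x90; } // excludes overlongs
        else if (b0 >= 0xF1 && b0 <= 0xF3)  { extra = 3; }
        else if (b0 == 0xF4)                { extra = 3; hi = 0x8F; } // caps at U+10FFFF
        else                                { return false; }         // 0x80..0xC1, 0xF5..0xFF

        if (n - i <= extra)
            return false;               // sequence is cut off at the end of the string

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned c = static_cast<unsigned char> (s[i + k]);
            const unsigned minByte = (k == 1) ? lo : 0x80u;
            const unsigned maxByte = (k == 1) ? hi : 0xBFu;

            if (c < minByte || c > maxByte)
                return false;
        }

        i += extra + 1;
    }

    return true;
}
```

A name like “Teccnö” stored with a single 0xF6 byte for the ö fails this check, which is the kind of hint a host could use before falling back to another encoding.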


#6

Here’s the fix. Jules, please feel free to add it to the code base!

Background: VST 2.4 uses Mac Roman encoding on the Mac and CP-1252 (Windows Latin) on Windows. The solution is to convert between these and Juce Strings. The implementation below does the conversion in one direction only, but the other way around is very similar (you can probably just use the same system functions). The best place for this code is probably class String, but to avoid hacking the Juce code base I had to put it elsewhere as a static:

[code]const String stringFromPlatformEncodedString (const char* sourceString)
{
    if (sourceString == 0 || strlen (sourceString) == 0)
        return String::empty;

#if JUCE_MAC

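    // Copy at most 1023 characters so the zero-initialised buffer keeps a
    // terminating null: CharPointer_UTF16 reads up to the first null.
    UniChar buffer[1024] = {0};
    CFStringRef cs = CFStringCreateWithCString (kCFAllocatorDefault, sourceString, kCFStringEncodingMacRoman);
    if (cs == 0)
        return String::empty;

    CFRange range;
    range.location = 0;
    range.length = jmin (1023, (int) CFStringGetLength (cs));
    CFStringGetCharacters (cs, range, buffer);
    const String result (CharPointer_UTF16 ((const int16*) buffer));
    CFRelease (cs);
    return result;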

#elif JUCE_WINDOWS

    // 65001 is utf-8, 1252 is Windows Latin1.
    const int codePage = 1252;

    int sourceLength = strlen (sourceString);
    int requiredSize = MultiByteToWideChar (codePage, 0, sourceString, sourceLength, 0, 0);	
    if (requiredSize == 0)
        return String::empty;
    
    HeapBlock <wchar_t> buffer;
    buffer.calloc (requiredSize + 1);
    buffer[requiredSize] = 0;
    
    int result = MultiByteToWideChar (codePage, 0, sourceString, sourceLength, buffer, requiredSize);
    if (result == 0)
        return String::empty;
    else
        return String (CharPointer_UTF16 (buffer.getData()));

#else
    // Other platforms ?
    return String (sourceString);
#endif    

}[/code]
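
For the “other platforms” branch, or just to see what the Windows conversion amounts to, here is a rough portable sketch of CP-1252 to UTF-8 in plain C++. The names `appendUtf8` and `cp1252ToUtf8` are made up for illustration and are not part of Juce. The five byte values that CP-1252 leaves unassigned are mapped to U+FFFD here, which is a choice made for this sketch; MultiByteToWideChar may treat them differently.

```cpp
#include <string>

// Appends the UTF-8 encoding of a code point below U+10000 to `out`.
// Hypothetical helper for illustration only.
static void appendUtf8 (std::string& out, unsigned cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char> (cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Converts CP-1252 bytes to UTF-8. Bytes 0x00..0x7F and 0xA0..0xFF map to the
// code point with the same value; 0x80..0x9F go through the table below.
std::string cp1252ToUtf8 (const std::string& source)
{
    // 0xFFFD marks the bytes CP-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    static const unsigned short high[32] =
    {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
    };

    std::string result;

    for (char ch : source)
    {
        const unsigned byte = static_cast<unsigned char> (ch);
        const unsigned cp = (byte >= 0x80 && byte <= 0x9F) ? high[byte - 0x80] : byte;
        appendUtf8 (result, cp);
    }

    return result;
}
```

The resulting std::string could then be handed to a Juce String through CharPointer_UTF8, as the existing getProgramName() code already does.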

As a consequence, all VST-specific string APIs must be changed to use the conversion. In juce_VST_Wrapper.cpp, getProgramName(), getProgramNameIndexed(), etc. all use UTF-8, which is wrong. The same change is needed in juce_VSTPluginFormat.cpp: VSTPluginInstance::getProgramName(), VSTPluginInstance::getCurrentProgramName(), VSTPluginInstance::getParameterName(), VSTPluginInstance::getParameterLabel(), … you get the idea :smiley:

The change looks similar in all functions:


#7

Does it? Is that official?


#8

It is “official” only to the extent that someone from Steinberg mentioned it on the VST developer list. It is not documented though (like many other things with VST that are not documented, so this does not worry me).

At least it is 100% certain that it’s not UTF-8. I had assert failures and subsequent misbehavior over and over with Reaktor 5 and other VST plugins. NI are using MacRoman/CP-1252 for sure, because the above fix reliably does the trick. Preset names like “Sägezahn” or “Hüllkurve” regularly crashed on me without the fix.

So if Native Instruments relies on it, it should be safe for us to do the same. At least it is a safer default assumption than UTF-8.

Being able to convert to/from the platforms default non-UTF8 encoding also is a nice enhancement to the String class in any case.


#9

Well… I only ask because there’s a lot of juce plugins out there that will be reporting their names in utf-8.


#10

Developers of existing Juce plugins should check whether a VST 2.4 host handles their international preset names correctly. I doubt it does. To test, you just need to hack the demo plugin to return something like “Überdrüßig” for its program name and you’re done :wink: Perhaps some hosts are smart enough to test against different encodings?
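
If a plugin wanted to report such a name in a single-byte encoding instead of UTF-8, the conversion in the other direction might look roughly like this. It is a deliberately limited sketch: `utf8ToLatin1Subset` is a made-up name, and it only covers the part of CP-1252 that coincides with ISO-8859-1 (ASCII plus U+00A0..U+00FF), which is enough for umlauts and ß. Everything else, including the CP-1252 punctuation in 0x80..0x9F and any malformed input, becomes '?'.

```cpp
#include <cstddef>
#include <string>

// Converts UTF-8 to single-byte text, handling only ASCII and U+00A0..U+00FF.
// Hypothetical helper for illustration only.
std::string utf8ToLatin1Subset (const std::string& utf8)
{
    std::string result;
    const std::size_t n = utf8.size();
    std::size_t i = 0;

    while (i < n)
    {
        const unsigned b0 = static_cast<unsigned char> (utf8[i]);

        if (b0 < 0x80)
        {
            result += static_cast<char> (b0);   // ASCII passes through unchanged
            ++i;
        }
        else if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < n
                 && (static_cast<unsigned char> (utf8[i + 1]) & 0xC0) == 0x80)
        {
            // Two-byte sequence encoding a code point in U+0080..U+00FF
            const unsigned b1 = static_cast<unsigned char> (utf8[i + 1]);
            const unsigned cp = ((b0 & 0x1F) << 6) | (b1 & 0x3F);

            result += (cp >= 0xA0) ? static_cast<char> (cp) : '?';
            i += 2;
        }
        else
        {
            // Not representable here (or malformed): emit one '?' and skip
            // the lead byte together with any continuation bytes after it.
            result += '?';
            ++i;

            while (i < n && (static_cast<unsigned char> (utf8[i]) & 0xC0) == 0x80)
                ++i;
        }
    }

    return result;
}
```

With this, the UTF-8 bytes of “Überdrüßig” come out as the bytes DC 62 65 72 64 72 FC DF 69 67, which is what a host assuming CP-1252 would expect to receive.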

If I remember correctly, VST 3 uses UTF8 or UTF16.