VSTPluginInstance::getProgramName issue

Shlomi · November 22, 2011, 10:34am

Hi.

The function returns a corrupted string for the program name “Teccnö Fullrange Arp”.

After removing the CharPointer_UTF8() constructor call it worked OK, but then I hit this assertion:
jassert (t == nullptr || CharPointer_ASCII::isValidString (t, std::numeric_limits<int>::max()));

const String VSTPluginInstance::getProgramName (int index)
{
    if (index == getCurrentProgram())
    {
        return getCurrentProgramName();
    }
    else if (effect != nullptr)
    {
        char nm [256] = { 0 };

        if (dispatch (effGetProgramNameIndexed,
                      jlimit (0, getNumPrograms(), index),
                      -1, nm, 0) != 0)
        {
            return String (CharPointer_UTF8 (nm)).trim();
        }
    }

    return programNames [index];
}

So I guess it’s not OK to treat it as ASCII just like that.
How can I find out the correct way to convert it?

Thanks

jules · November 22, 2011, 11:11am

AFAIK you can’t - there’s no way to know what encoding the plugin has used for the string. Sensible ones will use utf-8, or stick to ascii and avoid the problem, but if they choose to send some other locale encoding, there’s no way to know about that.

Shlomi · November 22, 2011, 11:26am

I thought about treating it as UTF-8 first and, if I get a corrupted string, falling back to ASCII.
Is there a known way of validating strings, even if it doesn’t work 100% of the time?

Thanks

jules · November 22, 2011, 12:11pm

Yes, if you know what encoding has been used. But in this case, we don’t.

Shlomi · November 22, 2011, 12:14pm

There’s so much stuff on this issue:

it may be a good idea for the script that receives the form data to check that the data returned indeed uses UTF-8 (in case something went wrong, e.g. the user changed the encoding). Checking is possible because UTF-8 has a very specific byte-pattern not seen in any other encoding. If non-UTF-8 data is received, an error message should be sent back.

As an example, in Perl, a regular expression testing for UTF-8 may look as follows:

$field =~
m/\A(
[\x09\x0A\x0D\x20-\x7E] # ASCII
| [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte
| \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs
| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
| \xED[\x80-\x9F][\x80-\xBF] # excluding surrogates
| \xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3
| [\xF1-\xF3][\x80-\xBF]{3} # planes 4-15
| \xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16
)*\z/x;

This expression can be adapted to other programming languages. It takes care of various issues, such as illegal overlong encodings and illegal use of surrogates. It will return true if $field is UTF-8, and false otherwise.

(http://www.w3.org/International/questions/qa-forms-utf-8)
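The quoted Perl expression can be ported to C++ fairly directly. Below is a minimal sketch, not JUCE code: the names `looksLikeUtf8` and `inRange` are made up for illustration, and the byte ranges are copied from the regular expression above (so, like the regex, it rejects most ASCII control characters). Note that a positive result only means the bytes are well-formed UTF-8, not that the plugin actually intended UTF-8.

```cpp
#include <cstddef>
#include <string>

// True if c lies in the inclusive range [lo, hi].
static bool inRange (unsigned char c, unsigned char lo, unsigned char hi)
{
    return c >= lo && c <= hi;
}

// Hypothetical helper (not part of JUCE): returns true if s consists only of
// byte sequences accepted by the W3C regular expression quoted above.
bool looksLikeUtf8 (const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n)
    {
        const unsigned char b0 = static_cast<unsigned char> (s[i]);
        // Following bytes, or 0 past the end (0 never matches a range below).
        const unsigned char b1 = i + 1 < n ? static_cast<unsigned char> (s[i + 1]) : 0;
        const unsigned char b2 = i + 2 < n ? static_cast<unsigned char> (s[i + 2]) : 0;
        const unsigned char b3 = i + 3 < n ? static_cast<unsigned char> (s[i + 3]) : 0;

        const bool c1 = inRange (b1, 0x80, 0xBF);
        const bool c2 = inRange (b2, 0x80, 0xBF);
        const bool c3 = inRange (b3, 0x80, 0xBF);

        if (b0 == 0x09 || b0 == 0x0A || b0 == 0x0D || inRange (b0, 0x20, 0x7E))
            i += 1;                                   // ASCII
        else if (inRange (b0, 0xC2, 0xDF) && c1)
            i += 2;                                   // non-overlong 2-byte
        else if (b0 == 0xE0 && inRange (b1, 0xA0, 0xBF) && c2)
            i += 3;                                   // excluding overlongs
        else if ((inRange (b0, 0xE1, 0xEC) || b0 == 0xEE || b0 == 0xEF) && c1 && c2)
            i += 3;                                   // straight 3-byte
        else if (b0 == 0xED && inRange (b1, 0x80, 0x9F) && c2)
            i += 3;                                   // excluding surrogates
        else if (b0 == 0xF0 && inRange (b1, 0x90, 0xBF) && c2 && c3)
            i += 4;                                   // planes 1-3
        else if (inRange (b0, 0xF1, 0xF3) && c1 && c2 && c3)
            i += 4;                                   // planes 4-15
        else if (b0 == 0xF4 && inRange (b1, 0x80, 0x8F) && c2 && c3)
            i += 4;                                   // plane 16
        else
            return false;
    }

    return true;
}
```

With this, the UTF-8 bytes of “Teccnö Fullrange Arp” pass the check, while the same name encoded in a single-byte Latin code page (where ö is the lone byte 0xF6) is rejected.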

Firefox also provides a UTF-8 detector that can be used as a standalone library:
http://www-archive.mozilla.org/projects/intl/detectorsrc.html

Edit: “The problem is that UTF-8 without a BOM is all too often indistinguishable from equally valid ANSI encoding. I think most solutions (like the win32 API IsTextUnicode) use various heuristics to give a best guess to the format of the text.”…
http://stackoverflow.com/questions/1231899/check-if-a-char-buffer-contains-utf8-characters

ans · November 30, 2011, 9:57pm

Here’s the fix. Jules, please feel free to add it to the code base!

Background: VST 2.4 uses Mac Roman encoding on the Mac and CP-1252 (Windows Latin) on Windows. The solution is to convert these to and from Juce Strings. The implementation below does the conversion in one direction only, but the other direction is very similar (you can probably just use the same system functions). The best place to put this code is probably in class String, although to avoid hacking the Juce code base I had to put it elsewhere as a static:

[code]const String stringFromPlatformEncodedString (const char* sourceString)
{
    if (sourceString == 0 || strlen (sourceString) == 0)
        return String::empty;

#if JUCE_MAC

    UniChar buffer[1024] = {0};
    CFStringRef cs = CFStringCreateWithCString (kCFAllocatorDefault, sourceString, kCFStringEncodingMacRoman);
    if (cs == 0)
        return String::empty; // creation failed, nothing to convert

    CFRange range;
    range.location = 0;
    // Copy at most 1023 characters so the zero-initialised buffer stays null-terminated.
    range.length = jmin (1023, (int) CFStringGetLength (cs));
    CFStringGetCharacters (cs, range, buffer);
    const String result (CharPointer_UTF16 ((int16*) buffer));
    CFRelease (cs);
    return result;

#elif JUCE_WINDOWS

    // 65001 is utf-8, 1252 is Windows Latin1.
    const int codePage = 1252;

    int sourceLength = strlen (sourceString);
    int requiredSize = MultiByteToWideChar (codePage, 0, sourceString, sourceLength, 0, 0);	
    if (requiredSize == 0)
        return String::empty;
    
    HeapBlock <wchar_t> buffer;
    buffer.calloc (requiredSize + 1);
    buffer[requiredSize] = 0;
    
    int result = MultiByteToWideChar (codePage, 0, sourceString, sourceLength, buffer, requiredSize);
    if (result == 0)
        return String::empty;
    else
        return String (CharPointer_UTF16 (buffer.getData()));

#else
    // Other platforms ?
    return String (sourceString);
#endif

}[/code]
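For platforms without MultiByteToWideChar or CoreFoundation, the CP-1252 case can also be handled by hand, since it is a single-byte encoding. The following is only a sketch under that assumption, not part of the fix above: `cp1252ToUtf8` and `appendUtf8` are hypothetical names, the output is a UTF-8 std::string rather than a Juce String, and the five byte values that CP-1252 leaves undefined are simply passed through as C1 control code points.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of a code point below U+10000 to out.
static void appendUtf8 (std::string& out, unsigned int cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char> (cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper (not part of JUCE): converts CP-1252 bytes to UTF-8.
std::string cp1252ToUtf8 (const std::string& in)
{
    // Unicode code points for bytes 0x80..0x9F. The bytes CP-1252 leaves
    // undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the same C1 value.
    static const unsigned short high[32] = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
    };

    std::string out;
    out.reserve (in.size() * 2);

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const unsigned char b = static_cast<unsigned char> (in[i]);
        // Bytes outside 0x80..0x9F have the same value as their code point.
        const unsigned int cp = (b >= 0x80 && b <= 0x9F) ? high[b - 0x80] : b;
        appendUtf8 (out, cp);
    }

    return out;
}
```

The result could then be handed to a UTF-8 based string constructor. Mac Roman would need its own 128-entry table for the upper half, which is omitted here.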

As a consequence, all VST-specific string APIs must be changed to use the conversion. In juce_VST_Wrapper.cpp, getProgramName(), getProgramNameIndexed(), etc. all use UTF8, which is wrong. juce_VSTPluginFormat.cpp needs the same change in VSTPluginInstance::getProgramName(), VSTPluginInstance::getCurrentProgramName(), VSTPluginInstance::getParameterName(), VSTPluginInstance::getParameterLabel(), … you get the idea.

The change looks similar in all functions:

jules · December 1, 2011, 10:05am

Does it? Is that official?

ans · December 1, 2011, 12:39pm

It is “official” only to the extent that someone from Steinberg mentioned it on the VST developer list. It is not documented though (like many other things with VST that are not documented, so this does not worry me).

At least it is 100% sure that it’s not UTF8. I had assert failures and subsequent misbehavior over and over with Reaktor 5 and other VST plugins. NI are using MacRoman/CP1252 for sure, because the above fix reliably does the trick. Preset names like “Sägezahn” or “Hüllkurve” regularly crashed on me without the fix.

So, if Native Instruments rely on it, it should be safe for Juce to do the same. At least it is a safer default assumption than UTF8.

Being able to convert to/from the platform’s default non-UTF8 encoding would also be a nice enhancement to the String class in any case.

jules · December 1, 2011, 12:50pm

Well… I only ask because there’s a lot of juce plugins out there that will be reporting their names in utf-8.

ans · December 1, 2011, 1:32pm

Developers of existing Juce plugins should check whether a VST 2.4 host handles their international preset names correctly. I doubt it. To test, you just need to hack the demo plugin to return something like “Überdrüßig” for its program name and you’re done. Perhaps some hosts are smart enough to test against different encodings?
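A host that wants to cope with both UTF-8 plugins and legacy-encoded ones could try something along these lines. This is only a guess at such a heuristic, not what any particular host is known to do: `decodeProgramName` and `isStructurallyUtf8` are hypothetical names, the UTF-8 test is a loose structural one (lead byte plus the right number of continuation bytes), and the fallback treats the bytes as ISO-8859-1, which agrees with CP-1252 for 0xA0..0xFF but not for 0x80..0x9F.

```cpp
#include <cstddef>
#include <string>

// Loose structural test: every lead byte is followed by the expected number
// of continuation bytes (10xxxxxx). Does not reject every overlong form.
static bool isStructurallyUtf8 (const std::string& s)
{
    std::size_t i = 0;

    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char> (s[i]);
        std::size_t extra = 0;

        if (lead < 0x80)                        extra = 0;
        else if (lead >= 0xC2 && lead <= 0xDF)  extra = 1;
        else if (lead >= 0xE0 && lead <= 0xEF)  extra = 2;
        else if (lead >= 0xF0 && lead <= 0xF4)  extra = 3;
        else                                    return false;

        if (i + extra >= s.size())
            return false; // sequence is cut off at the end of the string

        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char> (s[i + k]) & 0xC0) != 0x80)
                return false;

        i += extra + 1;
    }

    return true;
}

// Hypothetical host-side helper: returns the name as UTF-8, assuming the
// plugin sent either UTF-8 or a single-byte Latin encoding.
std::string decodeProgramName (const std::string& raw)
{
    if (isStructurallyUtf8 (raw))
        return raw; // plausibly UTF-8 already (or plain ASCII)

    std::string out;

    for (std::size_t i = 0; i < raw.size(); ++i)
    {
        const unsigned char b = static_cast<unsigned char> (raw[i]);

        if (b < 0x80)
        {
            out += static_cast<char> (b);
        }
        else
        {
            // Latin-1 byte value equals the code point; encode as 2-byte UTF-8.
            out += static_cast<char> (0xC0 | (b >> 6));
            out += static_cast<char> (0x80 | (b & 0x3F));
        }
    }

    return out;
}
```

The weakness is the one already quoted earlier in the thread: some legacy-encoded strings happen to be valid UTF-8 as well, so this can only ever be a best guess.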

If I remember correctly, VST 3 uses UTF8 or UTF16.
