Quirky text formatting when calling toUpperCase on translated string

I’m translating my English app to Spanish. If I call:

TRANS("some string whose resulting translation includes accents or tildes").toUpperCase();

the accented/tilded letters stay in lower case. Here’s an example with a tilded “n”:

[Screenshot: upper-case text in which the “ñ” stays in lower case]

My current workaround is to just manually type the Spanish string in as uppercase in the translation file. But sometimes I use a string in more than one place in the app, one in lowercase, the other in uppercase, so that workaround isn’t always feasible.

Is there a way around this?

For now I’ve added some manual replacements to the end of toUpperCase():

String String::toUpperCase() const
{
    StringCreationHelper builder (text);

    for (;;)
    {
        auto c = builder.source.toUpperCase();
        builder.write (c);

        if (c == 0)
            break;

        ++(builder.source);
    }
    
    String str = static_cast<String&&> (builder.result);
    
    // Manual fix-ups for the Spanish letters that the loop above leaves in lower case
    str = str.replace("ñ", "Ñ");
    str = str.replace("á", "Á");
    str = str.replace("é", "É");
    str = str.replace("í", "Í");
    str = str.replace("ó", "Ó");
    str = str.replace("ú", "Ú");
    
    return str;
}

If there isn’t a very good reason to change the case dynamically, it is better to put upper-case string literals in the translation wherever upper-case text should appear. A lot of languages have edge cases. Frequently quoted ones: in German the letter ß becomes SS in upper case, and in Turkish the letter i becomes İ.

Fair enough

FWIW we just call the std library’s upper/lower case conversion, so not really much we can do about that if it’s not performing as expected.

Which means that the current C locale is used (the default "C" locale unless the program calls setlocale):
http://en.cppreference.com/w/cpp/string/byte/toupper

Would it be possible to add a String function to select an appropriate locale?

Why would we need to add anything to String for that? Could be done just by calling the appropriate std functions in your own code, I think?
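As a rough illustration of "calling the appropriate std functions in your own code", here is a minimal sketch that upper-cases a wide string through the ctype facet of a locale you pass in. The helper name toUpperWithLocale is made up for this example and is not part of JUCE; which locale names are available, and how well they handle accented letters, depends on the platform.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (not a JUCE function): upper-cases a wide string using
// the ctype facet of the given locale rather than the global C locale.
std::wstring toUpperWithLocale (std::wstring text, const std::locale& loc)
{
    const auto& facet = std::use_facet<std::ctype<wchar_t>> (loc);

    // ctype::toupper converts the range [begin, end) in place
    if (! text.empty())
        facet.toupper (&text[0], &text[0] + text.size());

    return text;
}
```

A caller would construct the locale by name, for example std::locale ("es_ES.UTF-8") on systems that provide it. That name is an assumption, and the constructor throws std::runtime_error if the locale is not installed.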

True indeed, there is a workaround. My thinking was more so you don’t have to mix juce::String and std::string back and forth. But one can live without juce::String, if that’s the way to go :wink:

Sorry for the necro, but I am struggling with this exact same problem right now.

In my plugin GUI, there’s a form where users enter their name and license key for validation. To avoid unnecessary support issues where users would not use upper/lowercase consistently when typing their own names (which turns out to be surprisingly common!), I’m converting the names to uppercase when doing the comparison.

This works great on macOS even for names with international characters, but not so great on Windows, as described in the thread above.

So I’d like to know if there is a way to reliably convert user provided strings to uppercase on Windows? I tried converting the JUCE String to std::string, but that didn’t seem to make a difference. Hopefully I’m overlooking something obvious?
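One option that is not mentioned in this thread, so treat it as an untested suggestion: on Windows the Win32 function CharUpperBuffW upper-cases a UTF-16 buffer in place without going through the C runtime locale. A minimal sketch follows, with a std::towupper fallback for other platforms. The helper name toUpperNative is hypothetical, and the Windows branch needs linking against User32.

```cpp
#include <cwctype>
#include <string>

#ifdef _WIN32
 #include <windows.h>
#endif

// Hypothetical helper: upper-cases a wide string with the OS routine on
// Windows, and with the C library's towupper everywhere else.
std::wstring toUpperNative (std::wstring text)
{
   #ifdef _WIN32
    // CharUpperBuffW converts the buffer in place
    if (! text.empty())
        CharUpperBuffW (&text[0], static_cast<DWORD> (text.size()));
   #else
    // Fallback: result depends on the current C locale
    for (auto& c : text)
        c = static_cast<wchar_t> (std::towupper (static_cast<std::wint_t> (c)));
   #endif

    return text;
}
```

This only changes case. It does not normalise the string, so two visually identical names in different Unicode representations could still compare as different.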

OK, so I’m not the only one. :laughing:

Perhaps a stupid question, but is there a reason you need to uppercase everything and not lowercase? I personally force everything to lowercase and never had a problem.

Seems like you could “solve” the problem by just avoiding uppercase, assuming you have the choice.

I just assumed that they were equivalent and I randomly selected upper case. You’re saying that converting to lowercase is safer, e.g. will it always convert Á to á? I’ll have to test this.

Unfortunately the uppercase names are also used cryptographically as part of the license validation process, so I don’t think I can change it to lowercase now or all the existing licenses will break. :frowning:

I haven’t actually tested this, and it could well come back to bite me later, but so far it has not been an issue in practice.

OK, yeah, I hadn’t considered that, quite a valid reason. I changed my licenses and informed customers, but I still occasionally get a support request from someone who missed the memo and is trying to use their legacy license in the new versions. But overall there are fewer of those than I got from people unable to follow the instructions to enter the details exactly as written in the receipt email. :joy:

The problem of string normalisation and case folding is Very Difficult. There’s a good overview here:

It looks like, in order to convert a string into a form that can be compared with another string that differs in case/representation, you’d need to do a case fold, then normalise, then case fold and normalise again.

Unfortunately, I don’t think JUCE currently has case-fold or normalisation functions. If the comparison has to work across platforms, that might introduce further complications because you would need to ensure that case folding with a particular locale consistently produces the same result on each platform.

In the case of macOS, I think you’d need CFStringFold and CFStringNormalize. On Windows, it looks like FoldString does both folding and normalisation.


Thanks a lot for your replies, I really appreciate it!

Yes, this really appears to be a jungle I’ve gotten myself into, there are a couple of decisions I really regret now. :slight_smile:

I think I might go for the manual replacement workaround as described in the second comment above, then! It might not be the prettiest way to go about it, but hopefully it should solve at least 90% of the name issues we’re seeing.
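For what it’s worth, the manual replacement idea can be generalised a little without listing every letter. In the Latin-1 Supplement block the lower-case letters U+00E0 to U+00FE (except the division sign U+00F7) sit exactly 0x20 above their upper-case forms, the same offset as ASCII. Here is a minimal sketch on a UTF-32 string; the helper name toUpperLatin1 is made up, and anything outside these two ranges is left untouched.

```cpp
#include <string>

// Hypothetical helper: upper-cases ASCII and Latin-1 Supplement letters in a
// UTF-32 string (covers ñ, á, é, í, ó, ú, ü and similar). Everything else is
// left as it is, including ß and ÿ, whose upper-case forms are not in this range.
std::u32string toUpperLatin1 (std::u32string text)
{
    for (auto& c : text)
    {
        const bool asciiLower  = (c >= U'a' && c <= U'z');
        const bool latin1Lower = (c >= 0xE0 && c <= 0xFE && c != 0xF7); // 0xF7 is the division sign

        if (asciiLower || latin1Lower)
            c -= 0x20; // the upper-case counterpart is 0x20 lower in both ranges
    }

    return text;
}
```

Because the mapping is a fixed table rather than a locale lookup, it gives the same result on every platform, which matters if the upper-cased name feeds into license validation. It will not help with names outside Western European alphabets.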


I use the utf8proc library for uppercase/lowercase conversion and comparisons with different national characters. It works well on all 4 platforms.