Quirky text formatting when calling toUpperCase on translated string

hanley · March 20, 2018, 9:39pm

I’m translating my English app to Spanish. If I call:

TRANS("some string whose resulting translation includes accents or tildes").toUpperCase();

the accented/tilded letters stay in lower case. Here’s an example with a tilded “n”:

My current workaround is to just manually type the Spanish string in as uppercase in the translation file. But sometimes I use a string in more than one place in the app, one in lowercase, the other in uppercase, so that workaround isn’t always feasible.

Is there a way around this?

hanley · March 21, 2018, 6:44pm

For now I’ve added some manual replacements to the end of toUpperCase():

String String::toUpperCase() const
{
    StringCreationHelper builder (text);

    for (;;)
    {
        auto c = builder.source.toUpperCase();
        builder.write (c);

        if (c == 0)
            break;

        ++(builder.source);
    }
    
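    String str = static_cast<String&&> (builder.result);

    // Manual fix-ups for the Spanish letters the loop above leaves in lower case.
    // String::fromUTF8 is used because constructing a String from a raw char
    // literal expects ASCII; this assumes the source file is saved as UTF-8.
    str = str.replace (String::fromUTF8 ("ñ"), String::fromUTF8 ("Ñ"));
    str = str.replace (String::fromUTF8 ("á"), String::fromUTF8 ("Á"));
    str = str.replace (String::fromUTF8 ("é"), String::fromUTF8 ("É"));
    str = str.replace (String::fromUTF8 ("í"), String::fromUTF8 ("Í"));
    str = str.replace (String::fromUTF8 ("ó"), String::fromUTF8 ("Ó"));
    str = str.replace (String::fromUTF8 ("ú"), String::fromUTF8 ("Ú"));

    return str;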
}

luzifer · March 21, 2018, 8:21pm

Unless there’s a very good reason to change the case dynamically, it’s better for translation purposes to use upper-case string literals wherever upper-case text should appear. A lot of languages have edge cases. Frequently quoted ones: in German the letter ß becomes SS in upper case, and in Turkish the letter i becomes İ.

hanley · March 22, 2018, 1:53pm

Fair enough

jules · March 22, 2018, 2:04pm

FWIW we just call the std library’s upper/lower case conversion, so not really much we can do about that if it’s not performing as expected.

daniel · March 22, 2018, 3:13pm

Which means that the C locale is used:
http://en.cppreference.com/w/cpp/string/byte/toupper

Would it be possible to add a String function to select an appropriate locale?

jules · March 22, 2018, 3:14pm

Why would we need to add anything to String for that? Could be done just by calling the appropriate std functions in your own code, I think?
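
As a rough illustration of that suggestion, here is a minimal sketch that upper-cases a wide string through the `std::ctype<wchar_t>` facet of an explicitly chosen `std::locale`. The helper name `toUpperWithLocale` is hypothetical and not part of JUCE or the standard library, and the result for non-ASCII letters depends on which locales the platform provides.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (not a JUCE or std function): upper-cases a wide string
// using the ctype facet of the locale passed in by the caller.
std::wstring toUpperWithLocale (std::wstring text, const std::locale& loc)
{
    const auto& facet = std::use_facet<std::ctype<wchar_t>> (loc);

    // The range overload of toupper converts the characters in place.
    if (! text.empty())
        facet.toupper (&text[0], &text[0] + text.size());

    return text;
}
```

A call might look like `toUpperWithLocale (L"ma\u00f1ana", std::locale ("es_ES.UTF-8"))`. Locale names are platform-specific (that one is typical of Unix-like systems), and constructing a `std::locale` from a name that is not installed throws `std::runtime_error`, so the behaviour is not guaranteed to match across platforms.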

daniel · March 22, 2018, 3:17pm

True indeed, there is a workaround. My thinking was more that you wouldn’t have to convert back and forth between juce::String and std::string. But one can live without juce::String, if that’s the way to go

nbacklund · April 21, 2022, 7:59am

Sorry for the necro, but I am struggling with this exact same problem right now.

In my plugin GUI, there’s a form where users enter their name and license key for validation. To avoid unnecessary support issues where users would not use upper/lowercase consistently when typing their own names (which turns out to be surprisingly common!), I’m converting the names to uppercase when doing the comparison.

This works great on macOS even for names with international characters, but not so great on Windows, as described in the thread above.

So I’d like to know if there is a way to reliably convert user-provided strings to uppercase on Windows? I tried converting the JUCE String to std::string, but that didn’t seem to make a difference. Hopefully I’m overlooking something obvious?

asimilon · April 21, 2022, 11:39am

OK, so I’m not the only one.

Perhaps a stupid question, but is there a reason you need to uppercase everything and not lowercase? I personally force everything to lowercase and never had a problem.

Seems like you could “solve” the problem by just avoiding uppercase, assuming you have the choice.

nbacklund · April 21, 2022, 11:42am

I just assumed that they were equivalent and I randomly selected upper case. You’re saying that converting to lowercase is safer, e.g. will it always convert Á to á? I’ll have to test this.

Unfortunately the uppercase names are also used cryptographically as part of the license validation process, so I don’t think I can change it to lowercase now or all the existing licenses will break.

asimilon · April 21, 2022, 11:48am

I haven’t actually tested this, and it could well come back to bite me later, but so far it hasn’t been an issue in practice.

OK, yeah, I hadn’t considered that, quite a valid reason. I changed my licenses and informed customers, but I still occasionally get a support request from someone who missed the memo and is trying to use their legacy license in the new versions. But overall I get fewer of those than I used to get from people unable to follow the instructions to enter the details exactly as written in the receipt email.

reuk · April 21, 2022, 1:26pm

The problem of string normalisation and case folding is Very Difficult. There’s a good overview here:

It looks like, in order to convert a string into a form that can be compared with another string that differs in case/representation, you’d need to do a case fold, then normalise, then case fold and normalise again.

Unfortunately, I don’t think JUCE currently has case-fold or normalisation functions. If the comparison has to work across platforms, that might introduce further complications because you would need to ensure that case folding with a particular locale consistently produces the same result on each platform.

In the case of macOS, I think you’d need CFStringFold and CFStringNormalize. On Windows, it looks like FoldString does both folding and normalisation.

nbacklund · April 22, 2022, 9:19am

Thanks a lot for your replies, I really appreciate it!

Yes, this really appears to be a jungle I’ve gotten myself into, there are a couple of decisions I really regret now.

I think I might go for the manual replacement workaround as described in the second comment above, then! It might not be the prettiest way to go about it, but hopefully it should solve at least 90% of the name issues we’re seeing.
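
For what it’s worth, the manual-replacement idea can be written as a range check instead of a list of individual letters. The sketch below is only an illustration under stated assumptions: it works on Unicode code points (`char32_t`), covers ASCII and the Latin-1 Supplement block only, and the names `upperCodePoint` and `upperLatin1` are hypothetical, not JUCE functions. It leaves ß untouched because its upper-case form is the two letters SS, and it ignores locale-specific rules such as the Turkish dotted İ.

```cpp
#include <string>

// Hypothetical helper: upper-cases one code point from ASCII or Latin-1 Supplement.
char32_t upperCodePoint (char32_t c)
{
    if (c >= U'a' && c <= U'z')
        return c - 0x20;

    // U+00E0..U+00FE (à..þ) map to U+00C0..U+00DE, except U+00F7 (the division sign).
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
        return c - 0x20;

    // ÿ (U+00FF) has its upper-case form outside Latin-1, at U+0178.
    if (c == 0xFF)
        return 0x178;

    return c; // everything else, including ß (U+00DF), is left unchanged
}

// Hypothetical helper: applies upperCodePoint to every code point of a UTF-32 string.
std::u32string upperLatin1 (std::u32string text)
{
    for (auto& c : text)
        c = upperCodePoint (c);

    return text;
}
```

Because this does not depend on the platform’s locale data, it gives the same result on every platform for the characters it covers. It does not normalise the input, so a name typed with combining accents will still compare as different from the same name typed with precomposed letters.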

MBO · April 22, 2022, 1:42pm

I use the utf8proc library for uppercase/lowercase conversion and for comparisons involving different national characters. It works well on all 4 platforms.
