On customizing and internationalizing an application

I’m now turning my attention to internationalizing my application and framework, and customizing it for various other clients.

EDIT: A first question?

I realized after writing this that there’s a simple question I don’t know how to solve: how do I find out what language the user’s system is set to?
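For what it’s worth: in JUCE, the SystemStats class has getUserLanguage() and getUserRegion() (check the docs for your version for the exact format they return). Outside JUCE, POSIX systems usually carry this in the locale environment variables LC_ALL, LC_MESSAGES and LANG, with values like fr_FR.UTF-8. Here is a minimal standalone sketch, assuming that format; languageFromLocale and userLanguage are hypothetical names, and Windows would need a different OS call.

```cpp
#include <cassert>
#include <cstdlib>
#include <initializer_list>
#include <string>

// Hypothetical helper: extract the language code from a POSIX locale
// string such as "fr_FR.UTF-8" or "de_DE@euro".
std::string languageFromLocale(const std::string& locale)
{
    // The language part ends at the first '_', '.' or '@', if there is one.
    const std::string lang = locale.substr(0, locale.find_first_of("_.@"));
    // "C" and "POSIX" are the default locales and name no language.
    if (lang == "C" || lang == "POSIX")
        return "";
    return lang;
}

// Hypothetical helper: check the variables in POSIX precedence order for
// message text, returning "" if none of them is set.
std::string userLanguage()
{
    for (const char* name : { "LC_ALL", "LC_MESSAGES", "LANG" })
        if (const char* value = std::getenv(name); value != nullptr && *value != '\0')
            return languageFromLocale(value);
    return "";
}
```

On a machine with LANG=fr_FR.UTF-8 and neither LC_ALL nor LC_MESSAGES set, userLanguage() should return "fr".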

Translation by string keys considered harmful?

Unfortunately, there’s a fundamental issue with Juce’s internationalization scheme which is making it hard to achieve my targets.

If I’m reading the documentation correctly, you tag strings embedded in your code with a macro called TRANS, which marks the text as needing translation, and then you provide a dictionary that maps those original strings to their equivalents in other languages.
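As I read it, the current scheme boils down to a map from original text to translated text, falling back to the original when there’s no entry. A minimal standalone sketch of that idea, with std::map standing in for JUCE’s own LocalisedStrings class (TranslationTable and translate here are hypothetical names, not JUCE’s API):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a string-keyed translation table: the key is the
// original (English) text exactly as it appears in the source code.
using TranslationTable = std::map<std::string, std::string>;

// Look up the original text; if there's no entry, show the original text.
std::string translate(const TranslationTable& table, const std::string& original)
{
    const auto it = table.find(original);
    return it != table.end() ? it->second : original;
}
```

Because each key can appear only once, the single entry for "times" has to pick one of "fois", "heure" or "répétitions", which is exactly the 1:1 problem.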

Unfortunately, this won’t work for anything other than fairly small applications.

The big issue is the assumption that there’s a 1:1 correspondence between small sentence fragments in one language and those in another, and that’s simply not the case. English turns out to be a particularly bad root language to translate from this way, because it doesn’t mark gender or case on nouns, and quite often a noun and a related verb are spelled exactly the same, so you get text that’s grammatical in English but not in the target language.

This problem gets worse if you also want to customize your code through the internationalization framework. If the framework were designed right, that should be trivial: you create a new special “language” for each new client and “translate” into it. But with a one-to-one mapping from the original text, you can forget it!

Or consider what happens when you slightly change the wording of some text in your code. Suddenly the translation no longer matches. Perhaps it’s only a tiny change and you don’t want to touch the translation at all. Or perhaps you do want to update the translation sooner or later, but you need to put this change into production this week, and translations take several weeks to come back.

Even worse from my perspective, it puts the engineer right in the middle of the workflow. Do you need to change some text in a window somewhere? Call the engineer and he can change the code and rebuild it for you!

How it should be done

I’ve dealt with this before in several organizations, including a very large one, and as far as I know there’s really only one way to do it.

Each string or string fragment that the user sees needs a unique, named integer token. An engineer adds a new string to be translated or customized by creating a new token and attaching it in the code to a “placeholder” string: the value displayed if everything else fails (in other words, the value the engineer sees early in development). The token ID is what’s permanent; all translations and customizations are keyed off it.
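A minimal sketch of the token scheme, assuming plain C++ with an enum for the tokens and a std::map for one language’s translations (all names and ID values are hypothetical):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical token IDs. Once shipped, a value must never change or be
// reused, because every translation file is keyed on it.
enum StringToken : int
{
    MULTIPLICATION_BUTTON_LABEL = 1001,
    ARRIVAL_TIME_CAPTION        = 1002,
    REPETITION_COUNT_LABEL      = 1003
};

// Translations loaded for the current language, keyed by token ID.
using TokenTable = std::map<int, std::string>;

// Return the translation for a token, or the engineer's placeholder text
// if this language doesn't have one yet.
std::string localise(const TokenTable& table, StringToken token, const std::string& placeholder)
{
    const auto it = table.find(token);
    return it != table.end() ? it->second : placeholder;
}
```

The same placeholder "times" now resolves to different French words depending on the token, and editing a placeholder in code can never orphan a translation, since nothing is keyed on it.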

Both “translating” and “customizing” then come down to handing the file of tokens to a translator or a business guy. And because it’s all data-driven, it’s really easy to edit the labels in place, by double-clicking on form elements and changing them in the application itself (when you run the app in “editor” mode, of course… :-D)
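To make the “hand the file to a translator” step concrete, here is a sketch of loading such a file, assuming a made-up one-entry-per-line format of token ID, '=', then the text. A real format would need quoting and escaping; loadTokenFile is a hypothetical name.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical file format, one entry per line:
//     1001 = fois
//     # lines starting with '#' are comments
// The translator (or business person) only ever edits the text after '='.
std::map<int, std::string> loadTokenFile(std::istream& in)
{
    std::map<int, std::string> table;
    std::string line;
    while (std::getline(in, line))
    {
        const auto eq = line.find('=');
        if (line.empty() || line[0] == '#' || eq == std::string::npos)
            continue;  // blank line, comment, or no '=' at all

        // The token ID is the number before '='; skip the line if there isn't one.
        std::istringstream idStream(line.substr(0, eq));
        int id = 0;
        if (!(idStream >> id))
            continue;

        // Keep the text after '=', without surrounding spaces, tabs or '\r'.
        const std::string text = line.substr(eq + 1);
        const auto first = text.find_first_not_of(" \t\r");
        const auto last = text.find_last_not_of(" \t\r");
        table[id] = (first == std::string::npos) ? std::string() : text.substr(first, last - first + 1);
    }
    return table;
}
```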

The big advantage is that the engineer is completely out of the daily workflow loop: they just create the strings and tokens, and no further work is expected of them. They can later change the “placeholder” text and nothing goes wrong. The one thing you can’t do is change a token once it’s created, but that’s a standard sort of engineering constraint. Someone in the business office can customize the app for a company, or someone in Indonesia can translate it and produce their own file, and the engineer doesn’t even need to know it has happened!
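Customization per client can then be just another layer of data on top of the language file. A sketch under that assumption, where lookup and the client table are hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using TokenTable = std::map<int, std::string>;

// Hypothetical layered lookup: tables are searched in order (for example a
// client's customizations first, then the user's language), and the
// engineer's placeholder is the last resort. Each layer is produced and
// maintained without the engineer's involvement.
std::string lookup(const std::vector<const TokenTable*>& layers, int token, const std::string& placeholder)
{
    for (const TokenTable* layer : layers)
        if (const auto it = layer->find(token); it != layer->end())
            return it->second;
    return placeholder;
}
```

Here a client can override a single label, everything else falls through to the language layer, and anything untranslated still shows the engineer’s placeholder.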

How could Juce be changed to do this?

The code would change very little, and I believe that the old and new code could coexist during the transition.

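  // Old mechanism.
  // (juce::Label's first constructor argument is the component name, so the text goes second.)
  Label oldText({}, TRANS("times"));  // As in "seven times seven", or "a list of times" or "how many times"?

  // Proposed mechanism: placeholder text plus a permanent token (TRANS2 doesn't exist yet).
  Label text1({}, TRANS2("times", MULTIPLICATION_BUTTON_LABEL));  // In French: "fois"
  Label text2({}, TRANS2("times", ARRIVAL_TIME_CAPTION));  // In French: "heure"
  Label text3({}, TRANS2("times", REPETITION_COUNT_LABEL));  // In French: "répétitions"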
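One way TRANS2 could work without declaring the tokens anywhere is to stringize the token name with the preprocessor’s # operator. A standalone sketch using std::string rather than juce::String; tokenTable, translateToken and this TRANS2 are hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical translation table for the current language, keyed by token name.
std::map<std::string, std::string>& tokenTable()
{
    static std::map<std::string, std::string> table;
    return table;
}

// Return the translation for tokenName, or the placeholder if there isn't one.
std::string translateToken(const std::string& tokenName, const std::string& placeholder)
{
    const auto& table = tokenTable();
    const auto it = table.find(tokenName);
    return it != table.end() ? it->second : placeholder;
}

// Hypothetical TRANS2: '#token' turns the bare identifier into a string
// literal, so tokens don't have to be declared anywhere in the code.
#define TRANS2(placeholder, token) translateToken(#token, placeholder)
```

The trade-off against an enum: a misspelt token name compiles fine and silently falls back to the placeholder, whereas an undeclared enum value would be a compile error.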


Yes, you’re quite right, assuming a 1-to-1 mapping isn’t a workable approach.

But… the reason I’ve never changed this is that you can use the current system in quite a neat way to do the job. Why use integers? Why not just use unique strings as your key? Because if you use strings, it already does everything you need - so in your example:

  Label text1({}, TRANS("MULTIPLICATION_BUTTON_LABEL"));  // In French: "fois"
  Label text2({}, TRANS("ARRIVAL_TIME_CAPTION"));         // In French: "heure"
  Label text3({}, TRANS("REPETITION_COUNT_LABEL"));       // In French: "répétitions"

The only downside is that for strings where you’re using a token rather than some English text, your translation files must be complete; otherwise the untranslated token will look a bit confusing when it’s shown in the UI… (But in fact, I could easily add a new method that takes both a string to translate and a default string, and that would avoid problems like that.)

I like that idea! (two strings to a function)

Oh, I love the string token idea, why not?

Sure, there’s no advantage to integers, and if you felt they had to be integers for some procedural reason, you could use the string representation of integers! If you “could easily add a new method that takes both a string to translate, and a default string”, that would be fantastic and solve my representational problem in one fell swoop.

Already checked in!

I’ve added some translate() functions that do the same thing as the TRANS macro, so you don’t have to use macros. And one of them takes a second string parameter as its default value.

Truly excellent…

[quote=“jules”]
  Label text1({}, TRANS("MULTIPLICATION_BUTTON_LABEL"));  // In French: "fois"
  Label text2({}, TRANS("ARRIVAL_TIME_CAPTION"));         // In French: "heure"
  Label text3({}, TRANS("REPETITION_COUNT_LABEL"));       // In French: "répétitions"
[/quote]

The problem with this approach is that the “large projects” for which the original solution was unworkable (and rightly so) will now face possible namespace collisions: presumably there would be so many key strings that the chance of different developers picking the same one would be high. For example, “BUTTON_LABEL” in two different .cpp files.

What about something like this?

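  // File-based translation: TR_FILE("BUTTON_LABEL") looks up "<source file>BUTTON_LABEL".
  // Note that __FILE__ may expand to a full path, which can differ between build machines.
  #define TR_FILE(X) TRANS(String(__FILE__) + X)

  // Namespace- or class-based translation: inside a member function of juce::MyComponent,
  // TR("BUTTON_LABEL") looks up "juce::MyComponentBUTTON_LABEL". Only valid inside a function body.
  #ifdef _MSC_VER
   #define TR(X) TRANS(String(__FUNCSIG__).upToFirstOccurrenceOf("(", false, false).fromLastOccurrenceOf(" ", false, false).upToLastOccurrenceOf("::", false, false) + X)
  #else
   #define TR(X) TRANS(String(__PRETTY_FUNCTION__).upToFirstOccurrenceOf("(", false, false).fromLastOccurrenceOf(" ", false, false).upToLastOccurrenceOf("::", false, false) + X)
  #endif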
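The class-based TR idea relies on trimming the compiler’s function signature down to its enclosing scope. That trimming can be checked outside JUCE with a std::string sketch (scopePrefix is a hypothetical name; the sample signatures below are typical GCC/Clang and MSVC shapes, and templates or operator overloads would need more care):

```cpp
#include <cassert>
#include <string>

// Hypothetical std::string version of the prefix extraction:
// "void juce::MyComponent::paint(juce::Graphics&)" -> "juce::MyComponent".
std::string scopePrefix(const std::string& signature)
{
    // Drop the parameter list.
    std::string s = signature.substr(0, signature.find('('));
    // Drop the return type and any calling convention (e.g. MSVC's __cdecl).
    if (const auto space = s.rfind(' '); space != std::string::npos)
        s = s.substr(space + 1);
    // Drop the function name, keeping the enclosing class or namespace.
    const auto scope = s.rfind("::");
    return scope == std::string::npos ? s : s.substr(0, scope);
}
```

Taking the last space rather than the first is what lets the MSVC form, which puts a calling convention between the return type and the name, work the same way as the GCC/Clang form; constructors, which have no return type, also come out right.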

Yes, but there’s only one translation file, so that would contain duplicate keys. And it’d be very easy to scan that file to detect dupes when it’s loaded.
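A sketch of that load-time check, assuming a simple KEY = text line format (findDuplicateKeys is a hypothetical helper; JUCE’s own translation files use a quoted format, so a real check would parse that instead):

```cpp
#include <cassert>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical load-time check: return every key that appears more than
// once in a translation file whose lines look like  KEY = text.
std::vector<std::string> findDuplicateKeys(std::istream& in)
{
    std::set<std::string> seen, reported;
    std::vector<std::string> duplicates;
    std::string line;
    while (std::getline(in, line))
    {
        const auto eq = line.find('=');
        if (eq == std::string::npos)
            continue;
        // The key is the text before '=', without surrounding spaces or tabs.
        std::string key = line.substr(0, eq);
        key.erase(key.find_last_not_of(" \t") + 1);
        key.erase(0, key.find_first_not_of(" \t"));
        // Report a key the first time it's seen again, and only once.
        if (!seen.insert(key).second && reported.insert(key).second)
            duplicates.push_back(key);
    }
    return duplicates;
}
```

Reporting each duplicate only once keeps the log short even when a key is repeated many times.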

That’s a really good point, and it would be easy to find them in the sources, true.