Emitting "strings to translate" from a Juce app?


#1

Hello, Juce-ys. I have an interesting C++-y design question relating to Juce.

I’m working on internationalizing our application, which has been going very well. I would like to extract a list of strings that need to be translated from it.

About half the strings are embedded in the code, and about half of them occur as part of structures within external data files.

One solution would be grepping and rewriting the grep results to get the strings out of the code, and then writing a data extractor for the data files.

Writing a completely reliable data extractor is easy, but it’s another moving part - writing the completely reliable grep/rewrite is tricky, as some of my strings are broken up into multiple source lines (I could fix that but suddenly I’d have a few reallylonglines when I so far have managed to keep everything tall and thin…), and some of my strings contain quote characters and some are in comments…

What I’m doing is intercepting all calls to translate(), storing the strings, and then emitting them to a file at shutdown, which I think is much neater.
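As a rough illustration of that interception, here is a minimal sketch. The names `TranslationRecorder`, `recordingTranslate` and `lookupTranslation` are hypothetical and are not Juce API; the lookup is a stand-in that returns its input unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical recorder: collects every distinct string that passes through
// the translation wrapper so the list can be written out at shutdown.
class TranslationRecorder {
public:
    static TranslationRecorder& instance() {
        static TranslationRecorder recorder;  // constructed on first use
        return recorder;
    }

    void record(const std::string& original) {
        std::lock_guard<std::mutex> lock(mutex_);
        seen_.insert(original);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return seen_.size();
    }

    // One string per line, in sorted order, so successive dumps diff cleanly.
    // Strings containing newlines would need escaping first (not done here).
    std::string listing() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::string out;
        for (const std::string& s : seen_) {
            out += s;
            out += '\n';
        }
        return out;
    }

private:
    TranslationRecorder() = default;

    mutable std::mutex mutex_;
    std::set<std::string> seen_;
};

// Stand-in for the real lookup (assumption: returns the input unchanged).
inline std::string lookupTranslation(const std::string& original) {
    return original;
}

// Hypothetical wrapper the application would call instead of translate().
inline std::string recordingTranslate(const std::string& original) {
    TranslationRecorder::instance().record(original);
    return lookupTranslation(original);
}
```

At shutdown the application would write `TranslationRecorder::instance().listing()` to a file. Only strings that were actually passed to `recordingTranslate` during the run end up in it.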

Easy and effective, except that any message in a codepath that’s never run is not stored to the file. There are rather a lot of code paths, and a lot of these are errors which are somewhat difficult to arrange to have happen. I particularly don’t want the unpleasant workload of “making a few text changes/spending 15 minutes exercising every code path”.

So, of course, I want to do this automatically. Unfortunately, the only solution I can come up with uses global variables of class type, which I know is a minefield. I would naturally avoid this, but I can’t see another way to do it. On the other hand, I’m fairly familiar with the traps regarding global variables of class type, and I don’t believe that my approach will encounter them…

Here’s the code…

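[code]class TranslatedString {
 public:
  // Static instances register themselves and wait for translateAll();
  // pass translateNow = true for instances created after start-up.
  TranslatedString(const char* o, bool translateNow = false) : original_(o) {
    if (translateNow)
      translate();
    else
      STRINGS.push_back(this);
  }

  // Only valid once translate() has run; translated_ is empty before that.
  operator const String&() const {
    return *translated_;
  }

  // Called once from the start-up routine, after all static instances exist.
  static void translateAll() {
    for (uint i = 0; i < STRINGS.size(); ++i)
      STRINGS[i]->translate();
  }

 private:
  void translate() {
    translated_.reset(new String(trans(original_)));
  }

  typedef std::vector<TranslatedString*> StringList;

  // Every instance that is still waiting to be translated.
  static StringList STRINGS;

  const char* const original_;
  ptr<String> translated_;

  DISALLOW_COPY_ASSIGN_AND_LEAKS(TranslatedString);
};
[/code]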

The idea is that your strings are either static instances of TranslatedString, which are translated when your start-up routine calls TranslatedString::translateAll() - or they’re dynamic instances where translateNow is true, and the translation is called at construction time.
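To make the two usage modes concrete, here is a self-contained sketch of the same idea. It is a variation, not the code above: it uses `std::string` instead of Juce's `String`, `trans` is a stand-in that just tags the text, and the registry is a function-local static (my substitution) so that it exists before any static instance in another file tries to register. It does not change when a static instance itself gets constructed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the real lookup (assumption): tags the text as translated.
inline std::string trans(const char* original) {
    return std::string("[fr] ") + original;
}

class TranslatedString {
public:
    explicit TranslatedString(const char* original, bool translateNow = false)
        : original_(original) {
        if (translateNow)
            translate();
        else
            registry().push_back(this);  // deferred until translateAll()
    }

    TranslatedString(const TranslatedString&) = delete;
    TranslatedString& operator=(const TranslatedString&) = delete;

    // Empty until translate() has run for this instance.
    const std::string& get() const { return translated_; }

    static void translateAll() {
        for (TranslatedString* s : registry())
            s->translate();
    }

private:
    void translate() { translated_ = trans(original_); }

    // Function-local static: constructed the first time it is needed.
    static std::vector<TranslatedString*>& registry() {
        static std::vector<TranslatedString*> strings;
        return strings;
    }

    const char* const original_;
    std::string translated_;
};

// A static instance: registered at construction, translated by translateAll().
static TranslatedString FILE_NOT_FOUND("File not found");
```

A dynamic instance would be created as `TranslatedString cancel("Cancel", true);` and is usable immediately. Deferred registration only makes sense for objects that live until `translateAll()` runs, since the registry stores raw pointers.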

There is no possibility of dependencies between TranslatedStrings, and there are no other static variables of class type, so there won’t be other static classes depending on TranslatedString.

There isn’t a mutex corresponding to STRINGS, because all the additions occur in a single main thread, and then the translations occur at one spot, absolutely after that. (I could put in a mutex if it came down to it, it’s not important…)

But it still worries me. What do you think? Is there a better way?


#2

All right, I realize that there’s a serious issue with this.

These variables will be dynamically initialized, which means there’s no guarantee that they get initialized until something in that compilation unit gets called.

Other ideas?


#3

I’m not sure if it helps, but I once had to take a large existing Mac project that had strings hardcoded everywhere. The Mac approach is ‘use a macro’: NSLocalizedString (or some such). Basically, NSLocalizedString(“string in original language”, “comment”). Once you use that, you can run a util to extract all macro strings and put them in a file. The original string becomes the key for the lookup.

I simply counted on " almost always travelling in pairs (with just a few special cases, like an escaped quote inside a string). First pass with a simple parser: pull the string, put it in quotes in a file, put the filename and line next to it. This let me find any weird cases (parsed wrong, shouldn’t be localized, etc.). Next, I added a comment field to quite a few lines in my output file, and removed any special cases. Last pass, my tool walked my file and inserted the Apple macro at each file location. If I hadn’t provided a comment, it used the filename (which is handier than ‘no comment provided by engineer’, the default from Apple’s genstrings tool).
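A first pass of that kind can be sketched in a few dozen lines of C++. This is my own illustration, not the tool described above: `extractStringLiterals` and `FoundString` are hypothetical names, raw string literals and digit separators are not handled, and escapes are kept exactly as written in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct FoundString {
    int line;          // 1-based line where the literal starts
    std::string text;  // contents between the quotes, escapes left as written
};

// Scans C++ source text for double-quoted literals, skipping comments and
// character literals. Adjacent literals separated only by whitespace or
// comments are joined, as the compiler would join them.
std::vector<FoundString> extractStringLiterals(const std::string& src) {
    std::vector<FoundString> found;
    int line = 1;
    bool joinNext = false;  // true while nothing but blanks followed a literal
    std::size_t i = 0;
    const std::size_t n = src.size();

    while (i < n) {
        const char c = src[i];
        if (c == '\n') {
            ++line;
            ++i;
        } else if (c == ' ' || c == '\t' || c == '\r') {
            ++i;
        } else if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            while (i < n && src[i] != '\n') ++i;  // line comment
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            i += 2;  // block comment
            while (i + 1 < n && !(src[i] == '*' && src[i + 1] == '/')) {
                if (src[i] == '\n') ++line;
                ++i;
            }
            i = (i + 1 < n) ? i + 2 : n;
        } else if (c == '\'') {
            ++i;  // character literal such as '"' or '\''
            while (i < n && src[i] != '\'') {
                if (src[i] == '\\') ++i;
                ++i;
            }
            ++i;
            joinNext = false;
        } else if (c == '"') {
            const int startLine = line;
            std::string text;
            ++i;
            while (i < n && src[i] != '"') {
                if (src[i] == '\\' && i + 1 < n) text += src[i++];  // keep escape
                if (src[i] == '\n') ++line;
                text += src[i++];
            }
            ++i;  // closing quote
            if (joinNext)
                found.back().text += text;
            else
                found.push_back({startLine, text});
            joinNext = true;
        } else {
            ++i;
            joinNext = false;
        }
    }
    return found;
}
```

Writing each result out next to its filename and line number gives a list that can be reviewed by hand for literals that should not be localized.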

I started with grep, etc., but gave up and just wrote a simple tool.

Good Luck!


#4

Very interesting strategy! Though - I’m not quite sure how that macro would work? There has to be some executable code there - which means there has to be a static variable - but how can you guarantee that it’s actually constructed? My research and experiments into this indicated that variables of static class type are not guaranteed to be constructed if you never execute any code in that compilation unit.

I actually finished this part of the job last night - with a hack, but a sustainable hack. I got rid of most of TranslatedString, particularly the static array of them - it’s just the main string, and then a translated string, and a method “translate”.

Each class or compilation unit that has TranslatedStrings that aren’t guaranteed to be translated just by starting up the program and then quitting has a static method or function called translateAll - and I have a single global function called translateAll() that calls all the other translateAll() methods.
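The explicit arrangement might look roughly like the sketch below. The namespaces `file_dialog` and `transport`, the accessor functions, and the `trans` stand-in are all hypothetical, and holding each string in a function-local static is my assumption about where the strings live, not something stated in the post.

```cpp
#include <cassert>
#include <string>

// Stand-in for the real lookup (assumption): tags the text instead of translating.
inline std::string trans(const char* original) {
    return std::string("[fr] ") + original;
}

// Pared-down TranslatedString: the original, the translation, and translate().
struct TranslatedString {
    explicit TranslatedString(const char* o) : original(o) {}
    void translate() { translated = trans(original); }

    const char* const original;
    std::string translated;
};

// One unit's strings. Each accessor constructs its string on first call.
namespace file_dialog {
inline TranslatedString& notFound() {
    static TranslatedString s("File not found");
    return s;
}
inline TranslatedString& readOnly() {
    static TranslatedString s("File is read-only");
    return s;
}
// Touches every string in this unit, including rarely hit error messages.
inline void translateAll() {
    notFound().translate();
    readOnly().translate();
}
}  // namespace file_dialog

// A second unit with its own translateAll.
namespace transport {
inline TranslatedString& playing() {
    static TranslatedString s("Playing");
    return s;
}
inline void translateAll() {
    playing().translate();
}
}  // namespace transport

// The single global entry point, called once at start-up.
inline void translateAll() {
    file_dialog::translateAll();
    transport::translateAll();
}
```

The cost is that every new unit has to be added to the global function by hand, but a missing entry is easy to spot and nothing depends on static initialization order.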

So it’s all done explicitly. But it wasn’t that much work, there’s no static class instances, and it’s clear what I have to do.

Now I have to translate the strings (yes, in this case the programmer is the translator too - makes life easier). And then the moment I change any string, I need to write a tool to make it clear what’s changed and what hasn’t. Thank Goodness for Python for such tasks.
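The comparison itself is small in any language. Here is a hedged C++ sketch, kept in C++ only to match the rest of the thread; `compareStringLists` and `StringChanges` are hypothetical names, and it assumes the old and new string lists have already been loaded into sets.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

struct StringChanges {
    std::vector<std::string> added;    // only in the new list: need translating
    std::vector<std::string> removed;  // only in the old list: translation unused
};

// Compares the previously translated strings with the freshly emitted ones.
StringChanges compareStringLists(const std::set<std::string>& oldStrings,
                                 const std::set<std::string>& newStrings) {
    StringChanges changes;
    std::set_difference(newStrings.begin(), newStrings.end(),
                        oldStrings.begin(), oldStrings.end(),
                        std::back_inserter(changes.added));
    std::set_difference(oldStrings.begin(), oldStrings.end(),
                        newStrings.begin(), newStrings.end(),
                        std::back_inserter(changes.removed));
    return changes;
}
```

An edited string shows up as one removal plus one addition, since the original text is the lookup key.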