Create Strings from data of any other encoding than UTF-8


#1

Hello,
First of all, sorry if my question has already been answered in another topic. I did try to look for it but could not find a quick solution.

We have (and, unfortunately, for legacy and uniformity reasons can't change this) a lot of literals that are in French and have accents, and the accents are written using their ISO-8859-1 codes (I am not saying this is good practice, UTF-8 would have been better, but it is a change we would rather not face now), such as String("Dur\xe9\x65:") = Durée:. Until I updated to the version of JUCE with the modifications to String, this was not a problem (I am not sure why it worked, but it did, maybe because of my locale). After the update, however, it of course no longer worked: any character above 127 triggered an assertion and was treated as UTF-8.

I know I can easily solve the problem by writing String(CharPointer_UTF8("Dur\xc3\xa9" "e:")); however, as I mentioned above, I would like to keep the code uniform and avoid changing all the string literals (there are quite a few). We are in the process of writing our own conversion methods to get around this, but I was wondering whether there is another way to use other encodings in Strings, or at least to build Strings from another encoding (in particular Latin-1)?
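For the "build Strings from Latin-1" part of the question, the conversion itself is small, because every ISO-8859-1 byte value is equal to its Unicode code point. Below is a minimal sketch in standard C++ (not a JUCE API; the name latin1ToUtf8 is hypothetical) that turns a Latin-1 byte string into UTF-8. The result could then be handed to String through CharPointer_UTF8, as in the workaround above, so the existing escaped literals would not need editing, at the cost of a call at each use site.

```cpp
#include <string>

// Hypothetical helper: convert an ISO-8859-1 (Latin-1) byte string to UTF-8.
// Bytes 0x00-0x7F are plain ASCII and are copied unchanged; bytes 0x80-0xFF
// are code points U+0080..U+00FF and become two UTF-8 bytes (110xxxxx 10xxxxxx).
std::string latin1ToUtf8 (const std::string& latin1)
{
    std::string utf8;
    utf8.reserve (latin1.size() * 2);

    for (char c : latin1)
    {
        const auto byte = static_cast<unsigned char> (c);

        if (byte < 0x80)
        {
            utf8 += c;
        }
        else
        {
            utf8 += static_cast<char> (0xC0 | (byte >> 6));
            utf8 += static_cast<char> (0x80 | (byte & 0x3F));
        }
    }

    return utf8;
}
```

With it, latin1ToUtf8("Dur\xe9\x65:") yields the same bytes as "Dur\xc3\xa9" "e:", which in JUCE code might then be wrapped as String(CharPointer_UTF8(latin1ToUtf8("Dur\xe9\x65:").c_str())).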

Thank you very much for any suggestions,
Sorin


#2

The problem is that if your source file has a local encoding, there’s no standard way to tell the compiler about it…

If I were in your situation, I think I'd probably spend a few minutes writing a quick conversion tool that scans your source files, looks for quoted strings, and converts any non-ASCII chars in them from your local encoding to escaped UTF-8 sequences. You could run it on your tree to convert all your existing strings, and whenever you add new strings, you can just paste them in as French and run the conversion utility again.
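A rough sketch of the core of such a tool, assuming the source files are saved in Latin-1 and covering only the text transformation (reading and writing the files is left out). The function name escapeLiteralsAsUtf8 is hypothetical. It tracks whether it is inside a double-quoted literal, copies existing escape sequences through unchanged, and rewrites each non-ASCII byte as two \xHH escapes. Because a hex escape keeps consuming hex digits, it closes and reopens the literal when the next character is a hex digit, producing the same "Dur\xc3\xa9" "e:" split used in the question. It does not parse comments or character literals, so a stray quote in either would confuse it.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: rewrite every non-ASCII byte inside "..." literals as
// escaped UTF-8. Assumes the input is Latin-1 text (one byte per character).
// Limitation: quotes inside comments or character literals are not recognised.
std::string escapeLiteralsAsUtf8 (const std::string& source)
{
    const char* const hexDigits = "0123456789abcdef";
    std::string out;
    bool inString = false;

    // Appends one byte as a \xHH escape sequence.
    auto appendHexEscape = [&out, hexDigits] (unsigned int b)
    {
        out += "\\x";
        out += hexDigits[b >> 4];
        out += hexDigits[b & 0xF];
    };

    for (std::size_t i = 0; i < source.size(); ++i)
    {
        const char c = source[i];

        if (! inString)
        {
            out += c;
            inString = (c == '"');
            continue;
        }

        if (c == '\\' && i + 1 < source.size())
        {
            // Existing escapes (\n, \", \x41, ...) are copied through unchanged.
            out += c;
            out += source[++i];
            continue;
        }

        if (c == '"')
        {
            out += c;
            inString = false;
            continue;
        }

        const auto byte = static_cast<unsigned char> (c);

        if (byte < 0x80)
        {
            out += c;
            continue;
        }

        // A Latin-1 byte equals its Unicode code point, so it becomes the
        // two UTF-8 bytes 110xxxxx 10xxxxxx, each written as \xHH.
        appendHexEscape (0xC0u | (byte >> 6));
        appendHexEscape (0x80u | (byte & 0x3Fu));

        // A following hex digit would be read as part of the \x escape, so
        // end the literal and start a new one: "Dur\xc3\xa9" "e:".
        if (i + 1 < source.size()
             && std::isxdigit (static_cast<unsigned char> (source[i + 1])))
            out += "\" \"";
    }

    return out;
}
```

The converted literals are still plain char strings, so with the new String class they would presumably still need the CharPointer_UTF8 wrapper to be read as UTF-8.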


#3

Thank you very much for the quick reply. I wanted to make sure I hadn't missed anything before doing something like that. Thanks again,

Sorin


#4

You know Notepad++, right?


#5

The problem was not my editor; it was that all the strings in my code base were purposely encoded in Latin-1, using escape sequences holding the ISO-8859-1 codes. I worked around the problem using the Boost.Locale library and its handy to_utf conversion function. But thanks.


#6

OK, I was too quick to answer. Notepad++ has many plugins, and one of them does charset encoding conversion.
Open your whole source directory so that all the files are loaded.
You'd have to record a macro that replaces "\xe9" with "é", "\xe8" with "è", etc., with the current file's encoding set to UTF-8, followed by a simple regular-expression find and replace of
("[^"]+")
with CharPointer_UTF8(\1)
Replay the macro on all your files, and save them.
You’re done.
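To see what that find-and-replace step does, and where it can misfire, before running it over a whole tree, here is the same pattern and replacement expressed with std::regex in C++. This is only an illustration of the regular expression from the post, and the function name is hypothetical. The pattern knows nothing about C++: it also wraps #include "..." paths, it stops at an escaped quote (\"), and an empty literal "" can make it pair the wrong quotes.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: applies the Notepad++ search ("[^"]+") and the
// replacement CharPointer_UTF8(\1) to a piece of source text.
// std::regex uses ECMAScript syntax, where the captured group is $1.
std::string wrapLiteralsInCharPointerUtf8 (const std::string& source)
{
    static const std::regex quotedLiteral ("(\"[^\"]+\")");
    return std::regex_replace (source, quotedLiteral, "CharPointer_UTF8($1)");
}
```

Reviewing a diff of a few files after the replacement, before saving everything, would catch the cases above.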