Create Strings from data of any other encoding than UTF-8

sionescu · May 19, 2011, 10:00pm

Hello,
First of all, I am sorry if my question was already answered in another topic; I did look for it but could not find a quick solution.

We have a lot of string literals that are in French and have accents, and the accents are written using their ISO-8859-1 codes, such as: String("Dur\xe9\x65:") = Durée:. Unfortunately, for legacy and uniformity reasons, we can’t change this. (I am not saying this is good practice; UTF-8 would have been better, but it is a change we would rather not face now.) Until I updated to the version of Juce with the modifications to String, this was not a problem (I am not sure why it worked, but it did, maybe due to my locale). After the update, of course, it did not work anymore: any character larger than 127 caused an assert, and the text was handled as UTF-8.

I know I can easily solve the problem by doing String(CharPointer_UTF8("Dur\xc3\xa9" "e:")). However, as I mentioned above, I would like to keep the code uniform and avoid changing all the string literals (there are quite a few). We are in the process of writing our own conversion methods to get around this, but I was wondering whether there is another way to use any other encoding in Strings, or at least to build Strings from another encoding (in particular Latin-1)?

Thank you very much for any suggestions,
Sorin
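
[Editor's sketch] The kind of "own conversion method" mentioned above could look roughly like the following. It is a minimal sketch, not JUCE code: the function name latin1ToUtf8 is hypothetical, and it relies only on the fact that ISO-8859-1 bytes map one-to-one onto the code points U+0000 to U+00FF.

```cpp
#include <string>

// Hypothetical helper (not part of JUCE): converts ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte is the code point of the same value, so it becomes
// either one UTF-8 byte (below 0x80) or a two-byte sequence.
std::string latin1ToUtf8 (const std::string& latin1)
{
    std::string utf8;
    utf8.reserve (latin1.size() * 2);

    for (unsigned char c : latin1)
    {
        if (c < 0x80)
        {
            utf8 += static_cast<char> (c);                  // plain ASCII, unchanged
        }
        else
        {
            utf8 += static_cast<char> (0xC0 | (c >> 6));    // lead byte: 110000xx
            utf8 += static_cast<char> (0x80 | (c & 0x3F));  // continuation: 10xxxxxx
        }
    }

    return utf8;
}
```

The result could then presumably be handed to JUCE as String (CharPointer_UTF8 (latin1ToUtf8 ("Dur\xe9\x65:").c_str())), which keeps the existing Latin-1 literals untouched at the cost of a conversion at run time.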

jules · May 20, 2011, 8:03am

The problem is that if your source file has a local encoding, there’s no standard way to tell the compiler about it…

If I was in your situation, I think I’d probably spend a few minutes writing a quick conversion tool that would scan your source-files, looking for quoted strings, and convert any non-ascii chars in there from your local encoding to escaped utf-8 sequences. You could run it on your tree to convert all your existing strings, and whenever you add any new strings, you can just paste them in as french, and then run your conversion utility again.
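
[Editor's sketch] The core of such a conversion tool might look like the function below. It is only a sketch under several assumptions: the source file is saved as ISO-8859-1, it is processed one line at a time, and a double quote always starts or ends a string literal (comments, character literals and raw strings are not handled). The name escapeLatin1InLiterals is hypothetical. Note that a hex escape in C++ swallows every hex digit that follows it, so the literal is split when the next character is a hex digit, which is why the example in the first post is written as "Dur\xc3\xa9" "e:".

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: rewrites raw Latin-1 bytes inside quoted strings
// as escaped UTF-8 sequences, e.g. the byte 0xE9 becomes \xc3\xa9.
std::string escapeLatin1InLiterals (const std::string& line)
{
    std::string out;
    bool inString = false;
    bool lastWasHexEscape = false;  // true right after we emitted a \xNN escape

    for (std::size_t i = 0; i < line.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char> (line[i]);

        if (inString && c == '\\' && i + 1 < line.size())
        {
            // Copy an existing escape (such as \" or \n) untouched.
            out += line[i];
            out += line[++i];
            lastWasHexEscape = false;
        }
        else if (c == '"')
        {
            inString = ! inString;
            out += '"';
            lastWasHexEscape = false;
        }
        else if (inString && c >= 0x80)
        {
            // Latin-1 byte -> two UTF-8 bytes, written as hex escapes.
            char buf[16];
            std::snprintf (buf, sizeof (buf), "\\x%02x\\x%02x",
                           static_cast<unsigned> (0xC0 | (c >> 6)),
                           static_cast<unsigned> (0x80 | (c & 0x3F)));
            out += buf;
            lastWasHexEscape = true;
        }
        else
        {
            // Split the literal so the escape does not absorb a following hex digit.
            if (inString && lastWasHexEscape && std::isxdigit (c))
                out += "\" \"";

            out += line[i];
            lastWasHexEscape = false;
        }
    }

    return out;
}
```

A real tool would read each source file, run every line through a function like this, and write the result back. Existing \xe9-style escapes are copied unchanged here, so literals that already use Latin-1 escape codes would need an extra step.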

sionescu · May 20, 2011, 12:26pm

Thank you very much for the quick reply, I wanted to make sure I didn’t miss anything before doing something like that, thanks again,

Sorin

X-Ryl669 · May 24, 2011, 3:02pm

You know Notepad++, right?

sionescu · May 24, 2011, 3:10pm

The problem was not my editor; it was the fact that all strings in my code base were purposefully encoded in Latin-1, using escape sequences followed by the ISO-8859-1 code. I worked around the problem using the Boost.Locale library and its handy conversion method to_utf. But thanks.

X-Ryl669 · May 24, 2011, 3:32pm

Ok. I was too quick to answer. In Notepad++, there are multiple plugins, and one of them is doing charset encoding conversion.
Open your directory, it’ll open all the files.
You’d have to record a macro doing “\xe9” to “é”, “\xe8” to “è” …etc…, while the current file encoding was set as UTF-8, and then a simple find and replace with regular expression
("[^"]+")
to CharPointer_UTF8(\1)
Replay the macro for all your files, and save them.
You’re done.
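
[Editor's sketch] The find-and-replace step above can also be scripted outside the editor. The following uses std::regex with the same pattern; the function name wrapLiterals is hypothetical. Like the original expression, it is naive: it wraps every non-empty quoted string on the line (including, for example, #include "file.h") and does not understand escaped quotes.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps each quoted string on a line in CharPointer_UTF8(...),
// mirroring the editor regex ("[^"]+") -> CharPointer_UTF8(\1).
std::string wrapLiterals (const std::string& line)
{
    static const std::regex literal ("(\"[^\"]+\")");
    return std::regex_replace (line, literal, "CharPointer_UTF8($1)");
}
```

Lines that should not be touched would have to be filtered out before calling it.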
