Beginner encoding question

woodslanding · April 20, 2016, 7:36pm

I’m hoping someone can explain what is going on here, and/or maybe point me to a summary of the encoding issues involved…

I have a stock TextEditor object, and it’s not displaying some characters correctly. Windows Notepad does not have these problems when set to the same font and displaying the same text. Here’s a screenshot:

[screenshot]

Any tips or links welcome. Encoding is all very new to me.

IvanC · April 21, 2016, 5:32am

Windows and Mac OS X encode characters differently in some cases. Since JUCE software can be multi-platform, the developer has to address those differences in their own code.

The solution is a tool available in the Projucer / Introjucer: you paste in some text, and it gives you the code needed to include that text in your source, in a form that works on every platform with JUCE’s internal text encoding. It’s in the menu “Tools, UTF8 String Literal Converter”.
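As a rough illustration of the idea behind such a converter (a sketch in plain C++, not the tool's exact output): each non-ASCII character is written as its UTF-8 bytes using hex escapes, so the literal no longer depends on how the source file is saved. The names `cafeUtf8` and `countCodePoints` are made up for this example; in JUCE the literal would typically be wrapped in `CharPointer_UTF8 (...)`.

```cpp
#include <cstddef>
#include <cstring>

// "café" with the é spelled out as its two UTF-8 bytes (0xC3 0xA9).
// The literal is pure ASCII in the source file, so every compiler sees the same bytes.
static const char* const cafeUtf8 = "caf\xc3\xa9";

// Hypothetical helper: counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
std::size_t countCodePoints (const char* utf8)
{
    std::size_t count = 0;

    for (; *utf8 != '\0'; ++utf8)
        if ((static_cast<unsigned char> (*utf8) & 0xC0) != 0x80)
            ++count;

    return count;
}
```

Here `std::strlen (cafeUtf8)` is 5 while `countCodePoints (cafeUtf8)` is 4, because the accented letter occupies two bytes.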

woodslanding · April 21, 2016, 2:15pm

Thanks, I will look into that…

But it sounds from your description like this is just for string literals in your own code? (good to know nevertheless…)

My code is for opening up existing text files on the user’s computer… the example I gave was a simple copy/paste off the web into Notepad, which I opened with my JUCE app. I’m just wondering what Notepad (and Firefox for that matter) does that I’m not doing.

I guess I do still see escape characters all over the web, including a surprising number on this site(!) so I don’t suppose there is any proof against it. But if Notepad can handle something, I’d think Juce could be made to.

IvanC · April 21, 2016, 4:31pm

Oh sorry, I didn’t get what you meant at first…

Well, let’s say JUCE doesn’t really handle every text file encoding. A few years ago I needed to read ANSI-encoded text files containing French characters, and I had to write a custom function for that (beware, this code is 5-6 years old or so):

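[code]
String convertANSIToUnicode(String strANSI)
{
String strTemp = "";
int i = 0;

// Raw bytes of the input string, walked one byte at a time further down
const char* new_cstring = strANSI.toRawUTF8();

// One slot per possible byte value; an empty slot means "copy the byte unchanged"
StringArray strArrayTable;
for(int n=0; n<256; n++)
    strArrayTable.add("");
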

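// Transformation table: ANSI (Windows-1252) byte -> UTF-8 sequence.
// Note: the character literals used as indices depend on the source file encoding and the
// compiler's execution character set; they are meant to be single bytes (e.g. 'à' is 0xE0).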
strArrayTable.set((unsigned char) 'à', CharPointer_UTF8 ("\xc3\xa0"));
strArrayTable.set((unsigned char) 'â', CharPointer_UTF8 ("\xc3\xa2"));
strArrayTable.set((unsigned char) 'ä', CharPointer_UTF8 ("\xc3\xa4"));
strArrayTable.set((unsigned char) 'ç', CharPointer_UTF8 ("\xc3\xa7"));
strArrayTable.set((unsigned char) 'è', CharPointer_UTF8 ("\xc3\xa8"));
strArrayTable.set((unsigned char) 'é', CharPointer_UTF8 ("\xc3\xa9"));
strArrayTable.set((unsigned char) 'ê', CharPointer_UTF8 ("\xc3\xaa"));
strArrayTable.set((unsigned char) 'ë', CharPointer_UTF8 ("\xc3\xab"));
strArrayTable.set((unsigned char) 'î', CharPointer_UTF8 ("\xc3\xae"));
strArrayTable.set((unsigned char) 'ï', CharPointer_UTF8 ("\xc3\xaf"));
strArrayTable.set((unsigned char) 'ô', CharPointer_UTF8 ("\xc3\xb4"));
strArrayTable.set((unsigned char) 'ö', CharPointer_UTF8 ("\xc3\xb6"));
strArrayTable.set((unsigned char) 'ù', CharPointer_UTF8 ("\xc3\xb9"));
strArrayTable.set((unsigned char) 'û', CharPointer_UTF8 ("\xc3\xbb"));
strArrayTable.set((unsigned char) 'ü', CharPointer_UTF8 ("\xc3\xbc"));

strArrayTable.set((unsigned char) 'À', CharPointer_UTF8 ("\xc3\x80"));
strArrayTable.set((unsigned char) 'Â', CharPointer_UTF8 ("\xc3\x82"));
strArrayTable.set((unsigned char) 'Ä', CharPointer_UTF8 ("\xc3\x84"));
strArrayTable.set((unsigned char) 'Ç', CharPointer_UTF8 ("\xc3\x87"));
strArrayTable.set((unsigned char) 'È', CharPointer_UTF8 ("\xc3\x88"));
strArrayTable.set((unsigned char) 'É', CharPointer_UTF8 ("\xc3\x89"));
strArrayTable.set((unsigned char) 'Ê', CharPointer_UTF8 ("\xc3\x8a"));
strArrayTable.set((unsigned char) 'Ë', CharPointer_UTF8 ("\xc3\x8b"));
strArrayTable.set((unsigned char) 'Î', CharPointer_UTF8 ("\xc3\x8e"));
strArrayTable.set((unsigned char) 'Ï', CharPointer_UTF8 ("\xc3\x8f"));
strArrayTable.set((unsigned char) 'Ô', CharPointer_UTF8 ("\xc3\x94"));
strArrayTable.set((unsigned char) 'Ö', CharPointer_UTF8 ("\xc3\x96"));
strArrayTable.set((unsigned char) 'Ù', CharPointer_UTF8 ("\xc3\x99"));
strArrayTable.set((unsigned char) 'Û', CharPointer_UTF8 ("\xc3\x9b"));
strArrayTable.set((unsigned char) 'Ü', CharPointer_UTF8 ("\xc3\x9c"));

// Walk the bytes: replace those that have a table entry, copy everything else as-is
while(new_cstring[i] != '\0')
{
    const unsigned char byteValue = (unsigned char) new_cstring[i];

    if (strArrayTable[byteValue].isNotEmpty())
        strTemp += strArrayTable[byteValue];
    else
        strTemp += new_cstring[i];

    i++;
}

return strTemp;
}
[/code]

And you use it like this:

File fileToRead; String strContenu = convertANSIToUnicode(fileToRead.loadFileAsString());

I can’t say you won’t have problems with this ugly code, but it might do the job for you. If you need characters that aren’t covered yet, you can do what I did and get the corresponding escape sequences from the String Literals tool in the Projucer.
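For covering more characters without adding table entries by hand, here is a minimal sketch in plain C++ (not JUCE code, and not the function above). It assumes "ANSI" means Windows-1252: bytes 0xA0 to 0xFF have the same values as their Unicode code points, so their UTF-8 form can be computed, and only the 0x80 to 0x9F block (euro sign, curly quotes, dashes and so on) needs a lookup table. The helper names `appendUtf8` and `windows1252ToUtf8` are made up for this example.

```cpp
#include <cstdint>
#include <string>

// Appends one Unicode code point (up to U+FFFF, enough for this table) as UTF-8.
static void appendUtf8 (std::string& out, std::uint32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char> (cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts raw Windows-1252 bytes to a UTF-8 std::string.
std::string windows1252ToUtf8 (const std::string& ansiBytes)
{
    // Code points for bytes 0x80..0x9F; unassigned slots map to U+FFFD.
    static const std::uint16_t high[32] = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
    };

    std::string out;
    out.reserve (ansiBytes.size());

    for (unsigned char b : ansiBytes)
    {
        if (b >= 0x80 && b <= 0x9F)
            appendUtf8 (out, high[b - 0x80]);
        else
            appendUtf8 (out, b);   // ASCII and 0xA0..0xFF match Unicode directly
    }

    return out;
}
```

This only works on the raw file bytes, before anything has tried to decode them. In a JUCE app those bytes could come from `File::loadFileAsData`, and the result could be passed to `String::fromUTF8`, though that wiring is an assumption and is not shown here.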

Hope that helps

woodslanding · April 22, 2016, 4:26am

Well, it is ugly, but I will copy it into my notes, in case I don’t discover a more elegant solution.

Thanks!
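One candidate for a more elegant approach is to guess the file's encoding from its raw bytes before decoding it. The thread does not say how Notepad decides, so the following is only a common heuristic, sketched in plain C++: check for a byte order mark, then test whether the bytes are structurally valid UTF-8, and otherwise fall back to Windows-1252. The names `TextEncoding`, `looksLikeUtf8` and `guessEncoding` are made up for this example.

```cpp
#include <cstddef>
#include <string>

enum class TextEncoding { utf8, utf16LittleEndian, utf16BigEndian, windows1252 };

// Simplified check: verifies lead/continuation byte structure only
// (it does not reject overlong forms or surrogate code points).
static bool looksLikeUtf8 (const std::string& bytes)
{
    std::size_t i = 0;

    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char> (bytes[i]);
        std::size_t extra = 0;

        if (lead < 0x80)                extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;              // stray continuation or invalid lead byte

        if (i + extra >= bytes.size())
            return false;               // sequence is cut off at the end of the data

        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char> (bytes[i + k]) & 0xC0) != 0x80)
                return false;

        i += extra + 1;
    }

    return true;
}

// Hypothetical helper: byte order mark first, then UTF-8 validity, then ANSI fallback.
TextEncoding guessEncoding (const std::string& bytes)
{
    if (bytes.size() >= 3 && bytes.compare (0, 3, "\xEF\xBB\xBF") == 0)
        return TextEncoding::utf8;

    if (bytes.size() >= 2 && bytes.compare (0, 2, "\xFF\xFE") == 0)
        return TextEncoding::utf16LittleEndian;

    if (bytes.size() >= 2 && bytes.compare (0, 2, "\xFE\xFF") == 0)
        return TextEncoding::utf16BigEndian;

    return looksLikeUtf8 (bytes) ? TextEncoding::utf8 : TextEncoding::windows1252;
}
```

Pure ASCII text is valid UTF-8, so it is reported as `utf8`. A file containing a lone 0xE9 byte (é in Windows-1252) fails the UTF-8 check and is reported as `windows1252`.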