Beginner encoding question


#1

I’m hoping someone can explain what is going on here, and/or maybe point me to a summary of the encoding issues involved…

I have a stock textEditor object, and it’s not displaying some characters correctly. Windows Notepad doesn’t have these problems when set to the same font and displaying the same text. Here’s a screenshot:

[screenshot]

Any tips or links welcome. Encoding is all very new to me.


#2

There are some differences in how characters are encoded between Windows and Mac OS X. A JUCE developer has to deal with them in code, since any JUCE app can end up being built for several platforms.

The solution is to use a tool in the Projucer / Introjucer: you paste in some text, and it gives you the string literal to put in your code, in a form that works on every platform with JUCE’s internal text encoding. It’s in the menu “Tools, UTF8 String Literal Converter”.
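
To give an idea of what that kind of conversion involves, here is a rough sketch in plain C++ (no JUCE needed). It is not the Projucer’s actual implementation; `toEscapedLiteral` is a made-up helper name, and it only deals with non-ASCII bytes, quotes and backslashes (control characters such as newlines are left alone).

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper, not part of JUCE: turns UTF-8 bytes into a C++ string
// literal that contains only ASCII, writing every non-ASCII byte as \xNN.
std::string toEscapedLiteral (const std::string& utf8)
{
    std::string out = "\"";
    bool lastWasHexEscape = false;

    for (unsigned char c : utf8)
    {
        if (c >= 0x80)
        {
            // Non-ASCII byte: emit it as a two-digit hex escape.
            char buf[5];
            std::snprintf (buf, sizeof buf, "\\x%02x", (unsigned) c);
            out += buf;
            lastWasHexEscape = true;
        }
        else
        {
            // A hex escape swallows any hex digits that follow it, so close
            // and reopen the literal before such a character.
            if (lastWasHexEscape && std::isxdigit (c) != 0)
                out += "\" \"";

            // Quotes and backslashes need escaping inside a literal.
            if (c == '"' || c == '\\')
                out += '\\';

            out += (char) c;
            lastWasHexEscape = false;
        }
    }

    return out + "\"";
}
```

For example, `toEscapedLiteral ("caf\xc3\xa9")` gives the text `"caf\xc3\xa9"`, which you would then wrap as `CharPointer_UTF8 ("caf\xc3\xa9")` in JUCE code. Because the literal is pure ASCII, it no longer matters which encoding your compiler assumes for the source file.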


#3

Thanks, I will look into that…

But it sounds from your description like this is just for string literals in your own code? (good to know nevertheless…)

My code is for opening up existing text files on the user’s computer… the example I gave was a simple copy/paste off the web into Notepad, which I opened with my Juce app. I’m just wondering what Notepad (and firefox for that matter) does that I’m not doing.

I guess I do still see escape characters all over the web, including a surprising number on this site(!) so I don’t suppose there is any proof against it. But if Notepad can handle something, I’d think Juce could be made to.


#4

Oh sorry, I didn’t get what you meant at first…

Well, let’s say JUCE doesn’t really handle every text file encoding. I remember that a few years ago I needed to read text files saved in the ANSI format with French characters, and I had to write a custom function for that (beware, this code is 5-6 years old or so):

[code]String convertANSIToUnicode(String strANSI)
{
String strTemp;
int i = 0; // index into the raw bytes, used by the while loop at the end

const char* new_cstring = static_cast<const char*> (strANSI.toUTF8());

// One slot per possible byte value; an empty string means "no replacement".
StringArray strArrayTable;
for(int n=0; n<256; n++)
    strArrayTable.add("");

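// Transformation table: single-byte ANSI character -> its UTF-8 byte sequence.
// Note: the character literals below are only expected to give the intended
// byte value if this source file is itself saved in a single-byte ANSI encoding.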
strArrayTable.set((unsigned char) 'à', CharPointer_UTF8 ("\xc3\xa0"));
strArrayTable.set((unsigned char) 'â', CharPointer_UTF8 ("\xc3\xa2"));
strArrayTable.set((unsigned char) 'ä', CharPointer_UTF8 ("\xc3\xa4"));
strArrayTable.set((unsigned char) 'ç', CharPointer_UTF8 ("\xc3\xa7"));
strArrayTable.set((unsigned char) 'è', CharPointer_UTF8 ("\xc3\xa8"));
strArrayTable.set((unsigned char) 'é', CharPointer_UTF8 ("\xc3\xa9"));
strArrayTable.set((unsigned char) 'ê', CharPointer_UTF8 ("\xc3\xaa"));
strArrayTable.set((unsigned char) 'ë', CharPointer_UTF8 ("\xc3\xab"));
strArrayTable.set((unsigned char) 'î', CharPointer_UTF8 ("\xc3\xae"));
strArrayTable.set((unsigned char) 'ï', CharPointer_UTF8 ("\xc3\xaf"));
strArrayTable.set((unsigned char) 'ô', CharPointer_UTF8 ("\xc3\xb4"));
strArrayTable.set((unsigned char) 'ö', CharPointer_UTF8 ("\xc3\xb6"));
strArrayTable.set((unsigned char) 'ù', CharPointer_UTF8 ("\xc3\xb9"));
strArrayTable.set((unsigned char) 'û', CharPointer_UTF8 ("\xc3\xbb"));
strArrayTable.set((unsigned char) 'ü', CharPointer_UTF8 ("\xc3\xbc"));

strArrayTable.set((unsigned char) 'À', CharPointer_UTF8 ("\xc3\x80"));
strArrayTable.set((unsigned char) 'Â', CharPointer_UTF8 ("\xc3\x82"));
strArrayTable.set((unsigned char) 'Ä', CharPointer_UTF8 ("\xc3\x84"));
strArrayTable.set((unsigned char) 'Ç', CharPointer_UTF8 ("\xc3\x87"));
strArrayTable.set((unsigned char) 'È', CharPointer_UTF8 ("\xc3\x88"));
strArrayTable.set((unsigned char) 'É', CharPointer_UTF8 ("\xc3\x89"));
strArrayTable.set((unsigned char) 'Ê', CharPointer_UTF8 ("\xc3\x8a"));
strArrayTable.set((unsigned char) 'Ë', CharPointer_UTF8 ("\xc3\x8b"));
strArrayTable.set((unsigned char) 'Î', CharPointer_UTF8 ("\xc3\x8e"));
strArrayTable.set((unsigned char) 'Ï', CharPointer_UTF8 ("\xc3\x8f"));
strArrayTable.set((unsigned char) 'Ô', CharPointer_UTF8 ("\xc3\x94"));
strArrayTable.set((unsigned char) 'Ö', CharPointer_UTF8 ("\xc3\x96"));
strArrayTable.set((unsigned char) 'Ù', CharPointer_UTF8 ("\xc3\x99"));
strArrayTable.set((unsigned char) 'Û', CharPointer_UTF8 ("\xc3\x9b"));
strArrayTable.set((unsigned char) 'Ü', CharPointer_UTF8 ("\xc3\x9c"));

while(new_cstring[i] != '\0')
{    
    if (strArrayTable[(unsigned char) new_cstring[i]] != "")
        strTemp = strTemp + strArrayTable[(unsigned char) new_cstring[i]];
    else
        strTemp = strTemp + new_cstring[i];

    i++;
}

return strTemp;

}
[/code]

And you use it like this:

File fileToRead; String strContenu = convertANSIToUnicode(fileToRead.loadFileAsString());

I can’t say you won’t have problems with this ugly code, but it might do the job for you. If you need other characters that aren’t already in the table, you can do what I did and get the matching escape sequences from the UTF8 String Literal Converter in the Projucer.
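
For what it’s worth, the table above can probably be avoided: in Latin-1 every byte value is the same as its Unicode code point, so any byte of 0x80 or above can be turned into a two-byte UTF-8 sequence with a bit of arithmetic. Below is a minimal sketch in plain C++ working on the raw file bytes. The function names are made up, the UTF-8 check is simplified (it doesn’t reject overlong forms), and it treats “ANSI” as Latin-1, which differs from Windows-1252 in the 0x80 to 0x9F range (curly quotes, the euro sign and so on).

```cpp
#include <cstddef>
#include <string>

// Returns true if the bytes are structurally valid UTF-8.
// Simplified check: overlong encodings and surrogates are not rejected.
bool looksLikeUtf8 (const std::string& bytes)
{
    std::size_t i = 0;

    while (i < bytes.size())
    {
        const unsigned char c = (unsigned char) bytes[i];
        std::size_t extra = 0; // number of continuation bytes expected

        if      (c < 0x80)           extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false; // stray continuation byte or invalid lead byte

        if (extra > bytes.size() - i - 1)
            return false; // sequence is cut off at the end of the data

        for (std::size_t k = 1; k <= extra; ++k)
            if (((unsigned char) bytes[i + k] & 0xC0) != 0x80)
                return false;

        i += extra + 1;
    }

    return true;
}

// Latin-1 byte values equal their Unicode code points, so each byte >= 0x80
// becomes a two-byte UTF-8 sequence; ASCII bytes are copied unchanged.
std::string latin1ToUtf8 (const std::string& bytes)
{
    std::string out;
    out.reserve (bytes.size());

    for (unsigned char c : bytes)
    {
        if (c < 0x80)
        {
            out += (char) c;
        }
        else
        {
            out += (char) (0xC0 | (c >> 6));
            out += (char) (0x80 | (c & 0x3F));
        }
    }

    return out;
}

// Assumption: anything that is not valid UTF-8 is treated as Latin-1.
std::string decodeToUtf8 (const std::string& fileBytes)
{
    return looksLikeUtf8 (fileBytes) ? fileBytes : latin1ToUtf8 (fileBytes);
}
```

In a JUCE app you would presumably read the raw bytes with `File::loadFileAsData()` instead of `loadFileAsString()`, run them through something like `decodeToUtf8`, and build the `String` with `String::fromUTF8()`. I haven’t tested that combination, so take it as a starting point only.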

Hope that helps


#5

Well, it is ugly, but I will copy it into my notes, in case I don’t discover a more elegant solution.

Thanks!