Decoding a UTF-8 String


I have a number of UTF-8 encoded strings in an XML database, which I then load into a ValueTree; so far, so good. But how do I then display these strings correctly? String::fromUTF8 only seems to work with string literals.

[code]DBG (String::fromUTF8 (CharPointer_UTF8 ("\xc9\x99\xca\x8a"))); // displays əʊ as expected

// Keep the String alive: a pointer taken from a temporary String would dangle.
const String name (valueTree.getChild (index).getProperty (Ids::name).toString());
const char* string = name.toUTF8().getAddress();
DBG (string);                     // only displays "\xc9\x99\xca\x8a"
DBG (String::fromUTF8 (string));  // only displays "\xc9\x99\xca\x8a"
[/code]


Huh…? Why are you messing about casting it to a char*? Why not just:



That doesn’t work either. I was casting to a char* as that seemed the closest thing to a string literal.

This is easily reproducible, or maybe I’m just approaching it in the wrong way. Simply store the string “\xc9\x99\xca\x8a” in a file called “testText.txt” on the desktop and run the code below. None of the options produce the correct characters; each one only shows the string exactly as it is stored in the file.

[code]const File testFile (File::getSpecialLocation (File::userDesktopDirectory).getChildFile ("testText.txt"));
DBG (testFile.loadFileAsString());
DBG (String (testFile.loadFileAsString().toUTF8()));
DBG (testFile.loadFileAsString().toUTF8().getAddress());
DBG (String (CharPointer_UTF8 (testFile.loadFileAsString().toUTF8().getAddress())));
DBG (String::fromUTF8 (CharPointer_UTF8 (testFile.loadFileAsString().toUTF8().getAddress())));

MemoryBlock testBlock;
testFile.loadFileAsData (testBlock);
DBG (testBlock.toString());
// The block's data isn't null-terminated, so pass its size explicitly.
DBG (String::fromUTF8 (static_cast<const char*> (testBlock.getData()), (int) testBlock.getSize()));

DBG (String::fromUTF8 ("\xc9\x99\xca\x8a"));                 // this, however, does work
[/code]


I think my understanding of how Unicode is stored in memory/files is lacking here, but as you can see I’ve tried a lot of combinations.
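For context on how these characters are stored: in UTF-8, ə (U+0259) and ʊ (U+028A) each take two bytes, so the four bytes C9 99 CA 8A are two 2-byte sequences. Below is a minimal standard C++ sketch of a UTF-8 decoder showing how those bits combine into code points. `decodeUtf8` is a hypothetical helper for illustration only (in a JUCE app you would let `String::fromUTF8` handle the bytes), and it assumes well-formed input without validating it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into Unicode code points.
// Assumes well-formed input; a real decoder must reject malformed sequences.
std::vector<char32_t> decodeUtf8 (const std::string& bytes)
{
    std::vector<char32_t> codePoints;
    std::size_t i = 0;

    while (i < bytes.size())
    {
        const auto lead = static_cast<unsigned char> (bytes[i]);
        char32_t cp = 0;
        int continuationBytes = 0;

        // The lead byte's high bits say how many bytes the sequence has.
        if (lead < 0x80)                { cp = lead;        continuationBytes = 0; } // 0xxxxxxx
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; continuationBytes = 1; } // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; continuationBytes = 2; } // 1110xxxx
        else                            { cp = lead & 0x07; continuationBytes = 3; } // 11110xxx

        ++i;

        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (int k = 0; k < continuationBytes && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char> (bytes[i]) & 0x3F);

        codePoints.push_back (cp);
    }

    return codePoints;
}
```

Calling `decodeUtf8 ("\xc9\x99\xca\x8a")` yields the two code points 0x0259 and 0x028A, whereas decoding the sixteen-character text of the escape sequence would just give back sixteen ASCII code points.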


Silly question, but you’re not literally typing “\xc9\x99\xca\x8a” into a text editor as if you’re writing a C++ string literal, are you?


:oops: Now I feel silly; of course that isn’t going to work.
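To make the distinction concrete: typed into an editor, `\xc9\x99\xca\x8a` is sixteen ordinary ASCII characters (a backslash, `x` and two hex digits, four times), whereas the same text inside C++ source is an escape sequence that the compiler turns into four bytes. None of the String conversions tried above treat the former as the latter. If a data file really did contain such escape text, it would need unescaping first. Here is a minimal standard C++ sketch of that step; `unescapeHexBytes` is a hypothetical helper (not part of JUCE) that only handles two-digit `\xNN` escapes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: turn C-style "\xNN" escape text (as typed into a text
// editor) into the raw bytes it describes. Anything else is copied unchanged.
// Assumes exactly two hex digits per escape, unlike the C++ compiler, which
// consumes as many hex digits as follow the \x.
std::string unescapeHexBytes (const std::string& text)
{
    auto hexValue = [] (char c) -> int
    {
        if (c >= '0' && c <= '9') return c - '0';
        c = static_cast<char> (std::tolower (static_cast<unsigned char> (c)));
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1; // not a hex digit
    };

    std::string bytes;

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (text[i] == '\\' && i + 3 < text.size() && text[i + 1] == 'x')
        {
            const int hi = hexValue (text[i + 2]);
            const int lo = hexValue (text[i + 3]);

            if (hi >= 0 && lo >= 0)
            {
                bytes.push_back (static_cast<char> (hi * 16 + lo));
                i += 3; // skip the rest of the escape; the loop adds one more
                continue;
            }
        }

        bytes.push_back (text[i]);
    }

    return bytes;
}
```

The resulting bytes could then presumably be passed to `String::fromUTF8` together with their length. Usually, though, the simpler fix is to save the file as real UTF-8 text so that no unescaping is needed.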

That isn’t what I was doing originally, however. I had an XML document with non-ASCII characters that I added to the Introjucer, which very nicely converted it into UTF-8 string literals for me. The problem was that it wouldn’t parse as XML: I kept getting an “unmatched pairs” error, but I’m pretty sure the XML is good, since it loads fine in Chrome.

Anyway, I’ve found a suitable workaround for now; I might come back to it at a later date.