Decoding a UTF8 String

dave96 · June 25, 2012, 8:40pm

I have a number of UTF-8-encoded strings in an XML database, which I load into a ValueTree. So far so good. But how do I then display these strings correctly? String::fromUTF8 only seems to work for string literals. For example:

[code]DBG (String::fromUTF8 (CharPointer_UTF8 ("\xc9\x99\xca\x8a"))); // displays as expected: əʊ

const String name (valueTree.getChild (index).getProperty (Ids::name).toString()); // keep the String alive...
const char* string = name.toUTF8().getAddress(); // ...so that this pointer stays valid
DBG (string); // only displays "\xc9\x99\xca\x8a"
DBG (String::fromUTF8 (string)); // only displays "\xc9\x99\xca\x8a"
[/code]

jules · June 26, 2012, 7:01am

huh…? Why are you messing about casting it to a char*? Why not just:

[code]DBG (valueTree.getChild (index).getProperty (Ids::name).toString());[/code]

??

dave96 · June 26, 2012, 11:56am

That doesn’t work either. I was casting to a char* as that seemed the closest thing to a string literal.

This is easily reproducible, or maybe I’m just approaching it the wrong way. Simply store the string “\xc9\x99\xca\x8a” in a file called “testText.txt” on the desktop and run the code below. None of the options gives the correct characters; each one just prints the string exactly as it is stored in the file.

[code]const File testFile (File::getSpecialLocation (File::userDesktopDirectory).getChildFile ("testText.txt"));
DBG (testFile.loadFileAsString());
DBG (String (testFile.loadFileAsString().toUTF8()));
DBG (testFile.loadFileAsString().toUTF8().getAddress());
DBG (String (CharPointer_UTF8 (testFile.loadFileAsString().toUTF8().getAddress())));
DBG (String::fromUTF8 (CharPointer_UTF8 (testFile.loadFileAsString().toUTF8().getAddress())));

MemoryBlock testBlock;
testFile.loadFileAsData (testBlock);
DBG (testBlock.toString());
// A MemoryBlock isn't null-terminated, so pass its size explicitly
DBG (String::fromUTF8 (static_cast<const char*> (testBlock.getData()), (int) testBlock.getSize()));

DBG (String::fromUTF8 ("\xc9\x99\xca\x8a")); // this, however, does work
[/code]

I think my understanding of how Unicode is stored in memory and in files is what’s lacking here, but as you can see I’ve tried a lot of combinations.
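Since the confusion here is about how Unicode text is actually stored, a minimal standard C++ sketch may help (no JUCE needed; the names appendUtf8 and toHex are purely illustrative). It encodes the two code points U+0259 (ə) and U+028A (ʊ) as UTF-8 and shows that the result is exactly the four bytes C9 99 CA 8A that the string literal spells out. A UTF-8 file or memory buffer containing əʊ holds those four bytes, not the sixteen characters of the text \xc9\x99\xca\x8a.

```cpp
#include <cassert>
#include <string>

// Appends the UTF-8 encoding of one Unicode code point to 'out'.
// Sketch: 'cp' must be a valid scalar value (at most U+10FFFF, not a surrogate).
void appendUtf8 (std::string& out, char32_t cp)
{
    assert (cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));

    if (cp < 0x80)            // 1 byte:  0xxxxxxx
    {
        out += static_cast<char> (cp);
    }
    else if (cp < 0x800)      // 2 bytes: 110xxxxx 10xxxxxx
    {
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    {
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else                      // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    {
        out += static_cast<char> (0xF0 | (cp >> 18));
        out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Renders raw bytes as space-separated lowercase hex, e.g. "c9 99".
std::string toHex (const std::string& bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string result;

    for (unsigned char b : bytes)
    {
        if (! result.empty())
            result += ' ';
        result += digits[b >> 4];
        result += digits[b & 0x0F];
    }
    return result;
}
```

Appending U'\u0259' and then U'\u028A' to an empty std::string yields a 4-byte string equal to the literal "\xc9\x99\xca\x8a", and toHex renders it as c9 99 ca 8a.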

jules · June 26, 2012, 12:25pm

Silly question, but you’re not literally typing “\xc9\x99\xca\x8a” into a text editor as if you were writing a C++ string literal, are you?
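This is the crux of the thread: in C++ source code the compiler replaces each \xNN escape inside a string literal with a single byte, but in a text file those same characters are just a backslash, an 'x' and two hex digits. The static_asserts below make the size difference visible. If a file really did contain escapes like this, it would have to be unescaped before decoding. unescapeHexEscapes is a hypothetical helper, not a JUCE function, and unlike the compiler (which keeps consuming hex digits after \x) it assumes exactly two digits per escape.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Inside a string *literal*, the compiler turns each \xNN escape into one byte...
static_assert (sizeof ("\xc9\x99\xca\x8a") == 5, "4 bytes plus the terminating null");

// ...but typed into a text file, the same text is 16 ordinary ASCII characters
// (a raw string literal reproduces that here, because it does not process escapes).
static_assert (sizeof (R"(\xc9\x99\xca\x8a)") == 17, "16 characters plus the terminating null");

// Hypothetical helper (not part of JUCE): converts "\xNN" escapes found in plain
// text into the bytes a C++ compiler would produce for them. Assumes exactly two
// hex digits per escape; anything else is copied through unchanged.
std::string unescapeHexEscapes (const std::string& text)
{
    auto hexValue = [] (char c) -> int
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1; // not a hex digit
    };

    std::string bytes;

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (text[i] == '\\' && i + 3 < text.size() && text[i + 1] == 'x')
        {
            const int high = hexValue (text[i + 2]);
            const int low  = hexValue (text[i + 3]);

            if (high >= 0 && low >= 0)
            {
                bytes += static_cast<char> (high * 16 + low);
                i += 3; // together with the loop's ++i, skips the whole 4-character escape
                continue;
            }
        }

        bytes += text[i]; // ordinary character: copy as-is
    }

    // Each escape shrinks 4 characters to 1 byte, so the output is never longer.
    assert (bytes.size() <= text.size());
    return bytes;
}
```

In a JUCE project, the resulting bytes could then presumably be passed to String::fromUTF8 (bytes.data(), (int) bytes.size()) to get the same əʊ that the string-literal version displays.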

dave96 · June 27, 2012, 12:02am

:oops: Now I feel silly; of course that isn’t going to work.

That isn’t what I was doing originally, however. I had an XML document with non-ASCII characters that I added to the Introjucer, which very nicely converted it into UTF-8 string literals for me. The problem was that it wouldn’t parse as XML: I kept getting an “unmatched pairs” error, but I’m pretty sure the XML is valid, since it loads fine in Chrome.

Anyway, I’ve found a suitable workaround for now; I might come back to it at a later date.

Related topics (all in General JUCE discussion):
Strange in UTF8 String · 1 reply · 420 views · June 22, 2013
Tracking down a UTF8 issue · 7 replies · 1856 views · February 8, 2014
Reading Chinese String from file · 5 replies · 2287 views · August 19, 2013
Strangeness with UTF8! · 3 replies · 471 views · August 20, 2012
How to correctly use UTF8 with JUCE strings · 1 reply · 664 views · May 1, 2013
