New String problem with high-ascii values


This produces the wrong result:

String s("÷2");

I believe this is the problem:


String::String (const char* const t)
    : text (StringHolder::createFromCharPointer (CharPointer_UTF8 (t)))

const char* const t is not really a UTF-8 encoded string, it's an ASCII string.

I think the solution is to just introduce CharPointer_ASCII, with its obvious implementation: each ASCII character maps 1:1 to a UTF-32 code point of the same value, and change the constructor for String(const char* const t).
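To make the 1:1 idea concrete, here is a rough sketch of the widening I have in mind. widenBytes is a made-up helper for illustration, not anything in JUCE, and note that for bytes above 0x7f this is really ISO-8859-1 (Latin-1) decoding rather than strict ASCII:

```cpp
#include <string>

// Hypothetical helper (not a JUCE function): each byte becomes the code point
// with the same numeric value. For bytes above 0x7f this matches Latin-1.
std::u32string widenBytes (const char* t)
{
    std::u32string out;

    for (; *t != 0; ++t)
        out.push_back (static_cast<char32_t> (static_cast<unsigned char> (*t)));

    return out;
}
```

This only gives the expected result if the compiler stored the ÷ as the single byte 0xF7, which depends on how the source file was saved.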

I’m not 100% sure about all of this though, so don’t hesitate to correct me.


It’s only in ascii because your source-code editor saved the file as ascii. If it had saved it as utf-8, it’d work just fine.

Sadly, there’s only one portable way to encode strings with characters above 0x7f, and that’s by using escaped utf-8 character codes inside char* literals. There’s no other way to reliably get your string from the editor into the compiler and then into the code without risking the encoding being lost somewhere along the way. That’s why I made the String class assume it’s getting utf-8 - the alternative would be to assume that it’s ascii or a local encoding, and they’re worse options.

I’ve already been thinking about adding a CharPointer_ASCII class, and my cunning plan is to make the String (const char*) constructor slightly special - it’d assume the string it’s getting is unambiguously ascii, so that if you tried to feed it a value above 0x7f, it’d throw an assertion. That would force you to use a different constructor for extended strings, so in your case you’d have to write String (CharPointer_ASCII ("÷2")). That would mean that it’s the coder’s responsibility to explicitly wrap these strings in an encoding that matches their source-file format.
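Roughly, the check would look something like the sketch below. isPlainAscii is just a hypothetical name for illustration, not an existing function, and the real constructor would presumably wrap it in an assertion rather than return a bool:

```cpp
// Hypothetical helper: returns true only if every byte is 7-bit ASCII,
// i.e. no value above 0x7f appears before the terminating null.
bool isPlainAscii (const char* t)
{
    for (; *t != 0; ++t)
        if (static_cast<unsigned char> (*t) > 0x7f)
            return false;

    return true;
}
```

With a check like this, "abc" passes, while a literal containing ÷ fails whichever way it was encoded, because both the single byte 0xF7 and the UTF-8 bytes 0xC3 0xB7 are above 0x7f.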


I tried changing the source file with the high-ascii constant to save as “Unicode (UTF-8 with signature) - Codepage 65001” and it didn’t help…

Not sure what to do here.


Be careful adding the BOM to source files, as apparently gcc doesn’t handle it correctly.

The only “correct” thing to do would be to escape it as utf-8, which will at least guarantee it’ll work everywhere.


This worked:

b->getFacade().setTextLabel (TRANS("\xC3\xB7" "2"));