New String problem with high-ascii values

TheVinn · January 30, 2011, 2:51am

This produces the wrong result:

String s("÷2");

I believe this is the problem:

juce_String.cpp:

String::String (const char* const t)
    : text (StringHolder::createFromCharPointer (CharPointer_UTF8 (t)))
{
}

The const char* const t argument is not really a UTF-8 encoded string; it's an ASCII string.

I think the solution is simply to introduce a CharPointer_ASCII class with the obvious implementation (each character maps 1:1 to the UTF-32 code point of the same value), and to change the String (const char* const t) constructor to use it.

I’m not 100% sure about all of this though, so don’t hesitate to correct me.

jules · January 30, 2011, 12:39pm

It’s only in ASCII because your source-code editor saved the file as ASCII. If it had saved it as UTF-8, it’d work just fine.

Sadly, there’s only one portable way to encode strings with characters above 0x7F, and that’s to use escaped UTF-8 character codes inside char* literals. There’s no other way to reliably get your string from the editor into the compiler and then into the code without risking the encoding being lost somewhere along the way. That’s why I made the String class assume it’s getting UTF-8: the alternative would be to assume that it’s ASCII or a local encoding, and those are worse options.

I’ve already been thinking about adding a CharPointer_ASCII class, and my cunning plan is to make the String (const char*) constructor slightly special: it’d assume the string it’s getting is unambiguously ASCII, so if you tried to feed it a value above 0x7F, it’d throw an assertion. That would force you to use a different constructor for extended strings, so in your case you’d have to write String (CharPointer_ASCII ("÷2")). It would then be the coder’s responsibility to explicitly wrap these strings in an encoding that matches their source-file format.

TheVinn · January 30, 2011, 12:46pm

I tried re-saving the source file containing the high-ASCII constant as “Unicode (UTF-8 with signature) - Codepage 65001”, and it didn’t help…

Not sure what to do here.

jules · January 30, 2011, 12:54pm

Be careful adding the BOM to source files, as apparently gcc doesn’t handle it correctly.

The only “correct” thing to do would be to escape it as UTF-8, which at least guarantees it’ll work everywhere.

TheVinn · January 30, 2011, 1:04pm

This worked:

b->getFacade().setTextLabel (TRANS("\xC3\xB7" "2"));

