Embedding unicode string literals in your cpp files

jules · March 26, 2014, 1:16pm

This topic seems to come up again and again, so I thought I'd make a sticky post here to avoid having to repeatedly explain it...

We regularly get people saying "The fonts are broken, my Chinese/Japanese/etc text won't display correctly!"

And the most common reason for this has nothing to do with fonts or graphics, it's because people have written code like this:

String textToDisplay = "一些文字";

The code above is going to screw up the encoding in at least some situations, depending on the compiler and editor that are involved. And no, it can't be magically fixed by sticking an L in front of the literal.

The String class is expecting UTF-8 characters, but compilers have no idea what encoding your text editor used when it saved the source file, and they'll make an assumption which is generally going to be wrong. So most likely, the encoding is going to get garbled somewhere between your editor, the compiler, and the library classes.

The ONLY cross-platform way to embed a unicode string into C++ source code is by dumbing it down to ASCII + escape sequences. That's a pain to write by hand, but luckily if you fire up the Introjucer and use its "UTF-8 String Literal Helper" tool, it'll do all the messy stuff for you, and convert any unicode string into a safe C++ expression that you can paste into your code, e.g.

String textToDisplay = CharPointer_UTF8 ("\xe4\xb8\x80\xe4\xba\x9b\xe6\x96\x87\xe5\xad\x97");
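
To make the escaping concrete, here is a minimal sketch of the kind of conversion such a helper performs. It is not the Introjucer's actual code: the function name toEscapedLiteral is made up for illustration, it uses only the standard library, and it assumes the input already holds UTF-8 bytes.

```cpp
#include <cassert>   // only needed for the example check in the note below
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of JUCE): turns raw UTF-8 bytes into a
// C++ string literal that contains nothing but plain ASCII.
std::string toEscapedLiteral (const std::string& utf8Bytes)
{
    std::string out = "\"";
    bool lastWasHexEscape = false;

    for (unsigned char c : utf8Bytes)
    {
        const bool isPlainAscii = c >= 0x20 && c < 0x7f && c != '"' && c != '\\';

        if (! isPlainAscii)
        {
            // Anything outside printable ASCII becomes a \xNN escape.
            char buffer[8];
            std::snprintf (buffer, sizeof (buffer), "\\x%02x", static_cast<unsigned int> (c));
            out += buffer;
            lastWasHexEscape = true;
            continue;
        }

        // A hex escape swallows any hex digits that follow it, so close and
        // reopen the literal before emitting one.
        if (lastWasHexEscape && std::isxdigit (c))
            out += "\" \"";

        out += static_cast<char> (c);
        lastWasHexEscape = false;
    }

    return out + "\"";
}
```

Feeding it the UTF-8 bytes of the Chinese text above should give back the same escaped literal that is passed to CharPointer_UTF8, which can be checked with an assert on the returned string.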

Alan · March 17, 2015, 4:06pm

Chinese users had problems displaying their devices in an AudioDeviceSelectorComponent object (squares instead of the actual characters).

I fixed the problem by adding the following lines in the MainContentComponent constructor:

#if JUCE_MAC || JUCE_WINDOWS

    getLookAndFeel().setDefaultSansSerifTypefaceName("Arial Unicode MS");

#endif

Although this is a specific implementation and won't be sufficient in all cases (Jules, you might want to fix this in the modules regarding the supplied AudioDeviceSelectorComponent class), the interesting point is that the "Arial Unicode MS" font seems to be compatible with both Latin and Chinese characters at once, on both Windows and Mac. I thought I would share this and hope it can help some of you.

otristan · March 17, 2015, 4:27pm

A better fix is to use TextLayout in the LnF (LookAndFeel) of ComboBox and PopupMenu, which will find fallback fonts when glyphs are not available in the current one.

Alan · March 17, 2015, 4:56pm

Thank you for this. This might be the way to go for the AudioDeviceSelectorComponent problem (Jules to decide).

As I said, my main point was that Arial Unicode MS seems to work, and is a simple option in cases where there is no code to find fallback fonts.

jimc · April 3, 2015, 10:25am

Shoot me down if I'm wrong - but I believe this does the right thing in C++11 and is a little less awkward:

const String fontAwesomeFolder = String::fromUTF8(u8"\uf114");

The bible says: "A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8."

I think this means that the source will (obviously) be in the basic character set, and the string will only ever be a valid UTF-8 encoding, which I think makes it the same as the example using CharPointer_UTF8. Presumably the alternative below does the same thing but I've not tested it:

const String fontAwesomeFolder = CharPointer_UTF8(u8"\uf114");

(Update: not supported in Windows with CTP Nov 2013 compiler - looks like it's in VS 2015 preview though).
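
One way to check that claim without JUCE is to look at the bytes the compiler actually stores for the literal. The sketch below uses only the standard library; literalBytes and folderIconBytes are names made up for this example. It is written as a template because in C++20 the element type of a u8 literal becomes char8_t rather than char, so there the literal would presumably need a cast before it could be passed to a function taking const char*.

```cpp
#include <cassert>   // only needed for the example check in the note below
#include <cstddef>
#include <vector>

// Copies the bytes of a string literal (minus the trailing null) into a vector.
// Templated on the character type so it accepts both char literals
// (C++11 to C++17) and char8_t literals (C++20).
template <typename CharType, std::size_t N>
std::vector<unsigned char> literalBytes (const CharType (&literal)[N])
{
    std::vector<unsigned char> bytes;

    for (std::size_t i = 0; i + 1 < N; ++i)
        bytes.push_back (static_cast<unsigned char> (literal[i]));

    return bytes;
}

// U+F114 written as a universal-character-name inside a u8 literal.
std::vector<unsigned char> folderIconBytes()
{
    return literalBytes (u8"\uf114");
}
```

On a conforming compiler the result should be the three bytes EF 84 94, which is the UTF-8 encoding of U+F114; an assert comparing folderIconBytes() against those values is a quick way to confirm it on a given toolchain.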

Gerald · April 3, 2015, 10:14pm

Hi,

Does this work for unicode geometric shapes too? I tried, and the result was only a square and not the geometric shape I wanted...

jimc · April 5, 2015, 5:07pm

Does your font definitely have the geometric glyph you want? I found on Windows that the system font was missing a lot of what I thought was obvious stuff...

Gerald · April 5, 2015, 9:10pm

Actually I'm adding these lines


#if JUCE_MAC || JUCE_WINDOWS
    getLookAndFeel().setDefaultSansSerifTypefaceName("Arial Unicode MS");
#endif

and the geometric shape appears... but I don't know how to change the font size of a TextButton's text. I am using this geometric shape in a button and don't know how to make it look bigger. Maybe it can be done using LookAndFeel, but I am not very familiar with it as I am new to JUCE.

timur · May 7, 2015, 5:25pm

Hi there.

Try the following.

Create a custom LookAndFeel class for your button. You can do this by deriving from one of the default LookAndFeels and overriding the function that defines the font for the button text:

class MyLookAndFeel : public LookAndFeel_V3
{
public:
    // Returns the font used for the text of every TextButton using this LookAndFeel.
    Font getTextButtonFont (TextButton&, int /*buttonHeight*/) override
    {
        return Font ("Arial Unicode MS", 20.0f, Font::plain);
    }
};

Then, create an instance of this class and pass it to the button by calling the button's setLookAndFeel method. That should do the trick.

Gerald · May 8, 2015, 12:46pm

Thanks…it does the trick.

roeland-2 · July 6, 2015, 6:14am

Visual Studio 2015 RC has support for u8"blabla" literals, see https://msdn.microsoft.com/en-us/library/69ze775t%28v=vs.140%29.aspx , but that solves only half the problem.

The compiler also has to know the encoding of the source file. As far as I know there's no compiler option to specify the encoding of the source file. MSVC will assume it is the current code page of the system, based on your system-wide language settings. In other words, it depends on what computer you compile the source file on.

Unless you save the source files as either UTF-16, or as “UTF-8 with byte order mark^(†)”. In those 2 cases the encoding is detected correctly.

So if you're able to consistently make sure files are saved in that encoding, and all the other compilers you use support that, then maybe you can write u8"☺☺☺". Otherwise it's still at least u8"\u263a\u263a\u263a"

^(†) The bytes [ef bb bf], the UTF-8 encoding of U+FEFF, are often prepended to a UTF-8 text file as a magic number to tell applications the file is encoded in UTF-8 and not in whatever code page your system is using. However some programs (e.g. PHP) will misinterpret that as the file starting with U+FEFF or ï»¿ or whatever.
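
As a rough illustration of that magic number, the sketch below checks whether a buffer of file contents starts with those three bytes and strips them if so. It uses only the standard library; the function names are made up for this example and nothing here reflects how any particular compiler detects the encoding.

```cpp
#include <cassert>   // only needed for the example check in the note below
#include <string>

// Returns true if the buffer begins with EF BB BF, the UTF-8 encoding of U+FEFF.
bool startsWithUtf8Bom (const std::string& fileContents)
{
    return fileContents.size() >= 3
        && static_cast<unsigned char> (fileContents[0]) == 0xef
        && static_cast<unsigned char> (fileContents[1]) == 0xbb
        && static_cast<unsigned char> (fileContents[2]) == 0xbf;
}

// Returns the contents with a leading byte order mark removed, if there was one.
std::string withoutUtf8Bom (const std::string& fileContents)
{
    return startsWithUtf8Bom (fileContents) ? fileContents.substr (3)
                                            : fileContents;
}
```

A tool that does not expect the mark could call withoutUtf8Bom before parsing; for instance, assert (withoutUtf8Bom ("\xef\xbb\xbf" "abc") == "abc") should hold.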

ivan · August 22, 2015, 12:06pm

Hello,

On Windows 7 the code:

String degreesymbol = String::fromUTF8("\u00B0");

Font font("Arial Unicode MS", 24, Font::bold);

g.drawText (degreesymbol, 4, 0, width - 4, height, Justification::centredLeft, true);

Does not show the degree symbol, only '0' - this is similar for other characters. Arial Unicode MS is definitely installed and has the characters in it.

In the Juce demo you can paste the degree symbol into the Font demo and it displays correctly under this font. Any ideas why this isn't working?

Thanks, Ivan

jules · August 22, 2015, 12:45pm

You're asking it to parse some UTF8, and then giving it a literal that isn't UTF8!

If you read the original post in this thread, that's exactly the mistake I was talking about!

ivan · August 22, 2015, 1:11pm

I don't understand, I'm sorry. I thought that by writing

String degreesymbol = String::fromUTF8("\u00B0");

I was creating a String that could then be drawn using drawText to show the appropriate character.

Is that not the case?

If this is not correct, can you tell me what I can do to achieve this?

Thanks for your assistance.

Added a few minutes later...

Actually I now can do it using

String degreeSymbol = CharPointer_UTF8("\xc2\xb0");

So I wasn't using the correct utf8 code? Apologies I am learning this from zero knowledge - I will research some more and try to get a better understanding.

It works anyway which is the important thing for the moment! :-)

Thank you,

Ivan
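
For anyone wondering where "\xc2\xb0" comes from: U+00B0 is the code point, and C2 B0 is that code point encoded as UTF-8. A plain "\u00B0" literal is stored in whatever narrow encoding the compiler uses, which on a Western Windows code page is probably the single byte B0, and that is not valid UTF-8. The sketch below shows the standard UTF-8 bit layout using only the standard library; encodeUtf8 is a name made up for this example, and returning an empty string for invalid input is just a choice made here.

```cpp
#include <cassert>   // only needed for the example check in the note below
#include <string>

// Encodes one unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encodeUtf8 (char32_t codePoint)
{
    std::string out;

    if (codePoint > 0x10ffff || (codePoint >= 0xd800 && codePoint <= 0xdfff))
        return out;

    if (codePoint < 0x80)
    {
        out += static_cast<char> (codePoint);                        // 0xxxxxxx
    }
    else if (codePoint < 0x800)
    {
        out += static_cast<char> (0xc0 | (codePoint >> 6));          // 110xxxxx
        out += static_cast<char> (0x80 | (codePoint & 0x3f));        // 10xxxxxx
    }
    else if (codePoint < 0x10000)
    {
        out += static_cast<char> (0xe0 | (codePoint >> 12));         // 1110xxxx
        out += static_cast<char> (0x80 | ((codePoint >> 6) & 0x3f));
        out += static_cast<char> (0x80 | (codePoint & 0x3f));
    }
    else
    {
        out += static_cast<char> (0xf0 | (codePoint >> 18));         // 11110xxx
        out += static_cast<char> (0x80 | ((codePoint >> 12) & 0x3f));
        out += static_cast<char> (0x80 | ((codePoint >> 6) & 0x3f));
        out += static_cast<char> (0x80 | (codePoint & 0x3f));
    }

    return out;
}
```

With this, assert (encodeUtf8 (0x00b0) == "\xc2\xb0") should hold, matching the literal that worked above.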

lalala · August 22, 2015, 1:21pm

You want to have a look at the Introjucer -> menu 'Tools' -> 'UTF-8 String Literal Helper'

ivan · August 22, 2015, 1:26pm

Thanks, you can see from my edited reply that's what I just did. I think I just about understand now, but I will read some more!

daniel · September 16, 2015, 8:54pm

I'm trying to display musical symbols using Graphics::drawText(). I have the chart in http://www.unicode.org/charts/PDF/U1D100.pdf

The symbols start at 1D100, so obviously not in the 16 bit range. How can I specify these in the code? The tool is unfortunately no help, because it takes the actual symbol but not a hex code. Also the tool creates 16 bit codes...

I'm lost...

jules · September 17, 2015, 9:04am

const juce_wchar myUnicodeString[] = { 0x123456, 0x345678, 0 };

String s (myUnicodeString);

daniel · September 17, 2015, 9:55am

Thanks for the syntax. But I fail with the semantics. Can you please give me one example for 0x123456 and 0x345678?

e.g. a soprano clef: 1D11E ?

I tried various combinations and converting a 4-byte word into two 3-byte words and a 0? The search engines are spammed with misunderstandings of types and unicodes, so I had no luck there...

jules · September 17, 2015, 10:23am

Erm.. 1D11E would be 0x1D11E (...?)
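
In other words, because juce_wchar is a 32-bit type, the code point goes into the array as a single value and no splitting is needed. The sketch below is a standard-C++ stand-in, with char32_t playing the role of juce_wchar (an assumption made so the example compiles without JUCE), and a made-up toUtf16 function that shows the two-unit surrogate pair a 16-bit encoding would need for the same symbol. It assumes the input is a valid code point.

```cpp
#include <cassert>   // only needed for the example check in the note below
#include <vector>

// Stand-in for a null-terminated juce_wchar array: one element per code point.
const char32_t gClef[] = { 0x1D11E, 0 };

// Hypothetical helper: the UTF-16 code units for one valid code point.
// Values below U+10000 fit in one unit, anything above needs a surrogate pair.
std::vector<char16_t> toUtf16 (char32_t codePoint)
{
    if (codePoint < 0x10000)
        return { static_cast<char16_t> (codePoint) };

    const char32_t offset = codePoint - 0x10000;

    return { static_cast<char16_t> (0xd800 + (offset >> 10)),     // high surrogate
             static_cast<char16_t> (0xdc00 + (offset & 0x3ff)) }; // low surrogate
}
```

For U+1D11E the pair should come out as D834 DD1E, whereas the 32-bit array simply holds 0x1D11E followed by the terminating zero.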
