This topic seems to come up again and again, so I thought I'd make a sticky post here to avoid having to repeatedly explain it...
We regularly get people saying "The fonts are broken, my Chinese/Japanese/etc text won't display correctly!"
And the most common reason for this has nothing to do with fonts or graphics, it's because people have written code like this:
String textToDisplay = "一些文字";
The code above is going to screw up the encoding in at least some situations, depending on the compiler and editor that are involved. And no, it can't be magically fixed by sticking an L in front of the literal.
The String class is expecting UTF-8 characters, but compilers have no idea what type of encoding your text editor was using when you saved the source-file, and they'll make an assumption which is generally going to be wrong. So most likely, the encoding is going to get garbled somewhere between your editor, the compiler, and the library classes. The ONLY cross-platform way to embed a unicode string into C++ source code is by dumbing it down to ASCII + escape characters. That's a pain to write by hand, but luckily if you fire up the Introjucer and use its "UTF-8 String Literal Helper" tool, it'll do all the messy stuff for you, and convert any unicode string into a safe C++ expression that you can paste into your code, e.g.
Although this is a specific implementation and won't be sufficient in all cases (Jules, you might want to fix this in the modules regarding the supplied AudioDeviceSelectorComponent class), the interesting point is that the "Arial Unicode MS" font seems to be compatible with both Latin and Chinese characters at once, on both Windows and Mac. I thought I'd share this and hope it can help some of you.
Thank you for this. This might be the way to go for the AudioDeviceSelectorComponent problem (Jules to decide).
As I said, my main point was that Arial Unicode MS seems to work, and it's a simple option for cases where there's no code to find a default font that would work.
The bible says: "A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8."
I think this means that the source will (obviously) be in the basic character set, and the string will only ever be a valid UTF-8 encoding, which I think makes it the same as the example using CharPointer_UTF8. Presumably the alternative below does the same thing, but I've not tested it:
Does your font definitely have the geometric glyph you want? I found on Windows that the system font was missing a lot of what I thought was obvious stuff...
and the geometric shape appears... but I don't know how to change the font size of a TextButton's text, because I'm using this geometric shape in a button and don't know how to make it look bigger. Maybe it can be done using a LookAndFeel, but I'm not very familiar with it as I'm new to JUCE.
Create a custom LookAndFeel class for your button. You can do this by deriving from one of the default LookAndFeels and overriding the function that defines the font for the button text:
class MyLookAndFeel : public LookAndFeel_V3
{
    Font getTextButtonFont (TextButton&, int buttonHeight) override
    {
        return Font ("Arial Unicode MS", 20.0f, Font::plain);
    }
};
Then, create an instance of this class and pass it to the button by calling the button's setLookAndFeel method. That should do the trick.
Visual Studio 2015 RC has support for u8"blabla" literals, see https://msdn.microsoft.com/en-us/library/69ze775t%28v=vs.140%29.aspx , but that solves only half the problem.
The compiler also has to know the encoding of the source file. As far as I know, there's no compiler option to specify the encoding of the source file. MSVC will assume it's the current code page of the system, based on your system-wide language settings. In other words, it depends on which computer you compile the source file on.
Unless you save the source files as either UTF-16, or as “UTF-8 with byte order mark (†)”. In those two cases the encoding is detected correctly.
So if you're able to consistently make sure files are saved in that encoding and all the other compilers you use support that, then maybe you can write u8"☺☺☺". Otherwise it's still at least u8"\u2639\u2639\u2639"
(†) The bytes [ef bb bf], the UTF-8 encoding of U+FEFF, are often prepended to a UTF-8 text file as a magic number to tell applications the file is encoded in UTF-8 and not in whatever code page your system is using. However, some programs (e.g. PHP) will misinterpret that as the file starting with U+FEFF, or with mojibake like "ï»¿", or whatever.
Does not show the degree symbol, only '0' - this is similar for other characters. Arial Unicode MS is definitely installed and has the characters in it.
In the Juce demo you can paste the degree symbol into the Font demo and it displays correctly under this font. Any ideas why this isn't working?
So I wasn't using the correct utf8 code? Apologies I am learning this from zero knowledge - I will research some more and try to get a better understanding.
It works anyway which is the important thing for the moment! :-)
I'm trying to display musical symbols using Graphics::drawText(). I have the chart in http://www.unicode.org/charts/PDF/U1D100.pdf
The symbols start at U+1D100, so they're obviously not in the 16-bit range. How can I specify these in the code? The tool is unfortunately no help, because it takes the actual symbol but not a hex code. Also, the tool creates 16-bit codes...
Thanks for the syntax. But I fail with the semantics. Can you please give me one example for 0x123456 and 0x345678?
e.g. a soprano clef: U+1D11E?
I tried various combinations, and converting a 4-byte word into two 3-byte words and a 0. The search engines are spammed with misunderstandings of types and Unicode, so I had no luck there...