Unicode and font rendering in JUCE


#1

So, I went to the Lorem Ipsum generator and got a bit of text in a few different scripts. I also added a line of emoji (those are not images, but characters from the 'Emoticons' Unicode block).

Lorem ipsum dolor sit amet
את מדויקים מיוחדים אקטואליה לוח
أي غير موالية بتطويق.
謺貙蹖 郺鋋錋 蒠蓔蜳 餤駰 銈,
<emoji go here, but the forum breaks if I try to include them>

Depending on the font, a Label will render most rows with just boxes, and it will also render the Hebrew and Arabic left-to-right.

(image appears to be gone)

An AlertWindow fares better:

Not sure how it works exactly in JUCE, but it appears to make a few assumptions:

(1) You can render a block of text with just one font

This is the most crippling one on the list. On Windows, JUCE by default uses Verdana and Tahoma, which have very limited Unicode coverage, so you'll often see boxes instead of letters whenever you encounter anything other than Latin text.

When a browser displays the text in the quote above, it uses glyphs from a number of different fonts. E.g. on Windows 7 the first line may come from "Arial", while the last line (the emoji) probably comes from "Segoe UI Symbol". You don't have to change any font settings for this; the browser falls back between fonts automatically, per character.
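Per-character font fallback of this kind can be sketched in a few lines. The font names and coverage ranges below are made-up stand-ins for illustration, not real font metrics:

```cpp
#include <string>
#include <utility>
#include <vector>

// A toy model of per-character font fallback: each "font" declares which
// code-point ranges it covers, and the renderer picks, for each character,
// the first font in the list that covers it. Names and ranges are invented.
struct FakeFont
{
    std::string name;
    std::vector<std::pair<char32_t, char32_t>> coverage; // inclusive ranges

    bool covers (char32_t cp) const
    {
        for (const auto& r : coverage)
            if (cp >= r.first && cp <= r.second)
                return true;
        return false;
    }
};

std::string pickFont (const std::vector<FakeFont>& fonts, char32_t cp)
{
    for (const auto& f : fonts)
        if (f.covers (cp))
            return f.name;
    return "(no font; renders as a box)"; // what JUCE's single-font path shows
}
```

A renderer that locks the whole block of text to a single font is the degenerate case of this: every code point outside that one font's coverage falls through to the box.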

(Possible bug: in the AlertWindow the line of emoji is truncated. There could be a mix-up between the number of Unicode code points and the number of 16-bit code units used to encode them as UTF-16.)
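The distinction is easy to demonstrate in plain C++ (no JUCE needed): an emoji outside the Basic Multilingual Plane occupies one code point but two UTF-16 code units, so any code that counts "characters" by counting 16-bit elements will over-count.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-16 string by treating a high
// surrogate followed by a low surrogate as one code point.
std::size_t countCodePoints (const std::u16string& s)
{
    std::size_t count = 0;

    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const char16_t c = s[i];

        // Skip the low surrogate of a valid surrogate pair.
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size()
              && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            ++i;

        ++count;
    }

    return count;
}
```

For example, U+1F600 is stored as the surrogate pair 0xD83D 0xDE00: `std::u16string (u"\U0001F600").size()` is 2, while `countCodePoints` returns 1. Mixing up the two counts leads to exactly the kind of truncation described above.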

(2) there is a 1 to 1 mapping between code points and glyphs

This one breaks for a few scripts. If you select a font with Arabic glyphs in the demo, you'll see it still looks wrong. An Arabic letter can take several different forms, which are selected depending on the surrounding letters.
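You can see the one-to-many mapping in Unicode itself: the legacy "Arabic Presentation Forms-B" block assigns separate compatibility code points to each contextual form. Taking the letter BEH (U+0628) as an example:

```cpp
// The Arabic letter BEH is a single code point, U+0628, but a font needs
// at least four glyphs for it, chosen from the letter's position in the
// word. The compatibility code points below, from the Arabic Presentation
// Forms-B block, correspond to those four forms.
struct ContextualForm
{
    const char* position;
    char32_t presentationForm;
};

constexpr ContextualForm behForms[] = {
    { "isolated", U'\uFE8F' },
    { "final",    U'\uFE90' },
    { "initial",  U'\uFE91' },
    { "medial",   U'\uFE92' },
};
```

Modern renderers don't use these presentation-form code points directly; a shaping engine (such as the one inside DirectWrite) picks glyphs from the font's substitution tables. But the block illustrates why a 1-to-1 code-point-to-glyph mapping cannot work for Arabic.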

(3) text goes left-to-right

Hebrew and Arabic should be written right to left.
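Full bidirectional layout is defined by the Unicode Bidirectional Algorithm (UAX #9), but for the simplest case, a line consisting entirely of one right-to-left script, the effect is just that display order is the reverse of logical (storage) order. A toy sketch of that simplest case only:

```cpp
#include <algorithm>
#include <string>

// Toy illustration only: for a line made up entirely of right-to-left
// characters, the visual order is the reverse of the logical order in
// which the text is stored. Real layout must run the full Unicode
// Bidirectional Algorithm (UAX #9), which also handles mixed LTR/RTL
// runs, digits and paired punctuation.
std::u32string visualOrderForPureRtlLine (std::u32string logical)
{
    std::reverse (logical.begin(), logical.end());
    return logical;
}
```

A renderer that assumes left-to-right simply paints the logical order as-is, which is why the Hebrew and Arabic lines above come out backwards.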

 

What's happening behind the scenes?

From what I understand, JUCE uses two methods to render text:

  • For a lot of widgets (including the common ones like buttons and labels) it goes via GlyphArrangement, which goes via WindowsDirectWriteTypeface::getGlyphPositions. The code in that function starts from the above assumptions and as a consequence can only handle a very limited subset of Unicode.
     
  • For dialog windows, it goes via DirectWriteTypeLayout::createLayout, which uses DirectWrite directly to generate the entire layout. And the above text will be rendered properly.
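Assuming that description is accurate, an application can opt into the second, more capable path itself by laying text out with AttributedString and TextLayout rather than drawing through the glyph-arrangement-based helpers. A hedged sketch of that route (standard JUCE API calls, not verified against any particular JUCE version):

```cpp
// Sketch only: drawing a complex-script string via juce::TextLayout,
// which on Windows delegates layout to DirectWrite and so handles font
// fallback, Arabic shaping and right-to-left text far better than the
// GlyphArrangement path used by e.g. Graphics::drawText.
void paintComplexText (juce::Graphics& g,
                       const juce::String& text,
                       juce::Rectangle<float> area)
{
    juce::AttributedString s;
    s.append (text, juce::Font (16.0f));

    juce::TextLayout layout;
    layout.createLayout (s, area.getWidth());
    layout.draw (g, area);
}
```

The trade-off is the one discussed below: this path hands every string to the platform text engine, which is noticeably slower than the glyph-arrangement fast path.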
     

So what?

Usually (at least for us Westerners) we can get away with this, but sometimes you can't. E.g. you may encounter files with names in other languages, or text containing these emoticons, or someone may try to translate their application into another language.

Are there plans to improve on this?

--
Roeland


#2

Thanks - yes, we know.

The main problem is performance: we could change things so that DirectWrite/CoreText is used for all text everywhere, but those APIs are slow, and doing that would make some apps cripplingly slow. Ideally we'd find a good cross-platform way of doing it, but the open-source libraries that handle full text shaping are huge and have horrible dependencies, so they'd be a nightmare for people to build with.


#3

I think I may have made a partial duplicate of this thread here: Current state of Fonts/Emoji/Unicode symbols

Are there any improvements on the horizon?


#4

Haven’t got any imminent plans, I’m afraid