Complex Text Support


#1

At the moment, JUCE only supports Simple Text:

  • Left to Right Text Directionality
  • One Unicode code point per glyph

This is sufficient to support most languages on the planet, but to support the scripts of a few popular ones (Hebrew, Arabic, Hindi), JUCE needs Complex Text support.

Such support is a massive undertaking and will likely require the gutting of the current GlyphArrangement system.
Add to that the fact that there aren’t very many people who need such support, and I’m guessing that you probably won’t ever get around to it.

I was wondering: if someone else were to implement such support for JUCE, do you think it could ever make its way into the JUCE tip? Or is this too big of a change?


#2

It’d certainly be a serious job, that’s why I’ve not tackled it yet!

In something like this, which would involve a lot of changes across many parts of the code-base, and some major re-designing of classes, even if someone else attempted it, I’m sure it’d require a huge amount of my time to tidy-up and refactor their work into a state that could be added to the library… So I’m not keen to commit to doing that, but would be open to discussing it!


#3

I don’t know what the big deal about right to left text is, I mean, Juce already supports it out of the box, just do

AffineTransform t = AffineTransform::scale (-1, 1);
myComponent->setTransform (t);

You just need the proper glyphs.

Next?


#4

I like your thinking there!


#5

Before I post my initial design idea, I just wanted to jot down my understanding of the JUCE text system.
There may be mistakes in my understanding so if you feel that any part of the text below is wrong, please let me know.

Components

The following is a list of built-in components in which an end user can edit text with a keyboard and/or mouse:

  • TextEditor
  • CodeEditorComponent
  • Label
  • Slider
  • Combo Box

The last three components in that list (Label/Slider/Combo Box) don’t actually have editing functionality themselves; they simply show and use a TextEditor component to do the actual editing.
Thus there are effectively only two components that can edit text (TextEditor/CodeEditor). These widgets handle all editing related functionality like text selection and caret location.

Text can be drawn using either the Graphics class or the GlyphArrangement class. The Graphics class acts as a higher level way of drawing text and calls the lower level GlyphArrangement draw function to do the actual text drawing.

The only components that use GlyphArrangement to draw text directly are:

  • TextEditor
  • FileChooserDialogBox
  • PluginListComponent

All other components use the Graphics class to draw text.

Displaying Characters

After a GlyphArrangement object is created, you need to provide the object with the font you want to use and the text you want to display.
When doing this, the object examines your text character by character using the font you specified to determine what single glyph is associated with that character as well as the horizontal spacing information associated with that character.
If the character has any kerning information, this is taken into consideration in the horizontal spacing information. This information is determined either using Platform Specific Text APIs or the internal API in the CustomTypeFace class.
The single glyph and horizontal spacing information for a glyph is stored in a PositionedGlyph object and is added to an array that stores the PositionedGlyphs for all the characters in your text.

When you want to display your text (after you have added it) you call the draw member function on your GlyphArrangement object. The text is drawn Glyph by Glyph by calling the draw member function of each PositionedGlyph object from the PositionedGlyph Array of your text.
The draw member function of PositionedGlyph renders a single Glyph using a platform specific Text API or the internal text renderer.
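
To make the character-by-character process above concrete, here is a minimal sketch of how I understand it. It is not JUCE’s actual code: SketchTypeface, SketchPositionedGlyph and layoutSimpleText are names I made up, the default advance of 10 is arbitrary, and I’m assuming kerning is simply folded into the width of the left-hand character.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a typeface: one glyph per character,
// an advance width per character, and optional pair kerning.
struct SketchTypeface
{
    std::map<char32_t, float> advances;
    std::map<std::pair<char32_t, char32_t>, float> kerning;

    float advanceFor (char32_t c) const
    {
        auto it = advances.find (c);
        return it != advances.end() ? it->second : 10.0f;  // arbitrary default width
    }

    float kerningFor (char32_t left, char32_t right) const
    {
        auto it = kerning.find (std::make_pair (left, right));
        return it != kerning.end() ? it->second : 0.0f;
    }
};

// Hypothetical stand-in for PositionedGlyph: which character, where, how wide.
struct SketchPositionedGlyph
{
    char32_t character;
    float x;
    float width;
};

// Walks the text one character at a time and produces exactly one
// positioned glyph per character (the "Simple Text" assumption).
std::vector<SketchPositionedGlyph> layoutSimpleText (const SketchTypeface& face,
                                                     const std::u32string& text)
{
    std::vector<SketchPositionedGlyph> glyphs;
    float x = 0.0f;

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        float width = face.advanceFor (text[i]);

        // Kerning against the following character adjusts this glyph's width.
        if (i + 1 < text.size())
            width += face.kerningFor (text[i], text[i + 1]);

        glyphs.push_back ({ text[i], x, width });
        x += width;
    }

    return glyphs;
}
```

Note that nothing in this loop can reorder glyphs or merge several characters into one glyph, which is exactly what complex scripts need.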

Text Rendering

JUCE can render text on all platforms using its internal text renderer. It can render text through a platform-specific API only on the following platforms:

  • Mac OS X (via CoreGraphics)
  • iOS (via CoreGraphics)
  • Android (via Android Canvas)

The fact that Windows is not in the above list is not a mistake. JUCE does not use GDI to render text on Windows; it uses its own internal text renderer. This is why text in JUCE does not look exactly the same as text in native Windows apps, and it is also why JUCE has no aliased or ClearType support.

When using the internal text renderer, each glyph is rendered by JUCE from a Path. This path is generated in one of two ways:

  • Font Outlines (from truetype/opentype font file)
  • EdgeTables (from an image rasterization)

Due to platform restrictions, JUCE can only generate Font Outlines on Windows, Mac OS X and Linux. JUCE can generate EdgeTables on all platforms. Font Outlines are preferred over EdgeTables.
Glyphs generated by the internal text renderer are stored in a Glyph Cache to increase application speed.
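
As a rough illustration of the caching idea (again only a sketch, not the real JUCE cache), a glyph can be rendered once per (typeface, glyph number) pair and reused afterwards. SketchGlyphCache and SketchRenderedGlyph are hypothetical names, the "outline" is just a string placeholder, and I’ve left out any eviction policy.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Placeholder for whatever the renderer produces (a path, an edge table...).
struct SketchRenderedGlyph
{
    std::string outline;
};

class SketchGlyphCache
{
public:
    using Renderer = std::function<SketchRenderedGlyph (const std::string&, int)>;

    explicit SketchGlyphCache (Renderer r) : render (std::move (r)) {}

    // Returns the cached glyph, rendering it first if it has not been seen yet.
    const SketchRenderedGlyph& get (const std::string& typefaceName, int glyphNumber)
    {
        const auto key = std::make_pair (typefaceName, glyphNumber);
        auto it = cache.find (key);

        if (it == cache.end())
        {
            ++misses;  // the expensive path: the glyph actually has to be rendered
            it = cache.emplace (key, render (typefaceName, glyphNumber)).first;
        }

        return it->second;
    }

    int missCount() const { return misses; }

private:
    Renderer render;
    std::map<std::pair<std::string, int>, SketchRenderedGlyph> cache;
    int misses = 0;
};
```

With complex text the key would presumably have to be the glyph index produced by shaping rather than the character, since one character can map to several different glyphs depending on context.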

#6

Hey, nice summary of the text handling! Don’t forget that you can get hinted output on Windows, as well as support for bitmap strikes (with a little juce hacking) from my FreeTypeFaces implementation:
http://rawmaterialsoftware.com/viewtopic.php?f=6&t=6393


#7

Yes, good summary!

I should note that the TextEditor’s internal structure is pretty creaky and has long been overdue for a good overhaul. That in itself is a pretty big job though - I sat down to look at it a while ago thinking it’d take me a couple of days to refactor, but quickly realised it was a bigger task, and had to leave it.


#8

When you start speaking about right to left, you must also speak about glyph ligature and substitution (think Arabic, Hindi, Thai, Hebrew), which is currently not supported in Juce.
There’s no point in trying to hack an RTL mode into the text editor, since the whole glyph substitution and positioning operation is specified (badly, I admit) in Unicode, and you’d be reinventing a “broken” wheel if you hacked your own.

Since it’s a very VERY HUGE task, the best solution, IMHO, is to use something standard for this job, like HarfBuzz-ng.
The idea would be to wrap/abstract the mapping from a Unicode string of M characters to N glyph indices and positions, at first using the current code, and then later using HarfBuzz for this task.
Also, you probably want to do the opposite too (mapping a position back to the Unicode character index in the initial string), for example when you are selecting text in the editor, positioning the caret, or editing the text, so it could be a good idea to include this in the interface.
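
Something along these lines is what I have in mind. This is only a sketch under my own assumptions: SketchTextShaper, SketchShapedGlyph and characterIndexAt are invented names, the advances are a fixed 10 units, the glyph indices are made up, and the toy ligature shaper merely stands in for what a real shaping library would do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SketchShapedGlyph
{
    unsigned glyphIndex;     // made-up index into the font's glyph table
    float x;                 // horizontal position of the glyph
    float advance;           // horizontal space the glyph occupies
    std::size_t firstChar;   // first source character covered by this glyph
    std::size_t numChars;    // number of source characters covered
};

// The abstraction: M characters in, N positioned glyphs out.
class SketchTextShaper
{
public:
    virtual ~SketchTextShaper() = default;
    virtual std::vector<SketchShapedGlyph> shape (const std::u32string& text) const = 0;
};

// Current behaviour: one glyph per character.
class SimpleShaper : public SketchTextShaper
{
public:
    std::vector<SketchShapedGlyph> shape (const std::u32string& text) const override
    {
        std::vector<SketchShapedGlyph> out;
        float x = 0.0f;

        for (std::size_t i = 0; i < text.size(); ++i)
        {
            out.push_back ({ static_cast<unsigned> (text[i]), x, 10.0f, i, 1 });
            x += 10.0f;
        }

        return out;
    }
};

// Toy example of M != N: "fi" becomes a single ligature glyph.
class ToyLigatureShaper : public SketchTextShaper
{
public:
    std::vector<SketchShapedGlyph> shape (const std::u32string& text) const override
    {
        std::vector<SketchShapedGlyph> out;
        float x = 0.0f;

        for (std::size_t i = 0; i < text.size();)
        {
            const bool isFi = text[i] == U'f' && i + 1 < text.size() && text[i + 1] == U'i';
            const std::size_t covered = isFi ? 2 : 1;
            const unsigned glyph = isFi ? 0xFB01u : static_cast<unsigned> (text[i]);

            out.push_back ({ glyph, x, 10.0f, i, covered });
            x += 10.0f;
            i += covered;
        }

        return out;
    }
};

// The reverse mapping: from an x position back to a character index.
// Assumes the glyphs are stored in left-to-right visual order.
std::size_t characterIndexAt (const std::vector<SketchShapedGlyph>& glyphs, float x)
{
    for (const auto& g : glyphs)
        if (x < g.x + g.advance)
            return g.firstChar;

    return glyphs.empty() ? 0 : glyphs.back().firstChar + glyphs.back().numChars;
}
```

The firstChar/numChars pair is what lets the editor map a click on the ligature back to a character range instead of assuming glyph N is character N.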


#9

I think the best approach is to break down the implementation into a series of design issues which everyone can comment on.
My hope is that Jules will give an indication of how he would like to see such issues resolved, to ensure that any resulting code would make it into the tip.
Once again, if you feel there are mistakes in my understanding of the facts, please point them out.

Design Issue #1: Should there be complex text support for JUCE formatted font files?

The Facts

JUCE can generate fonts in its own format using CustomTypeface::writeToStream.
JUCE formatted font files are very different from OpenType and TrueType font files.
JUCE only saves outline and kerning information for each glyph in JUCE formatted font files.

In order to support complex text:
A large number of members would need to be added to CustomTypeface to save the additional information (ligatures, etc) required for Complex Text support.
This additional information would need to be saved when writeToStream is used so it can be reloaded. This will increase the size of JUCE formatted font files.
Functions would need to be added to extract this additional information directly from OpenType and TrueType font files.
Classes would need to be written on a language by language basis to directionalize, reshape and reorder the glyphs using the additional information.

My Opinion

No.
I think it would take a team of programmers, linguists and a whole lot of time to add such support.
I feel the only way we will ever see Complex Text support in JUCE is by using existing APIs and only supporting OpenType and TrueType font files.


#10

I agree with you. CustomTypeface is only a hack to fit the “one binary fits them all” approach of embedding a font + images + resources in a single binary.
The other reason for its existence is that, with the Win32 primitives Juce is using, you can’t load a font without installing it on the system first.
Since there is no genuine OpenType/TrueType parser in Juce (well, sort of, since it depends on FreeType under Linux), the first step would be to make this part cross-platform.


#11

I think that FreeType should be a requirement of any type layout features that go beyond what is currently in Juce, for all platforms. It’s a reasonable requirement.


#12

Design Issue #2: What APIs should be used to support complex text?

The Facts

Each operating system has one or more APIs to assist us in supporting complex text.

Windows

  • Uniscribe
  • DirectWrite

Mac OS X

  • CoreText
  • ATSUI

iPhone

  • CoreText

Cross Platform

  • Pango
  • Harfbuzz

There are some important details one should know about the above APIs which I have outlined below.

Windows

  • Uniscribe
    Supported by Windows XP and later

  • DirectWrite
    Supported by Windows Vista and later

Mac OS X

  • CoreText
    Supported by Mac OS X 10.5 and later
    Supported on 32-bit and 64-bit machines
    Supports TrueType fonts with AAT tables

  • ATSUI
    Supported by Mac OS 8.5 and later
    Supported on 32-bit machines only
    Supports TrueType fonts with AAT tables
    Officially a legacy technology and not recommended for use on Mac OS X 10.6 and later.

iPhone

  • CoreText
    Supported by iOS 3.2 and later

Cross Platform

  • Pango
    LGPL Licensed
    Requires glib.
    Complex Text can be handled by using an internal backend or by using a platform specific backend.
    Requires freetype and fontconfig to use the internal backend.
    Uses Uniscribe in the Windows platform specific backend.
    Uses ATSUI in the Mac OS platform specific backend.

  • Harfbuzz
    BSD Licensed
    There are two versions of harfbuzz: harfbuzz-old and harfbuzz-ng
    harfbuzz-old contains complex text support for a number of languages but is no longer being developed.
    harfbuzz-ng contains complex text support for a limited number of languages and is being actively developed.
    harfbuzz-ng has Arabic support but does not have Indic support (Hindi, Thai, etc) as of March 2010.
    harfbuzz-ng has an unstable API and almost no documentation due to its active development
    harfbuzz does not support truetype fonts with AAT tables (used on Mac OS X for Indic languages)
    harfbuzz does not contain a bi-directionality algorithm

My Opinion

Focus on Uniscribe and CoreText initially.

Consider DirectWrite when the JUCE Direct2D renderer is ready.
Consider Harfbuzz-ng on Linux and Android when the API becomes more stable and it supports more languages.


#13

I disagree. Why should we create a dependency on FreeType for Windows and Mac OS X when we can use the existing native APIs?
Also, FreeType does not have any Complex Text support itself.


#14

Design Issue #3: Should we add Complex Text support through an entirely separate set of classes?

Facts

There are a couple of ways to add Complex Text support:

  • Modify existing classes and add additional classes where necessary.
  • Do not modify any existing classes and add support by only using separate classes (with new unique names).

My Opinion

No.
I don’t think adding separate classes is maintainable. I’m pretty sure this would require a large number of Complex Text versions of existing classes.
For example: ComplexFont, ComplexGlyphArrangement, ComplexGraphics, ComplexTextEditor, ComplexLabel and on and on.


#15

There is a third option. Make GlyphArrangement abstract, and provide two subclasses. One for complex text and one for the current implementation.

A new class “TypeLayoutManager” can provide a virtual member for creating new GlyphArrangement objects (instead of putting them on the stack). A Graphics context can have a “current TypeLayoutManager”, and LookAndFeel can provide a member for obtaining the default TypeLayoutManager.

GlyphArrangement can have new virtual members added for performing certain operations sensitive to the type layout. For example, a function to advance a coordinate to the next line.
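
Roughly, and with every name below being a placeholder rather than real Juce API, the shape I’m picturing is something like this. The line heights are arbitrary numbers, and I’ve left out the part where a Graphics context holds the current manager.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical abstract base, standing in for an abstract GlyphArrangement.
class SketchGlyphArrangement
{
public:
    virtual ~SketchGlyphArrangement() = default;
    virtual std::string kind() const = 0;

    // Example of a layout-sensitive operation: advance to the next line.
    virtual float nextLineY (float currentY) const = 0;
};

class SimpleGlyphArrangement : public SketchGlyphArrangement
{
public:
    std::string kind() const override               { return "simple"; }
    float nextLineY (float currentY) const override { return currentY + 15.0f; }
};

class ComplexGlyphArrangement : public SketchGlyphArrangement
{
public:
    std::string kind() const override               { return "complex"; }
    // A complex layout might want taller lines for stacked marks; 20 is arbitrary.
    float nextLineY (float currentY) const override { return currentY + 20.0f; }
};

// Factory: callers ask the manager for an arrangement instead of
// constructing one on the stack.
class SketchTypeLayoutManager
{
public:
    virtual ~SketchTypeLayoutManager() = default;
    virtual std::unique_ptr<SketchGlyphArrangement> createGlyphArrangement() const = 0;
};

class SimpleTypeLayoutManager : public SketchTypeLayoutManager
{
public:
    std::unique_ptr<SketchGlyphArrangement> createGlyphArrangement() const override
    {
        return std::unique_ptr<SketchGlyphArrangement> (new SimpleGlyphArrangement());
    }
};

class ComplexTypeLayoutManager : public SketchTypeLayoutManager
{
public:
    std::unique_ptr<SketchGlyphArrangement> createGlyphArrangement() const override
    {
        return std::unique_ptr<SketchGlyphArrangement> (new ComplexGlyphArrangement());
    }
};

// Stand-in for LookAndFeel: supplies the default manager, and a subclass
// could override this to return a different one.
class SketchLookAndFeel
{
public:
    virtual ~SketchLookAndFeel() = default;

    virtual const SketchTypeLayoutManager& getDefaultTypeLayoutManager() const
    {
        return simpleManager;
    }

private:
    SimpleTypeLayoutManager simpleManager;
};
```

Existing code that only ever asks the default LookAndFeel keeps getting the current behaviour, so nothing changes for anyone who doesn’t opt in.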


#16

I misunderstood… it seemed that you implied that the native APIs were unavailable. Never mind that.


#17

[quote=“sonic59”]
My Opinion

Focus on Uniscribe and CoreText initially.

Consider DirectWrite when the JUCE Direct2D renderer is ready.
Consider Harfbuzz-ng on Linux and Android when the API becomes more stable and it supports more languages.[/quote]
I disagree. Using the OS stuff is good if you are checking the final binary size, but you have at least one MAJOR limitation:
You can’t load any font other than what the system provides (at least on Windows), unless you install the font, run your app, and remove your font, which is really painful and racy, and prone to UAC issues.

Harfbuzz is used in Firefox 4, so even if the API is not stable, the current API functions will always exist (only new ones are added).
HarfBuzz doesn’t include a bidi algorithm such as Fribidi, but this is not as bad as it seems: such algorithms are used to analyse a complex text and figure out where the RTL / LTR Unicode characters should be placed. In 99% of cases, such characters are inserted by the operating system anyway while typing, and saved in the text file/stream.
There’s also a text direction analyzer that’s BSD-licensed and doesn’t depend on glib like Fribidi does. I don’t remember the name, but I can find it out if you need it.
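
To give an idea of the kind of analysis I mean, here is a deliberately crude sketch that splits a string into directional runs. It is nowhere near the full Unicode Bidirectional Algorithm: it only knows the Hebrew and Arabic blocks, treats spaces as neutral and attaches them to the preceding run, and all the names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Direction { leftToRight, rightToLeft };

struct DirectionalRun
{
    std::size_t start;     // index of the first character in the run
    std::size_t length;    // number of characters in the run
    Direction direction;
};

// True for code points in the Hebrew (U+0590..U+05FF) and
// Arabic (U+0600..U+06FF) blocks; other RTL blocks are ignored here.
bool isInRightToLeftBlock (char32_t c)
{
    return c >= 0x0590 && c <= 0x06FF;
}

// Groups consecutive characters with the same direction into runs.
// A space joins whatever run precedes it; a leading space starts an LTR run.
std::vector<DirectionalRun> splitIntoRuns (const std::u32string& text)
{
    std::vector<DirectionalRun> runs;

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const bool neutral = text[i] == U' ';
        const Direction d = isInRightToLeftBlock (text[i]) ? Direction::rightToLeft
                                                           : Direction::leftToRight;

        if (! runs.empty() && (neutral || runs.back().direction == d))
            ++runs.back().length;
        else
            runs.push_back ({ i, 1, d });
    }

    return runs;
}
```

Each run could then be handed to the shaper with its direction, which is the part HarfBuzz expects the caller to have worked out already.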

Exactly, and you’d select the algorithm with a JUCE_CONFIG flag, so you can decide what is inside your application at compile time.
My first post explains part of the required interface for such a pure virtual class.
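
A sketch of what the compile-time switch could look like. SKETCH_USE_COMPLEX_TEXT is a hypothetical flag name, not an existing Juce config option, and the two backend structs are empty placeholders.

```cpp
#include <cassert>
#include <string>

// Hypothetical configuration flag; defaults to the existing simple behaviour.
#ifndef SKETCH_USE_COMPLEX_TEXT
 #define SKETCH_USE_COMPLEX_TEXT 0
#endif

// Placeholder for the current one-glyph-per-character layout code.
struct SimpleLayoutBackend
{
    static std::string name() { return "simple"; }
};

#if SKETCH_USE_COMPLEX_TEXT
// Placeholder for a backend that would wrap a shaping library.
// It is only compiled (and only linked against that library) when enabled.
struct ComplexLayoutBackend
{
    static std::string name() { return "complex"; }
};

using ActiveLayoutBackend = ComplexLayoutBackend;
#else
using ActiveLayoutBackend = SimpleLayoutBackend;
#endif

// The rest of the code only ever refers to ActiveLayoutBackend.
std::string activeLayoutBackendName()
{
    return ActiveLayoutBackend::name();
}
```

The point is that with the flag off, none of the third-party code is even compiled, so applications that don’t need it pay nothing.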


#18

[quote=“X-Ryl669”]Using the OS stuff is good if you are checking the final binary size, but you have at least one MAJOR limitation:
You can’t load any font other than what the system provides (at least on Windows), unless you install the font, run your app, and remove your font, which is really painful and racy, and prone to UAC issues.[/quote]

Ahhh… yes, this is why my original thinking was to require FreeType. A very professional application will not have its appearance sensitive to the fonts the user has installed; it will just bundle its own font internally. For this reason, any solution that requires the font to be installed in the operating system is simply unacceptable.

No actually what I was thinking is that the LookAndFeel would provide the TypeLayout object. The default would be to return an object that encapsulates the current Juce behavior. If someone wanted to have the complex type layout, they would subclass the LookAndFeel, and return a different TypeLayout subclass at run-time. I’m thinking that Jules probably won’t want to include a full blown custom TypeLayout for foreign scripts (since it will almost surely have a dependency on a third party library) and that it would most likely be available as a separate project that you download, call from your LookAndFeel, and link with.

So a different TypeLayout could be chosen at run-time instead of compile-time.


#19

Well, choosing a compile-time option means that for the vast majority of US developers, who still work in 7-bit ASCII :wink:, with no knowledge of TRANS whatsoever, still using “cout << untranslatableSentence << variableHardcoded << anotherPartOfTheEnglishSentence”, the overhead of FreeType + Harfbuzz + Fribidi (or equivalent) would be avoided.

For everyone else, it’s fine to set the compile-time option and, as you say, select it with a LookAndFeel.
IMHO, glyph positioning shouldn’t (semantically) be part of an application’s look and feel, but I’m probably being over-pedantic here.


#20

I consider that the first option. Though upon further examination of the original text I wrote, I could see why you considered it a third option. I edited the original design post to make the options more clear.
I think the implementation you mentioned is likely a better approach than trying to create one GlyphArrangement class with all the simple and complex text functionality in it.

I’m curious, why would you remove the font from the Windows font directory after you installed it?

I’m not a big fan of Fribidi due to it being licensed under the LGPL, so I’m interested to know if a BSD licensed bidi algorithm exists.