Updating the Juce Text System

I’m interested in updating the text system to make it much more powerful by adding support for things like multiple fonts, multiple colors, per-paragraph attributes (justification, etc.) and complex text support (Bidi, shaping, ordering).
Since I would like to see these updates make it into the tip, I was wondering what is the best way of going about this?
Should I just code something up and pass it along? Or would you like me to discuss the design first? Or is there no chance something this extensive will ever make it into the tip?

Well, normally I’d say “sure, have a go and I’ll take a look at what you come up with”, but to do this would be a really surprisingly complicated task, with a lot of messy interactions at many levels in the codebase…

I’ve explored it myself (I did get some right-to-left text running in a hacky way, in a win32 build as a custom piece of work for a client), and apart from being really nasty, it’s also incredibly slow. When you mention colours and paragraphs, you’re also talking about a completely new TextEditor (which is also something that needs doing), but which is again a very very complex task.

…so sure, if you want to have a go, awesome! But it would be a good idea to throw your architectural ideas at me first for sanity checking, before you get stuck in and find that it doesn’t fit!

So, as you mentioned, there are really two complex problems:

  1. New Text Rendering
  2. New Unicode Text Editor

I think 2 will end up being far more complex than 1, but since 2 is dependent on 1 I’ll just focus on 1 for now.
I am concerned about slowness as well; there are a lot of data structures to iterate over and re-create. Doing this repeatedly may be a showstopper, but I won’t know for sure until I code it up.

Platforms
There are two approaches to rendering rich/complex text: using a low-level text API or a high-level text layout API. In order to achieve the goal in a reasonable amount of time, I believe we should use high-level text layout APIs. You give a high-level text API a string, the string's attributes and a rectangle; it performs all the complex layout and line breaking and outputs laid-out glyphs. The only downside to this choice is platform support.

There are three high-level text APIs available:

  • DirectWrite - Works on Windows Vista and later, no extra libraries required
  • Core Text - Works on Mac OS X 10.5 and later and iOS 3.2 and later, no extra libraries required
  • Pango - Works on Linux, Android, Windows XP and Mac OS X 10.4; LGPL licensed, extra libraries required, multiple dependencies

So the downside only really applies to old operating systems like XP and OS X 10.4. Getting Pango working on those platforms should be possible, but it is likely a pain and adds about 7 MB worth of libraries, something that many people may not want. One solution to this issue is to refactor the existing code in JUCE (GlyphArrangement) into an internal “simple text layout” API. This would let users on older operating systems keep the simple text support that is currently present in JUCE, without requiring developers to use additional libraries.
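Structurally, that fallback could be a small engine interface with one implementation per high-level API plus the refactored GlyphArrangement code. The sketch below is hypothetical: `PlatformCapabilities`, `TextLayoutEngine`, `NativeTextLayoutEngine`, `SimpleTextLayoutEngine` and `createLayoutEngine` are illustrative names, not existing JUCE classes, and the actual layout methods are left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical description of what the host OS offers.
struct PlatformCapabilities
{
    bool hasDirectWrite = false;  // Windows Vista and later
    bool hasCoreText    = false;  // Mac OS X 10.5+, iOS 3.2+
};

// Hypothetical common interface: every backend turns attributed text into glyphs.
class TextLayoutEngine
{
public:
    virtual ~TextLayoutEngine() = default;
    virtual std::string name() const = 0;
    virtual bool supportsComplexScripts() const = 0;
};

// Stand-in for a DirectWrite or Core Text backend (layout code omitted).
class NativeTextLayoutEngine : public TextLayoutEngine
{
public:
    explicit NativeTextLayoutEngine (std::string apiName) : api (std::move (apiName)) {}
    std::string name() const override { return api; }
    bool supportsComplexScripts() const override { return true; }

private:
    std::string api;
};

// Stand-in for the refactored GlyphArrangement code: simple scripts only.
class SimpleTextLayoutEngine : public TextLayoutEngine
{
public:
    std::string name() const override { return "Simple"; }
    bool supportsComplexScripts() const override { return false; }
};

// Prefer a native high-level API; fall back to simple layout on XP / OS X 10.4.
std::unique_ptr<TextLayoutEngine> createLayoutEngine (const PlatformCapabilities& caps)
{
    if (caps.hasDirectWrite) return std::make_unique<NativeTextLayoutEngine> ("DirectWrite");
    if (caps.hasCoreText)    return std::make_unique<NativeTextLayoutEngine> ("CoreText");
    return std::make_unique<SimpleTextLayoutEngine>();
}
```

A Pango engine for Linux would be one more subclass of the same interface; callers only ever see `TextLayoutEngine`.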

Existing Design
String -> Graphics API (Optional Stage) -> GlyphArrangement -> Text API Calls -> PositionedGlyphs -> Draw Glyph

New Design

Array of AttributedStrings -> Graphics API (Optional Stage) -> GlyphFrame -> GlyphLayouts -> Text Layout API Calls -> GlyphLines -> GlyphRuns -> PositionedGlyphs -> Draw Glyph
AttributedString -> Graphics API (Optional Stage) -> GlyphLayout -> Text Layout API Calls -> GlyphLines -> GlyphRuns -> PositionedGlyphs -> Draw Glyph

AttributedString

  • Has a String
  • One set of paragraph parameters (word wrap, directionality, line spacing, tab stops) that apply to the entire string
  • One or more ranges of characters to apply different font families, font sizes, font styles, foreground colors, background colors

Array of AttributedStrings

  • Each AttributedString element represents a paragraph
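As a rough illustration, the AttributedString could be a plain value type like the one below. This is a sketch, not the actual class: the names (`ParagraphAttributes`, `CharacterAttributes`, `AttributeRange`, `attributesAt`) are hypothetical, ranges are assumed to be half-open, later ranges are assumed to override earlier ones where they overlap, and colours are plain 0xAARRGGBB integers instead of JUCE `Colour` objects.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class Justification { left, right, centred };
enum class Direction { leftToRight, rightToLeft };

// One set of paragraph parameters for the whole string.
struct ParagraphAttributes
{
    bool wordWrap = true;
    Direction direction = Direction::leftToRight;
    Justification justification = Justification::left;
    float lineSpacing = 0.0f;
    std::vector<float> tabStops;
};

// Every field is optional so that a range can change just one property.
struct CharacterAttributes
{
    std::optional<std::string> fontFamily;
    std::optional<float> fontSize;
    std::optional<unsigned> foreground;   // 0xAARRGGBB
    std::optional<unsigned> background;   // 0xAARRGGBB
};

struct AttributeRange
{
    int start = 0, end = 0;               // half-open [start, end)
    CharacterAttributes attributes;
};

struct AttributedString
{
    std::u32string text;
    ParagraphAttributes paragraph;
    std::vector<AttributeRange> ranges;

    void addRange (int start, int end, CharacterAttributes a)
    {
        assert (0 <= start && start <= end);
        ranges.push_back ({ start, end, std::move (a) });
    }

    // Effective attributes of one character; later ranges win on overlap.
    CharacterAttributes attributesAt (int index) const
    {
        CharacterAttributes result;
        for (const auto& r : ranges)
        {
            if (index < r.start || index >= r.end)
                continue;
            if (r.attributes.fontFamily) result.fontFamily = r.attributes.fontFamily;
            if (r.attributes.fontSize)   result.fontSize   = r.attributes.fontSize;
            if (r.attributes.foreground) result.foreground = r.attributes.foreground;
            if (r.attributes.background) result.background = r.attributes.background;
        }
        return result;
    }
};

// The "array of AttributedStrings": one element per paragraph.
using AttributedParagraphs = std::vector<AttributedString>;
```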

GlyphFrame

  • Multiple paragraphs of text
  • Rectangular box where all text should be painted
  • Feeds each element of the AttributedString array into a separate GlyphLayout
  • Has an array of GlyphLayouts
  • Creates and paints new GlyphLayouts based on the remaining space in the rectangular box
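The GlyphFrame's "create layouts until the box is full" step reduces to a running total of paragraph heights. This sketch assumes each paragraph's laid-out height at the frame width is already known; `paragraphsThatFit` is a hypothetical helper, and it stops at the first paragraph that would overflow rather than clipping it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns how many consecutive paragraphs, starting from the first,
// fit vertically in a frame of the given height.
std::size_t paragraphsThatFit (const std::vector<float>& paragraphHeights,
                               float frameHeight,
                               float paragraphSpacing = 0.0f)
{
    assert (frameHeight >= 0.0f);
    float used = 0.0f;
    std::size_t count = 0;

    for (float h : paragraphHeights)
    {
        const float needed = (count == 0) ? h : h + paragraphSpacing;
        if (used + needed > frameHeight)
            break;                      // no room left for this paragraph
        used += needed;
        ++count;
    }
    return count;
}
```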

GlyphLayout

  • A paragraph of text
  • Applies paragraph parameters to all text in the layout
  • Can display multiple paragraphs (if newlines are present in the string), but only one set of paragraph parameters is applied to all text
  • Has an array of GlyphLines

GlyphLine

  • A line of text
  • Stores the line's position (character range) with respect to the string used to generate the layout
  • Has an array of GlyphRuns

GlyphRun

  • A run of text
  • Has an array of PositionedGlyphs
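The GlyphLayout, GlyphLine, GlyphRun and PositionedGlyph levels could be held in simple nested containers. The field names, the choice of storing a font glyph index plus an x/y position per glyph, and the `countGlyphs` helper are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct PositionedGlyph
{
    unsigned glyphIndex = 0;   // index into the font, not a Unicode code point
    float x = 0, y = 0;        // position relative to the layout origin
};

// A run shares one font and colour.
struct GlyphRun
{
    std::string fontFamily;
    float fontSize = 0;
    unsigned colour = 0xff000000u;
    std::vector<PositionedGlyph> glyphs;
};

// A line remembers which characters of the source string it covers.
struct GlyphLine
{
    int stringStart = 0, stringEnd = 0;   // half-open range in the source string
    float ascent = 0, descent = 0;
    std::vector<GlyphRun> runs;
};

// One paragraph, laid out with a single set of paragraph parameters.
struct GlyphLayout
{
    std::vector<GlyphLine> lines;
};

std::size_t countGlyphs (const GlyphLayout& layout)
{
    std::size_t total = 0;
    for (const auto& line : layout.lines)
        for (const auto& run : line.runs)
            total += run.glyphs.size();
    return total;
}
```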

Text Layout API Calls
Once we have our Juce AttributedString, we create the platform attributed string (CFAttributedString for Core Text, PangoAttrList for Pango).
Then we feed the platform attributed string into the platform layout API (IDWriteTextLayout, CTFramesetter, pango_layout_new).
The platform does all the layout and stores the results in platform text data structures.

Software/OpenGL Rendering:
We iterate through the platform text data structures to create our own GlyphLines -> GlyphRuns -> PositionedGlyphs structures
Then we iterate through the juce text data structures to draw the following glyph by glyph:
Draw Background Color Rectangles
Draw Selection Rectangles
Draw Underlines
Draw Strikethroughs
Draw Glyph
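A per-run version of that drawing order might look like this. `Canvas` is a hypothetical stand-in for a JUCE `Graphics` context (the `RecordingCanvas` just logs calls so the order can be checked), `DecoratedRun` is a simplified run that already knows its bounds and decorations, and the underline and strikethrough offsets are arbitrary assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Rect { float x, y, w, h; };

// Hypothetical drawing surface standing in for juce::Graphics.
struct Canvas
{
    virtual ~Canvas() = default;
    virtual void fillRect (Rect area, unsigned colour) = 0;
    virtual void drawGlyph (unsigned glyphIndex, float x, float y, unsigned colour) = 0;
};

// Logs each call so the drawing order can be inspected.
struct RecordingCanvas : Canvas
{
    std::vector<std::string> calls;
    void fillRect (Rect, unsigned) override                    { calls.push_back ("rect"); }
    void drawGlyph (unsigned, float, float, unsigned) override { calls.push_back ("glyph"); }
};

struct SimpleGlyph { unsigned index; float x, y; };

struct DecoratedRun
{
    Rect bounds {};
    float baseline = 0, ascent = 0;
    unsigned foreground = 0xff000000u;
    unsigned background = 0;                 // 0 means no background
    bool selected = false, underline = false, strikethrough = false;
    std::vector<SimpleGlyph> glyphs;
};

// Draws one run: background, selection, underline, strikethrough, then glyphs.
void drawRun (Canvas& g, const DecoratedRun& run, unsigned selectionColour)
{
    const float thickness = 1.0f;            // assumed decoration thickness

    if (run.background != 0) g.fillRect (run.bounds, run.background);
    if (run.selected)        g.fillRect (run.bounds, selectionColour);
    if (run.underline)       // assumed: one line-thickness below the baseline
        g.fillRect ({ run.bounds.x, run.baseline + thickness, run.bounds.w, thickness }, run.foreground);
    if (run.strikethrough)   // assumed: 30% of the ascent above the baseline
        g.fillRect ({ run.bounds.x, run.baseline - run.ascent * 0.3f, run.bounds.w, thickness }, run.foreground);

    for (const auto& glyph : run.glyphs)
        g.drawGlyph (glyph.index, glyph.x, glyph.y, run.foreground);
}
```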

Platform Rendering (CoreGraphics only at the moment):
Pass platform text data structures to platform graphics API to draw entire layout.

Sounds like a pretty sensible plan!

When I did my hacky win32 version, I used Uniscribe, which does work ok on XP, but is a bit slow, and messy to use. I investigated Pango, but it was a complete non-starter as it was so huge and dependency-ridden.

Cool.

I had considered Uniscribe earlier, so the system could support everything from Windows XP on, but ultimately decided against it. As you know, Uniscribe is a low-level text API, so it requires way more effort to get it working. By this time next year, XP will be 3 OS releases behind, so I don’t think it’s worth the effort. Additionally, the new Metro APIs in Windows 8 don’t even allow you to use Uniscribe, only DirectWrite, so it’s definitely heading off into the sunset.

Yeah, Pango is fine for Linux, since everyone already has it so they can run GNOME/GTK apps. On other operating systems (Android, OS X, Windows) it’s definitely a pain and huge, but at least it’s possible, so I guess that is better than nothing.

A status update for those who are interested.

I have completed the first stage of work which was creating the AttributedString classes and getting rendering working using the CoreGraphics renderer on Mac OS X. The next step is to start implementing all the Glyph classes I previously mentioned so I can get the software renderer on Mac OS X working.

Now onto some examples. The boxes in the images are all Labels with a border. The Text Engine 1 boxes represent the current Juce Text Rendering System and Text Engine 2 represents the next gen system.

First up is some English text where the first paragraph is aligned to the left and the second paragraph is aligned to the right. The first 300 characters use the Times font while the rest use Lucida Grande. A subset of 100 characters has been coloured blue.

Next up is some Arabic text. Both paragraphs are aligned to the right, as it is an RTL language. The first 300 characters use the Times font as well, though I’m not seeing much difference between the Arabic script in Times and Lucida Grande. It’s possible that there is no Arabic script in Times and it is doing automatic font fallback to Lucida Grande or some other font that has Arabic glyphs. You can see the difference when you look at the interspersed English text: compare “Lorem Ipsum” at the start to “Pagemaker” at the end of the first paragraph. Font coloring is working as well.

This post is already quite long so I’ll just include the rest of the examples as links.

Russian - http://i.minus.com/iSyGxCgPdY5c5.png
Thai - http://i.minus.com/ikqqzV7dDYFex.png
Hindi - http://i.minus.com/iYW8NX8YqVqMq.png
Hebrew - http://i.minus.com/ibowYwp9Fb7DVG.png

Wow! Nice work!

Another update.

I have completed the second stage of work which was implementing all the Glyph classes mentioned in the first post so I could get the software renderer on Mac OS X working. For the Mac side of things, this is nearly useless since it still requires CoreText and CoreGraphics to do the layout, and I’m not quite sure why you would use the Software Renderer when you can just use CoreGraphics. Still, there is one use case on Mac where this code is necessary and that is for rendering text in OpenGL.

One thing I have been concerned/curious about is the performance when drawing glyph by glyph. When the Unicode Text Editor is built, every time an insertion or deletion occurs, we need to re-layout and re-paint the entire paragraph and possibly all subsequent paragraphs. This involves creating a new set of Glyph structures for every change made to the editor, even if it is only a single character.
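The bookkeeping for that can be sketched as: find the paragraph containing the edit, then flag it, plus every later paragraph if its height changed, for re-layout and re-paint. This assumes the editor caches one layout per paragraph and knows each paragraph's starting character offset; `paragraphContaining` and `invalidateAfterEdit` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// paragraphStarts[i] is the character offset where paragraph i begins
// (sorted, first entry 0). Returns the paragraph that contains charIndex.
int paragraphContaining (const std::vector<int>& paragraphStarts, int charIndex)
{
    assert (! paragraphStarts.empty() && paragraphStarts.front() == 0 && charIndex >= 0);
    const auto it = std::upper_bound (paragraphStarts.begin(), paragraphStarts.end(), charIndex);
    return static_cast<int> (it - paragraphStarts.begin()) - 1;
}

// The edited paragraph always needs a new layout. If its height changed,
// every later paragraph moves (and may no longer fit in the frame),
// so those are flagged as well.
void invalidateAfterEdit (std::vector<bool>& needsUpdate,
                          const std::vector<int>& paragraphStarts,
                          int editIndex, bool heightChanged)
{
    const int first = paragraphContaining (paragraphStarts, editIndex);
    needsUpdate[static_cast<std::size_t> (first)] = true;
    if (heightChanged)
        std::fill (needsUpdate.begin() + first + 1, needsUpdate.end(), true);
}
```

Only when a paragraph's height changes does the cost spread beyond the edited paragraph, which is the "possibly all subsequent paragraphs" case.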

Now that the Glyph code is in place, I was able to run some tests. When generating the text in the screenshot below using different rendering systems, I got the following results.

CoreText+CoreGraphics -> Layout by Layout CoreGraphics Rendering
8ms
2011-10-17 17:44:45.492 JuceText[6087:903] Draw Text Frame Start
2011-10-17 17:44:45.500 JuceText[6087:903] Draw Text Frame End

CoreText+CoreGraphics -> Glyph by Glyph CoreGraphics Rendering
16ms
2011-10-17 17:41:07.714 JuceText[5996:903] Draw Text Frame Start
2011-10-17 17:41:07.730 JuceText[5996:903] Draw Text Frame End

CoreText+CoreGraphics -> Glyph by Glyph Software Rendering
44ms
2011-10-17 17:36:53.696 JuceText[5917:903] Draw Text Frame Start
2011-10-17 17:36:53.741 JuceText[5917:903] Draw Text Frame End

Those numbers look promising to me so hopefully there won’t be any such issues with the Unicode Text Editor.

The image below was generated using the “CoreText+CoreGraphics -> Glyph by Glyph Software Rendering” method. You can see the following attributes:
Layout 1: Right Aligned
Characters 0-300: Times Font
Characters 301-End: Lucida Grande Font
Characters 100-200: Blue Color
Layout 2: Left Aligned
Characters 0-End: Lucida Grande Font
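Runs like the ones listed above fall out of the attribute ranges: every range start and end is a potential run boundary. A sketch using half-open ranges (so "Characters 100-200" becomes [100, 201)) and a hypothetical `splitIntoRuns` helper; a real layout engine would also split runs at script, direction and font-fallback changes.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

struct Range { int start, end; };   // half-open [start, end)

// Splits [0, textLength) into maximal runs inside which no attribute range
// starts or ends, so every character in a run shares the same attributes.
std::vector<Range> splitIntoRuns (int textLength, const std::vector<Range>& attributeRanges)
{
    assert (textLength >= 0);
    std::set<int> boundaries { 0, textLength };

    for (const auto& r : attributeRanges)
    {
        boundaries.insert (std::clamp (r.start, 0, textLength));
        boundaries.insert (std::clamp (r.end, 0, textLength));
    }

    std::vector<Range> runs;
    for (auto it = boundaries.begin(); std::next (it) != boundaries.end(); ++it)
        runs.push_back ({ *it, *std::next (it) });
    return runs;
}
```

For a 500-character layout with Times on [0, 301), Lucida Grande on [301, 500) and blue on [100, 201), this yields four runs: [0, 100), [100, 201), [201, 301) and [301, 500).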

The Arabic script is not present in either Times or Lucida Grande, so what you are seeing is automatic font fallback to a different font, even though we are rendering glyph by glyph.

Onwards to stage 3 which is getting the existing Juce Text System working in this new layout system so OS X 10.4 and XP users can see some text and get at least the existing level of simple script text support.

Another update.
I have completed the third stage which was getting the existing Juce Text System working in this new layout system so OS X 10.4 and XP users can see some text and get at least the existing level of simple script text support. Like the existing TextLayout class, you can use different font families and font sizes. I have also added the ability to change the foreground color of text by character range, something that TextLayout is not capable of doing.
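The simple layout has to do its own line breaking. As an illustration only (not the actual GlyphArrangement code), a greedy word wrap over per-character advance widths might look like the following. It assumes the advances have already been measured from the font and ignores explicit newlines and tabs; `wrapLines` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Greedy word wrap over one paragraph. advances[i] is the advance width of
// character i. Returns the index where each line starts. Lines break after
// the last space that fits; a word wider than the whole line is broken
// mid-word so that progress is always made.
std::vector<std::size_t> wrapLines (const std::u32string& text,
                                    const std::vector<float>& advances,
                                    float maxWidth)
{
    assert (advances.size() == text.size());
    const auto npos = std::u32string::npos;

    std::vector<std::size_t> lineStarts { 0 };
    std::size_t lineStart = 0, lastSpace = npos;
    float width = 0.0f;

    for (std::size_t i = 0; i < text.size(); ++i)
    {
        if (text[i] == U' ')
            lastSpace = i;

        width += advances[i];

        if (width > maxWidth && i > lineStart)
        {
            // Prefer breaking after the last space; otherwise break before char i.
            const std::size_t breakAt = (lastSpace != npos) ? lastSpace + 1 : i;
            if (breakAt >= text.size())
                break;                       // only a trailing space overflowed

            lineStarts.push_back (breakAt);
            lineStart = breakAt;
            lastSpace = npos;

            // Re-measure the characters carried over onto the new line.
            width = 0.0f;
            for (std::size_t j = breakAt; j <= i; ++j)
                width += advances[j];
        }
    }
    return lineStarts;
}
```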

Now onto some examples.

The first image shows the Simple Text Layout system working on Mac OS X. Since this layout system works the same way juce’s existing system does, complex scripts (for languages like Arabic, Hebrew, Hindi) will not display properly.

The second image shows the Simple Text Layout system working on Windows.

Onwards to stage 4 which is getting DirectWrite working so Windows Vista and later (7, 8, etc) users will be able to view complex scripts.

Do you know about Harfbuzz-ng?
This is a version of the code that does OpenType shaping, and it doesn’t require lots of dependencies like the old Pango did.
That way, you can have a default, software-based version supporting shaping for the numerous scripts (Arabic, Indic, Thai, etc.).

I am aware of harfbuzz-ng. The reason I didn’t include it in the original list is that it isn’t a high-level text API. As you said, Harfbuzz does text shaping, but that is all it does. It does not do text itemization or line breaking, both of which are critical for complex scripts and for text display in general. Adding Harfbuzz-ng support to the next-gen juce text system should be possible; you just need to write all the text itemization and line breaking code yourself.
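To make "itemization" concrete: before a shaper such as Harfbuzz can be called, the text has to be split into runs of a single script (and, in a full implementation, a single bidi level and font). The sketch below is a deliberate simplification: `scriptOf` recognises only a few Unicode blocks by hard-coded ranges, and digits, spaces and ASCII punctuation are treated as "common" characters that join the preceding run (or the following one at the start of the text). A real itemizer would use the Unicode Script property data (UAX #24); `itemizeByScript` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Script { common, latin, cyrillic, hebrew, arabic, devanagari, thai, other };

// Simplified script lookup by Unicode block (assumption: a real itemizer
// would use the Unicode Script property tables instead).
Script scriptOf (char32_t c)
{
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z') || (c >= 0x00C0 && c <= 0x024F))
        return Script::latin;
    if (c < 0x00C0)                 return Script::common;   // digits, spaces, punctuation
    if (c >= 0x0400 && c <= 0x04FF) return Script::cyrillic;
    if (c >= 0x0590 && c <= 0x05FF) return Script::hebrew;
    if (c >= 0x0600 && c <= 0x06FF) return Script::arabic;
    if (c >= 0x0900 && c <= 0x097F) return Script::devanagari;
    if (c >= 0x0E00 && c <= 0x0E7F) return Script::thai;
    return Script::other;
}

struct ScriptRun { std::size_t start, end; Script script; };  // half-open [start, end)

std::vector<ScriptRun> itemizeByScript (const std::u32string& text)
{
    std::vector<ScriptRun> runs;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const Script s = scriptOf (text[i]);

        if (! runs.empty())
        {
            ScriptRun& current = runs.back();
            if (s == Script::common || s == current.script)
            {
                current.end = i + 1;     // common and same-script chars extend the run
                continue;
            }
            if (current.script == Script::common)
            {
                current.script = s;      // leading common chars adopt the first real script
                current.end = i + 1;
                continue;
            }
        }
        runs.push_back ({ i, i + 1, s });
    }
    return runs;
}
```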

It is possible to avoid using Pango and use ICU instead, which is another high-level text API. ICU doesn’t have any dependencies, but it is a large library (16 MB). On the plus side, ICU is BSD licensed. There were some rumblings about modifying ICU to use harfbuzz-ng for shaping, but I’m not sure how far that has come along.

It’ll be interesting to finally see how Android is doing text layout. They added some sort of support for harfbuzz-old in Honeycomb but the source code was never released. Once Ice Cream Sandwich comes out we can finally take a look. Maybe they wrote their own itemizer and breaker? I doubt it though since ICU has been on Android from the beginning.

Well, from what I’ve read in the code, it seems to support Uniscribe, so it should do this.

Using Harfbuzz-ng has two advantages from my point of view:

  • You have to write the code once, and it works on both Windows and Linux and Mac
  • You can provide a software fallback system (which you can’t do on Mac or with DirectWrite, since those APIs are not made to query the glyph positioning rules)

Anyway, since Uniscribe is a Microsoft technology, it’s possibly not available on Linux, and in that case you’re 100% correct about the text breaking issue.
PS: I just checked: you can build HB with either the Uniscribe, Glib (Pango) or ICU engine for shaping. So you can have correct shaping on all platforms; only the dependencies will change.
On any serious platform, you’ll need to have ICU installed.
On Windows, you’ll need Uniscribe.

Are you talking about font fallback here? You are right, on DirectWrite it is not possible to modify this; it happens automatically. On Core Text it happens automatically as well, but you can set up your own font cascades, so this should be possible.

You are right, it’s not available. The only way you can get Uniscribe working on Linux is via WINE.

This I don’t really understand. It’s not very clear to me how Harfbuzz backend system works. Harfbuzz not having documentation is not very much help. It is my understanding that Harfbuzz does the shaping not the backends. When you build Harfbuzz with these backends, can you use the backends to do itemization and breaking?

Harfbuzz still isn’t the multi-platform silver bullet:

The uniscribe backend is experimental and broken on Windows XP.
Source: [HarfBuzz] Disable uniscribe backend
Source: http://doom10.org/index.php?topic=1269.30

Harfbuzz doesn’t support Apple Advanced Typography (AAT) Tables. All the operating system Indic fonts on OS X and iPhone use AAT tables not OpenType tables. Without Core Text it is not possible to render the operating system indic fonts correctly.
Source: http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/e09b03086f6ea97b/79c4dca97ef050ba?pli=1

The documentation is almost nonexistent (which is clearly not good).
From what I can see, when you build Harfbuzz, you can specify which backend to use (ICU / Uniscribe / Glib).
If you don’t do that you need to set up a function callback that HB calls when mapping/shaping a string, with “hb_buffer_set_unicode_funcs(buffer, hb_icu_get_unicode_funcs())” (example for ICU).
There is no need to set it up when you’ve built a backend (it’s done by default).

For laying out the text, you’ll call:

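// Lay out the text (wchar_t is UTF-16 on Win32)
const wchar_t* cur_text = L"...";              // the text to shape
const int len = (int) wcslen (cur_text);
hb_buffer_add_utf16 (buffer, (const uint16_t*) cur_text, len, 0, len);
hb_shape (font, buffer, NULL, 0);              // font: an hb_font_t* created elsewhere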

This will call all the callbacks down to the ICU code (or Uniscribe, or Pango).

Concerning the missing support for the OpenType code on iPhone, I think it’s a non-issue. With Harfbuzz, you give it a font file to work with, so you’ll embed a .otf/.ttf file and it will figure this out on its own.
In any case, I see Harfbuzz as a fallback for the operating system’s own methods.
Under Linux there is no such method, so you need to use Harfbuzz.

Well, you’re confusing Harfbuzz with Harfbuzz-ng. Harfbuzz-ng was started precisely to fix that mess. HB-ng implements OpenType, which is a subset of AAT and TrueType (OpenType = AAT + TrueType + Adobe).

It’s doubtful that you’ll be able to use ICU on the iPhone. Apple considers it a private API since it is already included on every iOS device. People have had their apps rejected for doing so.


Pango and Fribidi are both LGPL which have their own issues with App Store distribution.

I’m not so sure that there is AAT support in harfbuzz-ng.

I have completed the fourth stage which was getting DirectWrite working so Windows Vista and later (7, 8, etc) users will be able to view complex scripts.

I have no plans to add Linux/Android complex script support, so this will have to be done by someone else once this gets integrated into the Juce mainline. The best route is to add support for either Pango or ICU. Alternatively, you can forge your own text layout API using Harfbuzz. Just to be clear, although there is no complex script support on Linux/Android, there is still simple script support, which covers all languages that Juce currently works with on those platforms.

Simple Script Languages: English, Chinese, Russian, etc.
Complex Script Languages: Arabic, Hebrew, Indic, etc.
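Complex scripts also need bidirectional reordering. Once the Unicode Bidirectional Algorithm (UAX #9) has resolved an embedding level for each character (even = left-to-right, odd = right-to-left), each line is reordered for display by reversing runs from the highest level down to the lowest odd level (rule L2). The sketch below implements only that final step on already-resolved levels; resolving the levels is the much larger part and is assumed to be done elsewhere, e.g. by the platform API. `visualOrder` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Rule L2 of UAX #9: returns the logical indices of one line in visual order,
// given the resolved embedding level of each character.
std::vector<std::size_t> visualOrder (const std::vector<int>& levels)
{
    std::vector<std::size_t> order (levels.size());
    std::iota (order.begin(), order.end(), std::size_t { 0 });
    if (levels.empty())
        return order;

    const int highest   = *std::max_element (levels.begin(), levels.end());
    const int lowestOdd = *std::min_element (levels.begin(), levels.end()) | 1;  // min level rounded up to odd

    std::vector<int> lv (levels);   // levels, permuted alongside `order`
    for (int level = highest; level >= lowestOdd; --level)
    {
        std::size_t i = 0;
        while (i < lv.size())
        {
            if (lv[i] < level) { ++i; continue; }

            // Reverse the maximal run of characters at `level` or higher.
            std::size_t j = i;
            while (j < lv.size() && lv[j] >= level)
                ++j;
            std::reverse (order.begin() + i, order.begin() + j);
            std::reverse (lv.begin() + i, lv.begin() + j);
            i = j;
        }
    }
    return order;
}
```

For example, levels {1, 1, 2, 2, 1} (a right-to-left paragraph with an embedded left-to-right number at positions 2 and 3) come out as the visual order {4, 2, 3, 1, 0}, keeping the number's digits in their original order.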

I have sent the code for this next-generation text system to Jules. He’ll have to comment on whether or when this will ever make it into the juce mainline.

While text rendering is complete, we still require a Unicode Text Editor. That is an incredibly complex project. It gives me nightmares.

Some examples of complex scripts on Windows:

English

Arabic

Russian - http://i.minus.com/iKhcqRKY5V6PD.png
Thai - http://i.minus.com/iTVkmwcV4XjF9.png
Hindi - http://i.minus.com/ibsXGzsjhTV5Bs.png
Hebrew - http://i.minus.com/i1AiWZ14f9Gni.png

Good work Sonic!

That’s a very good work indeed Sonic.
Can you send me the code so I can give HarfBuzz/ICU integration on Linux a shot?

Jules, considering the newly reported issue about text input, and since Android text input requires a hack too (so the user can actually use any of the Java-based virtual keyboards), are you open to the idea of adding a “native text input” component that creates whatever OS-based input component is available, rendered out of flow, always on top of the Juce interface?
On Windows it would be an EDIT control, on Android an EditText, and so on. This would fall back to the Juce-based version somehow.

Excellent work indeed!

I think I should be able to use a similar trick to the one I use on iOS, to get the keyboard working. It’s something I need to look into soon.

Sure, check your PM.