Complex Text Layout on Linux

sfeinstein · April 23, 2015, 9:53pm

Hi,

I've got an audio application that needs to display international text, especially several languages that require complex text layout (CTL), i.e. complex scripts like Arabic, Urdu, Farsi, etc.

I see from a thread from a few years back (http://www.juce.com/forum/topic/updating-juce-text-system) that although the JUCE text system was reworked to support CTL on Mac and Windows, this functionality has not been extended to Linux. I realize this is a lot of work and I'm not able to take it on myself at the moment. However, I have an urgent need to display complex scripts.

It would be great to continue using a juce::Label to do this, but it looks like that's not possible. (It's not, right?) I've just started trying to size up Pango and GTK+ for accomplishing this, but they don't look like an easy fit so far. Before I go too far down a rabbit hole, I thought I would put the question out to the JUCE forum:

Does anyone have any suggestions for a way to get a complex script displayed on Linux?

Is there any code in the works to support CTL on Linux that could be shared?

Thanks!

Steve

ronaaron · April 26, 2015, 5:19am

Hi; I've been dealing with RTL issues myself.

The simplest thing to do is to override the LookAndFeel members which draw text, and use an AttributedString to do the text output. This works OK for Labels and Buttons; unfortunately, the TextEditor is extremely complicated, and I've got someone trying to make it work for RTL, without much success so far.

roeland-2 · July 8, 2015, 3:00am

For editing arbitrary Unicode text, RTL is only one of your worries.

The AttributedString solves those fancy text layout issues by offloading all of that to the operating system's text API (DirectWrite on Windows). That API does a lot of things, like:

Getting glyphs not present in the current font from another font. That's how an AttributedString can render characters from a lot more languages than a GlyphArrangement.
Combining marks. You can encode “é” either as U+00E9 (e with acute accent) or as U+0065, U+0301 (e + combining acute accent). The renderer might use a precomposed glyph in both cases anyway. With the latter encoding, it may or may not be desirable to erase both code points with one press of backspace. Note that a letter “i” with an accent has to be rendered without its dot, as in î.
For some complex scripts like Arabic you have character shaping, where the glyph you want for a given code point depends on the surrounding letters.
Sorting out bidirectional text. This gets really complicated when you have a mix of, say, Latin and Arabic scripts, and there are control characters to override the default direction of a script. A contiguous range of characters in your string might have a gap between them on the screen. And you may have to mirror [brackets] and “quotes” as well.
Ligatures. There are fonts, especially serif fonts used in print, which define ligatures. "fi" and "ffi" may be rendered with a single glyph, slightly different from the separate glyphs. The cursor might then need to be placed in the middle of that glyph. But Turkish differentiates between dotted and dotless i (i or ı), so it can't use those ligatures. Anyway, at least for on-screen rendering you can get away with disabling the use of ligatures.
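
To illustrate the bidirectional point from the list above, here is a minimal Python sketch using only the standard unicodedata module. It is not the Unicode Bidirectional Algorithm (UAX #9): it merely groups characters into runs by their strong direction and, as a simplifying assumption, attaches neutral characters to the run before them. The function names are made up for this example.

```python
import unicodedata


def strong_direction(ch):
    """Return "rtl" or "ltr" for strongly directional characters, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):   # Hebrew-style and Arabic-style letters
        return "rtl"
    if bidi_class == "L":           # Latin, Cyrillic, etc.
        return "ltr"
    return None                     # digits, spaces, punctuation, ...


def directional_runs(text, base="ltr"):
    """Split text (in storage order) into (direction, substring) runs.

    Simplification: weak/neutral characters join the run before them.
    """
    runs = []
    current = base
    for ch in text:
        direction = strong_direction(ch) or current
        if runs and runs[-1][0] == direction:
            runs[-1] = (direction, runs[-1][1] + ch)
        else:
            runs.append((direction, ch))
        current = direction
    return runs


mixed = "abc \u05d0\u05d1\u05d2 def"   # Latin, Hebrew, Latin
for direction, chunk in directional_runs(mixed):
    print(direction, len(chunk))

# Brackets carry the "mirrored" property, so "[" is drawn as "]" in an RTL run.
print(unicodedata.mirrored("["), unicodedata.mirrored("a"))
```

Even this toy version shows that a renderer has to reorder whole runs before placing glyphs; a real implementation also has to handle embedding levels, numbers and the explicit direction control characters.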

Any approach assuming there's a 1-to-1 mapping from a code point to a glyph, and those glyphs will be rendered one after each other, will never work.
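
The combining-mark case is easy to check outside of JUCE. The following is a small Python sketch (standard library only) showing that the two encodings of “é” differ in code point count but normalise to each other. The helper count_user_characters is a name invented here, and skipping combining marks is only a rough approximation of proper grapheme segmentation (UAX #29).

```python
import unicodedata

precomposed = "\u00e9"    # é as a single code point
decomposed = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

# Same text to the reader, different code point sequences.
print(len(precomposed), len(decomposed))
print(precomposed == decomposed)

# Normalisation converts between the two forms.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)


def count_user_characters(text):
    """Rough count of user-perceived characters: ignore combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(count_user_characters(decomposed))
```

A backspace handler that deletes one code point at a time would leave a bare "e" behind for the decomposed form, which is the editing question raised in the list above.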

I would say, if at all possible, rely on the API available from the operating system (no idea what the standard, usually available API on Linux systems would be, though). Those APIs allow you to treat the rendered text layout as a black box, and they allow you to ask at which offset to render the cursor for a given index in your text string (see for instance IDWriteTextLayout::HitTestTextPosition).

PS. Speaking of Turkish: Usually the upper case of "i" is "I". Unless you have Turkish text, then it is "İ". But that's a whole different can of worms.
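
As a quick illustration of the Turkish casing point, here is a Python sketch. Python's built-in str.upper() and str.lower() are locale-independent, so the two helpers below (names invented for this example) apply the dotted/dotless mapping by hand before falling back to the default rules. A production application would more likely use a locale-aware library such as ICU.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı


def turkish_upper(text):
    """Upper-case text using the Turkish rules: i -> İ and ı -> I."""
    text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()


def turkish_lower(text):
    """Lower-case text using the Turkish rules: İ -> i and I -> ı."""
    text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()


# ascii() keeps the output readable on terminals without UTF-8 support.
print(ascii("istanbul".upper()))          # default rules
print(ascii(turkish_upper("istanbul")))   # Turkish rules
print(ascii(turkish_lower("ISTANBUL")))
```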

X-Ryl669 · August 13, 2015, 6:05pm

You need to have a look at HarfBuzz (it's what's being used in Firefox and Chrome). This component does the text rendering, applying the right rules for each Unicode script (it's the only code that works right now for all languages). The dependencies are not big (IIRC, if you need *everything*, you'll have to use ICU, but it's not required if you don't need everything from Unicode).

You can make HarfBuzz generate raster lines, that is, the [ x_start x_end ] spans for each horizontal line of the rendered text. These could be mapped without too much effort to the EdgeTable used in JUCE. The basic idea is to render your text with HarfBuzz at 10x the size (or less, depending on how much anti-aliasing you need), then get the scan lines from it (by copying and pasting the example code from the HarfBuzz site; it's 60 lines), then fill them in JUCE (still to be done, but not necessarily hard).
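
The "render big, then shrink" step can be sketched independently of any library. The Python below assumes the oversized rendering has already been reduced to a dictionary mapping each scan line to a list of half-open (x_start, x_end) spans; that data layout, the function name and the sample spans are all assumptions made for illustration. It neither calls HarfBuzz nor builds a juce::EdgeTable; it only shows how the spans turn into per-pixel anti-aliasing coverage.

```python
def downsample_spans(spans_by_row, scale, width, height):
    """Convert oversized scan-line spans into per-pixel coverage (0.0 to 1.0).

    spans_by_row: {row: [(x_start, x_end), ...]} in oversized pixel units,
                  with x_end exclusive.
    scale:        oversampling factor (e.g. 10 for "10x the size").
    width/height: size of the output image in real pixels.
    """
    hits = [[0] * width for _ in range(height)]
    for row, spans in spans_by_row.items():
        out_y = row // scale
        if not 0 <= out_y < height:
            continue
        for x_start, x_end in spans:
            # Clip the span to the output area, then count covered samples.
            for x in range(max(x_start, 0), min(x_end, width * scale)):
                hits[out_y][x // scale] += 1
    samples_per_pixel = scale * scale
    return [[count / samples_per_pixel for count in line] for line in hits]


# Made-up spans: a 2x oversampled image that is 2 real pixels wide, 1 high.
example = {0: [(0, 3)], 1: [(0, 2)]}
print(downsample_spans(example, scale=2, width=2, height=1))
```

The resulting coverage values are what would then be fed into whatever fill mechanism the target toolkit offers.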

This allows you to display text correctly in any language.

This, however, does not solve the text "input" issue, where you need to map a position to a glyph and then to one or more characters in your string (think of when the user presses the "left" key to go to the previous *displayed* char, or clicks on some text in a text editor to position the cursor).

If you have a ligature or a complex script underneath, the currently displayed glyph might correspond to multiple characters. HarfBuzz gives a bounding box for each displayed glyph (in terms of the characters used to generate the glyph), so you can get the Unicode characters involved for this glyph.
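
A minimal Python sketch of that glyph-to-characters lookup follows. It assumes the shaper hands back, for each glyph in visual order, the index of the first source character it came from (called cluster here) and a horizontal advance. The ShapedGlyph class, the function name and the numbers are invented for illustration and are not HarfBuzz's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class ShapedGlyph:
    cluster: int       # index of the first source character for this glyph
    x_advance: float   # horizontal advance in pixels


def character_range_at(glyphs, text_length, x):
    """Return the (start, end) source character range under pixel x, or None.

    glyphs is in visual (left-to-right on screen) order; end is exclusive.
    """
    cluster_starts = sorted({g.cluster for g in glyphs})
    pen = 0.0
    for glyph in glyphs:
        if pen <= x < pen + glyph.x_advance:
            position = cluster_starts.index(glyph.cluster)
            if position + 1 < len(cluster_starts):
                end = cluster_starts[position + 1]
            else:
                end = text_length
            return (glyph.cluster, end)
        pen += glyph.x_advance
    return None


# "office" shaped with an "ffi" ligature: four glyphs for six characters.
text = "office"
glyphs = [
    ShapedGlyph(cluster=0, x_advance=10.0),   # o
    ShapedGlyph(cluster=1, x_advance=14.0),   # ffi ligature
    ShapedGlyph(cluster=4, x_advance=8.0),    # c
    ShapedGlyph(cluster=5, x_advance=9.0),    # e
]
start, end = character_range_at(glyphs, len(text), 15.0)
print(text[start:end])
```

A click at x = 15 lands on the ligature, so the lookup returns three characters for one glyph; a text editor then still has to decide where inside that glyph to draw the caret.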

This unfortunately is never going to work in JUCE without a huge effort, because the current interfaces expect 1 glyph = 1 Unicode char (which is wrong). I don't think there is any solution to this, except paying Roli to hire someone to fix the interfaces everywhere in the source code.

Cheers.

