Accented characters display problem

Elvira · October 28, 2013, 12:44pm

Hi Jules,

I think I've come across a problem in Juce with accented characters. I'm using the tip and I have reproduced it in the Juce Demo project.

If your system has a file or directory whose name contains accented characters separated by spaces, e.g. "é é é.wav", and you look for it in the file tree of the Audio->'File Playback' tab of the Juce Demo project, some characters in the name appear blank, e.g. "é é wav". I have only been able to reproduce this when the accented characters are separated by spaces; a filename such as "ééé.wav" is displayed properly. I can only reproduce this on Mac, not on Windows.

I have found out that an accented character can be represented in two different ways in Unicode:

- plain character code + accent code, e.g. in UTF8: é = "\x65\xcc\x81"

- accented character code, e.g. in UTF8: é = "\xc3\xa9"

When the characters in the string are written in the second format they seem to be displayed correctly; only the first format causes this problem. The problem appears in the FileTreeComponent because the file name obtained by the DirectoryContentsList component is written in the first format.
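To make the two representations concrete, here is a minimal sketch in plain C++ (no Juce involved) that decodes both byte sequences into Unicode code points. The helper name decodeUtf8 is hypothetical and only exists for this illustration; it assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode well-formed UTF-8 into Unicode code points.
// Sketch only: malformed input is not detected.
std::vector<std::uint32_t> decodeUtf8 (const std::string& s)
{
    std::vector<std::uint32_t> out;

    for (std::size_t i = 0; i < s.size();)
    {
        const unsigned char lead = static_cast<unsigned char> (s[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes that follow

        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }

        ++i;

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char> (s[i]) & 0x3F);

        out.push_back (cp);
    }

    return out;
}
```

Decoding "\x65\xcc\x81" gives two code points, U+0065 (e) followed by U+0301 (combining acute accent), while "\xc3\xa9" gives the single code point U+00E9. Both are meant to render as é, but a renderer that draws one glyph per code point without combining them sees different input in each case.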

Another thing I have noticed is that this only happens with Label components, not with TextEditor components. If I make a Label containing this type of text editable, the characters are displayed correctly the moment I click on it and the TextEditor opens. As soon as I close the editor, the Label displays the text incorrectly again.

Thanks,

Elvira

X-Ryl669 · October 29, 2013, 10:37am

This is because Unicode has two normalization forms for such characters, NFC (precomposed) and NFD (decomposed). MacOSX uses NFD for filenames (the 3-byte UTF8 sequence for é), while everyone else on earth uses NFC (the 2-byte sequence).

Typically, if you need this kind of Unicode processing, you are out of luck with Juce's own text rendering code, as it does not apply any Unicode script processing to the text before rendering.

You can use native text rendering code, like DirectWrite on Windows, or "probably" CoreText on MacOSX (not sure about this one), so the OS does apply the unicode script for glyph transformation/substitution, but this means non-portable code.

On Linux, you have to use Harfbuzz-ng and/or ICU yourself; they are not integrated in Juce.

What you've seen here for French is even worse in Arabic or more complex languages like Indic.

jules · October 29, 2013, 10:49am

There's actually a method String::convertToPrecomposedUnicode() which might do the trick for you.

IIRC the OSX filesystem doesn't automatically precompose the names, whereas other OSes do.
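As a rough illustration of what "precomposing" means, the plain C++ sketch below merges a base letter followed by the combining acute accent (U+0301) into a single precomposed code point. The function composeAcute is a hypothetical helper that covers only five lowercase vowels; it is not how String::convertToPrecomposedUnicode() is implemented, and a real application would call that method (or a full Unicode normalization library) instead.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: replace "vowel + U+0301" with the precomposed vowel.
// Illustration only: real NFC normalization covers far more than this table.
std::u32string composeAcute (const std::u32string& in)
{
    const char32_t combiningAcute = 0x0301;

    static const std::unordered_map<char32_t, char32_t> acute = {
        { U'a', 0x00E1 }, { U'e', 0x00E9 }, { U'i', 0x00ED },
        { U'o', 0x00F3 }, { U'u', 0x00FA }
    };

    std::u32string out;

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const auto it = acute.find (in[i]);

        if (it != acute.end() && i + 1 < in.size() && in[i + 1] == combiningAcute)
        {
            out.push_back (it->second); // emit the precomposed character
            ++i;                        // skip the combining mark
        }
        else
        {
            out.push_back (in[i]);      // copy everything else unchanged
        }
    }

    return out;
}
```

With this sketch, a decomposed name such as U"e\u0301 e\u0301" becomes U"\u00E9 \u00E9", which is the form that reportedly displays correctly.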

Elvira · October 29, 2013, 3:04pm

String::convertToPrecomposedUnicode() solved my problem, thanks!
