drawText UTF8 issue

I used the UTF-8 String Literal Converter to convert the following text…
Lalaá 0123
into
juce::CharPointer_UTF8 ("Lalaa\xcc\x81 0123")

Unfortunately, when I draw this with drawText and the default font, I just see something like

Lalaá  123

For some reason, whenever á is followed by a space, it swallows the character after the space (a zero in my example) and turns it into a second space.

Can anyone help here? Thanks!

I think that should be juce::CharPointer_UTF8 ("Lala\xc3\xa1 0123"), which displays correctly. Not sure how you got your string.

Sorry, I forgot to mention that this was the name of a file on macOS. I went to the Finder and copied the name of the file, then pasted it into the Projucer's UTF-8 String Literal Converter, and that gave me that string. Of course, when I copy the name from the website and use it, I get your result. But when I copy this name, paste it as the new file name, copy the file name again and paste it into the Literal Converter, I always get this result. And as you can imagine, if I read the contents of a folder and want to display the file names, I get strange results from drawText.

Yes, I think the default font does not support all Unicode code points. My "á" was the single precomposed code point \u00e1, and yours is \u0061\u0301 (an "a" followed by a combining acute accent).
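To make the difference between the two strings concrete, here is a minimal sketch of a UTF-8 decoder. It is plain standard C++, not JUCE code; the name decodeUtf8 is made up for this example, and it assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of JUCE): decode UTF-8 bytes into code points.
// Assumes well-formed input; there is no error handling.
std::vector<char32_t> decodeUtf8 (const std::string& bytes)
{
    std::vector<char32_t> codePoints;
    std::size_t i = 0;

    while (i < bytes.size())
    {
        const auto lead = static_cast<unsigned char> (bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // number of continuation bytes that follow

        if      (lead < 0x80)  { cp = lead;        extra = 0; }   // 1-byte sequence
        else if (lead < 0xE0)  { cp = lead & 0x1F; extra = 1; }   // 2-byte sequence
        else if (lead < 0xF0)  { cp = lead & 0x0F; extra = 2; }   // 3-byte sequence
        else                   { cp = lead & 0x07; extra = 3; }   // 4-byte sequence

        ++i;

        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 0; k < extra && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char> (bytes[i]) & 0x3F);

        codePoints.push_back (cp);
    }

    return codePoints;
}
```

With this, decodeUtf8 ("\xc3\xa1") gives one code point (0xE1), while decodeUtf8 ("a\xcc\x81") gives two (0x61 and 0x301). Both byte sequences decode without error, so the two file names differ in code point count, not just in bytes.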

Yours displays fine on Windows. This post may still be relevant: "Default font on macOS extremely dated - Proposal to update"

The main problem is that it should show 'á 0', but instead I get 'á' followed by two spaces. If the name changes to, let's say, 'ág0', I get exactly that. So only an 'á' directly followed by a space seems to be a problem.

To sum up:
original file name: Lalaá 0123
We focus on the part "á 0".

On Mac this translates to "a\xcc\x81 0", and if I use drawText the "0" part is not drawn.
If I copy the file to Windows, that part of the name is changed to "\xc3\xa1 0", which displays fine.
If I use another character instead of the space after "a\xcc\x81", the problem also goes away.

So, what happens is this.
I name a file with an 'á'. On macOS the file name uses "a\xcc\x81" for that. Now I iterate through a directory with std::filesystem and get the file names. One of them contains exactly those bytes for the 'á'.
I call drawText to draw it.
JUCE calls GlyphArrangement::addCurtailedLineOfText → OSXTypeface::getGlyphPositions, which uses the CoreFoundation functions to create an array of glyphs. So far everything is fine.
Back inside addCurtailedLineOfText, JUCE advances through the string again and adds the new glyphs to GlyphArrangement::glyphs. For some reason JUCE checks whether each character in the string is a space. Unfortunately CharPointer_UTF8::getAndAdvance() doesn't recognize that those three bytes are actually just one character: it returns the 'a' first and the remaining two bytes on the next call. That means that if a space follows in the string, the call to CharPointer_UTF8::isWhitespace returns true at the wrong position, one character too late. So CharPointer_UTF8::getAndAdvance() gets out of sync whenever "a\xcc\x81" is used.
Now if I "hack" the code and make sure isWhitespace is always false, JUCE finally draws the whole string! So now we know the problem: the glyphs are prepared correctly, but CharPointer_UTF8::getAndAdvance() messes up here, or at least it is not in sync with Apple's CoreFoundation functions. I understand that this might not be valid UTF-8 encoding, but Apple's CoreFoundation functions interpret it correctly, and that's the string I'm given by std::filesystem. So what's the real solution here?
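The mismatch described above can be reproduced with a toy model. This is not JUCE's actual code; glyphsTreatedAsWhitespace is a made-up name, and the sketch assumes (as described in the post) that the shaper returns a single glyph for "a" plus the combining accent, while the second pass walks the string one code point at a time and applies the whitespace test to the glyph at the same index. In the glyph string, '@' is a placeholder for that single á glyph.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Toy model (hypothetical, not JUCE code).
// codePoints: what a code-point-by-code-point walk over the string sees.
// glyphs:     one placeholder char per glyph returned by the shaper.
// Returns the glyphs that end up being treated as whitespace.
std::string glyphsTreatedAsWhitespace (const std::u32string& codePoints,
                                       const std::string& glyphs)
{
    std::string flagged;
    const std::size_t n = std::min (codePoints.size(), glyphs.size());

    for (std::size_t i = 0; i < n; ++i)
        if (codePoints[i] == U' ')      // whitespace test on the i-th code point...
            flagged += glyphs[i];       // ...is applied to the i-th glyph

    return flagged;
}
```

Under these assumptions, glyphsTreatedAsWhitespace (U"Lalaa\u0301 0123", "Lala@ 0123") returns "0": the zero glyph is the one flagged as whitespace. With the precomposed form, U"Lala\u00e1 0123", it returns " " (the real space), and with "g" in place of the space nothing is flagged, which matches the three observations above.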

I've got another one. This time it really is UTF-8 encoded.

A simple word like "we’ve" translates into UTF-8 like this:
juce::CharPointer_UTF8 ("we\xe2\x80\x99ve")
I send this to a juce::Label with setText, and on the screen I get

“we’ ve”

Not sure about Windows, but on macOS this is the result :frowning:
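For comparison with the earlier file name case, here is a small sketch that counts code points in a UTF-8 string by skipping continuation bytes. It is plain standard C++ with a made-up function name, and it assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of JUCE): count code points in well-formed
// UTF-8 by counting every byte that is not a continuation byte (10xxxxxx).
std::size_t countCodePoints (const std::string& utf8)
{
    std::size_t count = 0;

    for (unsigned char byte : utf8)
        if ((byte & 0xC0) != 0x80)
            ++count;

    return count;
}
```

For "we\xe2\x80\x99ve" this gives 5 code points from 7 bytes, so the apostrophe here is one three-byte code point and not a base character plus a combining mark. This example on its own does not explain the extra space on screen.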

Unicode support is coming in JUCE 8, which means we will have to deal with all of this.