Extended character set not compatible with TextEditor LengthAndCharacterRestriction?


#1

When I include characters from the extended character set in the strict list of characters permitted for my labels, those extended characters are ignored and so cannot be typed in.

I am using my own MyLabel class, which is derived from the Label class and adds a few (hopefully unrelated) customisations. I want to restrict the character set used in these labels to basic ASCII characters plus some characters from the extended Latin-1 character set.

I wrote a function, called by the MyLabel constructor, that sets the allowedCharacters variable. It currently looks like this:

allowedCharacters = CharPointer_UTF8 ("\x41\x42\x43\xfd\xfe\xff");

This should limit the characters that can be entered to: ABCýþÿ

The first three are standard ASCII characters and the last three are the final three characters in the extended set.

As a result, the MyLabels in my program accept the A, B and C characters but not the extended characters. If I specify only the extended characters (i.e. remove the ABC codes and keep just \xfd\xfe\xff), the string seems to be interpreted as empty, so all characters are allowed. While all characters are accepted, I can enter ý, þ and ÿ into my labels successfully.

Is there something I'm missing?

 


#2

Those don't look like valid utf8 sequences to me.


#3


Hi Jules,

Do you mean the sequence "\x41\x42\x43\xfd\xfe\xff" doesn't look right?

The number used after each \x is the hex value for the desired character as specified on this web page: http://www.ascii-code.com/

I performed two other checks to see if I was encoding the right values. First, I tested them in Python, as a quick way to have them interpreted as Unicode values. When I run print(u'\x41') the character A is printed, which is correct. Secondly, in my function that sets the allowedCharacters string, I added a couple of lines to open a text file and append the contents of allowedCharacters to it. The characters that were appended to the text file were correct (ABCýþÿ).

So it seems like I'm using the correct codes, and the writing-to-file test suggests that the characters are correctly held in the allowedCharacters string. Based on your stickied post about encoding (http://www.juce.com/forum/topic/embedding-unicode-string-literals-your-cpp-files) I thought I was on the right track.

So are the codes definitely incorrect? And what *should* they look like?

Thanks!


#4

No, those escape sequences are garbage. They may work if interpreted as extended ASCII in a local 8-bit code page, but they're certainly not valid UTF-8!
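To make the point concrete, here is a rough sketch of a structural UTF-8 check in plain C++ (no JUCE involved). The function name looksLikeValidUtf8 is made up for this illustration, and the check is deliberately simplified: it only looks at lead and continuation byte patterns and does not reject overlong encodings or surrogates. In UTF-8, any byte of 0x80 or above must be part of a multi-byte sequence, and 0xF8 to 0xFF can never start one, so \xfd, \xfe and \xff on their own fail immediately.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the bytes of s follow the UTF-8
// lead-byte / continuation-byte structure. Simplified on purpose: overlong
// encodings and surrogate code points are not rejected.
bool looksLikeValidUtf8 (const std::string& s)
{
    std::size_t i = 0;

    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char> (s[i]);
        std::size_t extra = 0;   // number of continuation bytes expected

        if (lead < 0x80)                 extra = 0;   // 0xxxxxxx: plain ASCII
        else if ((lead & 0xE0) == 0xC0)  extra = 1;   // 110xxxxx
        else if ((lead & 0xF0) == 0xE0)  extra = 2;   // 1110xxxx
        else if ((lead & 0xF8) == 0xF0)  extra = 3;   // 11110xxx
        else return false;   // stray continuation byte, or 0xF8..0xFF

        // The sequence must not run off the end of the string.
        if (extra > s.size() - i - 1)
            return false;

        // Every continuation byte must look like 10xxxxxx.
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char> (s[i + k]) & 0xC0) != 0x80)
                return false;

        i += extra + 1;
    }

    return true;
}
```

Passing it "\x41\x42\x43\xfd\xfe\xff" stops at the 0xFD byte and returns false, while a string made only of ASCII bytes such as "ABC" passes.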

Try pasting ABCýþÿ into the Introjucer's string literal tool and it gives you the correct sequence:

CharPointer_UTF8 ("ABC\xc3\xbd\xc3\xbe\xc3\xbf")
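For anyone wondering where the \xc3 bytes come from: code points from U+0080 to U+00FF (the upper half of Latin-1) each take two bytes in UTF-8, with the top two bits of the code point going into the lead byte and the low six bits into the continuation byte. A minimal sketch of that conversion in plain C++ follows; latin1ToUtf8 is a name invented for this example, not a JUCE function.

```cpp
#include <string>

// Hypothetical helper: converts a string holding one Latin-1 byte per
// character (code points 0x00..0xFF) into its UTF-8 encoding.
std::string latin1ToUtf8 (const std::string& latin1)
{
    std::string utf8;

    for (const char ch : latin1)
    {
        const unsigned char c = static_cast<unsigned char> (ch);

        if (c < 0x80)
        {
            // ASCII is encoded as itself.
            utf8 += static_cast<char> (c);
        }
        else
        {
            // 0x80..0xFF becomes 110000xx 10xxxxxx.
            utf8 += static_cast<char> (0xC0 | (c >> 6));     // lead byte: 0xC2 or 0xC3
            utf8 += static_cast<char> (0x80 | (c & 0x3F));   // continuation byte
        }
    }

    return utf8;
}
```

For 0xFD (ý) the lead byte is 0xC0 | 3 = 0xC3 and the continuation byte is 0x80 | 0x3D = 0xBD, so latin1ToUtf8 ("\x41\x42\x43\xfd\xfe\xff") produces the same bytes as the literal above: "ABC\xc3\xbd\xc3\xbe\xc3\xbf".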


#5

Thanks Jules, I see my mistake now. It's now fixed and working nicely. Thanks!