Extended character set not compatible with TextEditor LengthAndCharacterRestriction?

sdc · September 1, 2014, 5:40pm

When I include characters from the extended character set in the strict list of characters permitted for my labels, those extended characters are ignored, so they can't be typed into the labels.

I am using my own MyLabel class, which is derived from Label and adds a few (hopefully unrelated) customisations. I want to restrict the characters used in these labels to basic ASCII plus some characters from the extended Latin-1 character set.

I wrote a function, called from the MyLabel constructor, that sets the allowedCharacters variable. It currently looks like this:

allowedCharacters = CharPointer_UTF8 ("\x41\x42\x43\xfd\xfe\xff");

This should limit the characters that can be entered to: ABCýþÿ

The first three are standard ASCII characters and the last three are the last three characters of the extended set.

As a result, the MyLabels in my program accept the characters A, B and C, but not the extended characters. If I specify only the extended characters (i.e. remove the ABC codes and keep just \xfd\xfe\xff), the list seems to be interpreted as an empty string, so all characters are allowed. In that case I can enter ý, þ and ÿ into my labels successfully.

Is there something I'm missing?

jules · September 1, 2014, 6:43pm

Those don't look like valid utf8 sequences to me.

sdc · September 2, 2014, 2:25pm

Hi Jules,

Do you mean the sequence "\x41\x42\x43\xfd\xfe\xff" doesn't look right?

The number after each \x is the hex value of the desired character, as listed on this web page: http://www.ascii-code.com/

I performed two other checks to see whether I was encoding the right values. First, I tested them in Python as a quick way to have them interpreted as Unicode values: running print(u'\x41') prints the character A, which is correct. Second, in my function that sets the allowedCharacters string, I added a couple of lines that open a text file and append the contents of allowedCharacters to it. The characters appended to the file were correct (ABCýþÿ).

So it seems I'm using the correct codes, and the write-to-file test suggests that the characters are held correctly in the allowedCharacters string. Based on your stickied post about encoding (http://www.juce.com/forum/topic/embedding-unicode-string-literals-your-cpp-files) I thought I was on the right track.

So are the codes definitely incorrect? And what *should* they look like?

Thanks!

jules · September 2, 2014, 2:36pm

No, those escape sequences are garbage. They may work if interpreted as ASCII in a local code page, but they're certainly not valid UTF-8!

Try pasting ABCýþÿ into the Introjucer's string literal tool and it gives you the correct sequence:

CharPointer_UTF8 ("ABC\xc3\xbd\xc3\xbe\xc3\xbf")

sdc · September 2, 2014, 3:38pm

Thanks Jules, I see my mistake now. It's now fixed and working nicely. Thanks!
