Double-click selection is English-centric


#1

Saw this on the MaxMSP forum

…might be due to Juce. Tested this with a Juce TextEditor.

Enter: “concrète”

Double-click between the two c’s, and only ‘concr’ is selected rather than the whole word.


#2

That’s kind of strange… the code uses the normal iswalnum() function, which should return true for the accented character, but which clearly isn’t doing so, possibly because the locale is set to English? Not sure what to do about that!
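To make the locale dependence concrete, here is a minimal sketch that asks iswalnum() the same question under a named locale. The helper name isAlnumInLocale is made up for illustration, and the locale names you pass in are assumptions: which ones exist, and what they answer for 'è', varies by platform.

```cpp
#include <clocale>
#include <cwctype>
#include <string>

// Hypothetical helper (not part of Juce): returns 1 if iswalnum() accepts c
// under the named locale, 0 if it rejects it, and -1 if that locale is not
// available on this machine.
int isAlnumInLocale (wchar_t c, const char* localeName)
{
    // Remember the current LC_CTYPE setting so it can be restored afterwards.
    const char* current = std::setlocale (LC_CTYPE, nullptr);
    const std::string previous (current != nullptr ? current : "C");

    if (std::setlocale (LC_CTYPE, localeName) == nullptr)
        return -1;

    const int result = std::iswalnum (static_cast<std::wint_t> (c)) != 0 ? 1 : 0;

    std::setlocale (LC_CTYPE, previous.c_str());
    return result;
}
```

Comparing isAlnumInLocale (L'\u00e8', "C") with the result for a French or UTF-8 locale installed on your system (for example "fr_FR.UTF-8", if present) should show whether the locale is what makes the double-click stop at the accent.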


#3

Hi Jules,
I tried it on my Macintosh machine. I wasn’t able to select the entire text when I double-clicked on it. But if I click four times on it (i.e. if I do two double-clicks), it selects the entire line. If it’s a bug in the code, shouldn’t this fail too?


#4

[quote=“vishvesh”]Hi Jules,
I tried it on my Macintosh machine. I wasn’t able to select the entire text when I double-clicked on it. But if I click four times on it (i.e. if I do two double-clicks), it selects the entire line. If it’s a bug in the code, shouldn’t this fail too?[/quote]

No, the line-selection code doesn’t check for character types, it just finds the line break.

I’m a bit stuck on this problem… All I can think of is that I’d need to write my own function to find out whether a Unicode character is a letter or not, with some sort of enormous look-up table.


#5

Wouldn’t it be simpler to check whether the character is a special character or not? I am assuming that “è” won’t be treated as a special character.


#6

What do you mean by ‘special character’? What I actually need to know is whether the character is alphabetic or not, regardless of whether it’s got an accent.


#7

dumb question: isn’t there a magic bit you can check? If there isn’t it’s a shame the Unicode people didn’t think of that (but TBH I haven’t thought through how complicated this might be…!)


#8

I meant symbols like “@” and “&”. If I have a string like “123@6789”, double-clicking on the first part of it should select “123”.


#9

Hmm, I guess that might work, though I’d need quite a big list of symbols that should act as punctuation…


#10

There is a function “isalnum” which checks whether the input character is a letter or a digit. It should return zero for special characters.

http://www.tutorialspoint.com/ansi_c/c_function_references.htm


#11

erm… yes, that’s exactly the function that doesn’t work, as I explained in my first post.


#12

I think I have an explanation for that now: http://biega.com/special-char.html. UTF treats “è” as a special character (as can be seen from the list). That explains why the function fails.

Looks like a look-up table is a good option. You could try checking against the Unicode range for special characters (characters like @, #, $, etc.).


#13

It doesn’t matter whether it’s special or not - it could be both special and alphanumeric.

The problem is that the locale is used to decide whether an accented char is considered alphabetic:
http://93.186.177.167/mirror/cinanutshell/0596006977/cinanut-CHP-17-133.html

What I need is something that says “is this alphanumeric in any locale”, but can’t see a way to do that.


#14

Proposed:
Two sorted tables: the first for ranges of characters and the second for single characters (not in any range) that belong to the alphanumeric Unicode type. Then do a binary search.

You can extract the tables using the ICU library…
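A rough sketch of how the two-table lookup could look. The table contents below are only a tiny illustrative subset (ASCII plus the Latin-1 letters) typed in by hand, not data extracted from ICU, and the names CodePointRange, alnumRanges, alnumSingles and isUnicodeAlnum are invented for this example.

```cpp
#include <algorithm>
#include <iterator>

struct CodePointRange { char32_t first, last; };

// Sorted, non-overlapping ranges. Illustrative subset only: a real table
// would be generated from the Unicode character data.
static const CodePointRange alnumRanges[] = {
    { 0x0030, 0x0039 },  // 0-9
    { 0x0041, 0x005A },  // A-Z
    { 0x0061, 0x007A },  // a-z
    { 0x00C0, 0x00D6 },  // Latin-1 letters before the multiplication sign
    { 0x00D8, 0x00F6 },  // Latin-1 letters before the division sign
    { 0x00F8, 0x00FF },  // remaining Latin-1 letters
};

// Sorted single code points that are not part of any range above.
static const char32_t alnumSingles[] = { 0x00AA, 0x00B5, 0x00BA };

bool isUnicodeAlnum (char32_t c)
{
    // Binary search for the first range whose end is not below c...
    auto r = std::lower_bound (std::begin (alnumRanges), std::end (alnumRanges), c,
                               [] (const CodePointRange& range, char32_t value)
                               { return range.last < value; });

    // ...and check that c is not below the start of that range.
    if (r != std::end (alnumRanges) && r->first <= c)
        return true;

    // Otherwise fall back to the table of isolated characters.
    return std::binary_search (std::begin (alnumSingles), std::end (alnumSingles), c);
}
```

With these tables, 'è' (U+00E8) falls inside the 0x00D8 to 0x00F6 range, while '×' (U+00D7) sits in the gap between two ranges and is rejected. Both lookups are O(log n), so even full-size tables should stay cheap per character.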

See also:
int isalnum_l(int c, locale_t locale)
The isalnum_l() function shall test whether c is a character of class alpha or digit in the locale represented by locale (there is wide character version too, iswalnum_l).
The functionality described is an extension to the ISO C standard.

Bye


#15

Ah, such a faff for something that should be so simple!


#16

I think what I’ll do is a slight bodge so that all non-ASCII chars are considered alphabetic. That way it’ll always err on the side of selecting too much, and should be fine for almost all cases. At some point I’d like to add some proper Unicode classes, so when that happens I’ll revisit this.
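For anyone reading along, a sketch of what that kind of bodge might look like. This is not the actual Juce code: isWordChar, Selection and selectWordAt are hypothetical names, and the ASCII test is done by hand so that no locale is involved.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Anything outside ASCII counts as part of a word; ASCII characters are
// classified by hand so the result never depends on the locale.
bool isWordChar (char32_t c)
{
    if (c > 127)
        return true;

    return (c >= U'0' && c <= U'9')
        || (c >= U'A' && c <= U'Z')
        || (c >= U'a' && c <= U'z');
}

// Half-open range [start, end) of the selected characters.
struct Selection { std::size_t start, end; };

// Expands outwards from the clicked index until a non-word character
// (or either end of the text) is reached.
Selection selectWordAt (const std::u32string& text, std::size_t index)
{
    std::size_t start = std::min (index, text.size());
    std::size_t end = start;

    while (start > 0 && isWordChar (text[start - 1]))
        --start;

    while (end < text.size() && isWordChar (text[end]))
        ++end;

    return { start, end };
}
```

With this, selectWordAt (U"concr\u00e8te", 3) covers all eight characters of “concrète”, and clicking at index 1 of U"123@6789" still stops at the “@” and selects just “123”. The trade-off is the one described above: non-ASCII punctuation such as “«” would also be swallowed into the selection.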