CharPointer_UTF8 and operator--


#1

Hello Jules,

in juce_charPointer_UTF8.h, there seems to be a bug in the test inside operator--.

    /** Moves this pointer back to the previous character in the string. */
    CharPointer_UTF8& operator--() noexcept
    {
        const char n = *--data;

        if ((n & 0xc0) == 0xc0)
        {
            int count = 3;

            do
            {
                --data;
            }
            while ((*data & 0xc0) == 0xc0 && --count >= 0);
        }

        return *this;
    }

(*data & 0xc0) should be compared to 0x80, not 0xc0, to check whether the byte is a continuation byte, i.e. whether this UTF-8 character has more bytes before it.

This causes a problem when you use fromLastOccurrenceOf on a String containing UTF-8 characters of 2 or more bytes.
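
For illustration, here is a minimal sketch of the two bit tests being discussed. The helper names isContinuationByte and isLeadByte are hypothetical and not part of JUCE; the sketch only relies on the UTF-8 rule that continuation bytes look like 10xxxxxx and lead bytes of multi-byte sequences look like 11xxxxxx.

```cpp
#include <cassert>

// Hypothetical helper (not a JUCE function): true for UTF-8 continuation
// bytes, whose top two bits are 10.
constexpr bool isContinuationByte (unsigned char b) noexcept
{
    return (b & 0xc0) == 0x80;
}

// Hypothetical helper (not a JUCE function): true for the lead byte of a
// multi-byte sequence, whose top two bits are 11.
constexpr bool isLeadByte (unsigned char b) noexcept
{
    return (b & 0xc0) == 0xc0;
}

// U+00E9 ("é") is encoded in UTF-8 as the two bytes 0xC3 0xA9.
inline void runChecks()
{
    assert (isLeadByte (0xc3));          // first byte: lead byte
    assert (isContinuationByte (0xa9));  // second byte: continuation byte
    assert (! isLeadByte (0xa9));        // comparing with 0xc0 misses it
    assert (! isContinuationByte ('a')); // plain ASCII is neither
    assert (! isLeadByte ('a'));
}
```

When walking backwards, the byte just before the pointer is normally a continuation byte (or ASCII), so a comparison with 0xc0 does not fire for the trailing byte of a multi-byte character.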


#2

Ah! Thanks, yes, you’re right! I’ll get that sorted out…


#3

It looks like this has been corrected to the following code:

    CharPointer_UTF8 operator--() noexcept
    {
        int count = 0;

        while ((*--data & 0xc0) == 0x80 && ++count < 4)
        {}

        return *this;
    }

but now it doesn't seem to work within StringArray::addTokens() with one of my UTF8 strings, "Root[0:157b2a0]§", where the separator is "§". Am I doing something wrong, or is there possibly still a glitch in the algorithm?

*Just to clarify: the problem occurs when the pointer is past the last character and we apply --. It goes back 2 characters (the result is "]§" instead of "§"), and then of course everything goes wrong in the calling method.


#4

My guess is that like 99% of the posts we get here about confusion over text encoding, you're not passing it a valid UTF8 string. See this:

http://www.juce.com/forum/topic/embedding-unicode-string-literals-your-cpp-files
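
As a rough sketch of how an invalid string could produce the symptom from post #3: the free function stepBack below is a hypothetical stand-in that copies the loop quoted there, and the Latin-1 byte 0xA7 for "§" is an assumption about what the source file might have contained, not something confirmed in the thread.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for CharPointer_UTF8::operator--, using the loop
// quoted in post #3: skip back over at most 4 continuation bytes.
inline const char* stepBack (const char* data) noexcept
{
    int count = 0;

    while ((*--data & 0xc0) == 0x80 && ++count < 4)
    {}

    return data;
}

// "Root[0:157b2a0]§" with "§" (U+00A7) as valid UTF-8: bytes 0xC2 0xA7.
static const char validUtf8[] = "Root[0:157b2a0]\xc2\xa7";

// Assumed bad input: "§" stored as the single Latin-1 byte 0xA7.
// On its own, 0xA7 looks like a continuation byte with no lead byte.
static const char latin1[] = "Root[0:157b2a0]\xa7";

// Steps back once from the terminating null, i.e. from "past the last character".
inline const char* lastCharacter (const char* text) noexcept
{
    return stepBack (text + std::strlen (text));
}

inline void runChecks()
{
    // Valid UTF-8: the pointer lands on the lead byte of "§".
    assert (std::strcmp (lastCharacter (validUtf8), "\xc2\xa7") == 0);

    // Latin-1 input: 0xA7 is skipped as a continuation byte, so the
    // pointer lands on ']' and the result covers two characters.
    assert (std::strcmp (lastCharacter (latin1), "]\xa7") == 0);
}
```

With the Latin-1 input the pointer ends up on "]§", which matches the behaviour described above, while the valid UTF-8 input ends up on "§" as expected.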


#5

Looks like it. Thanks.