String(int) and juce_wchar


#1

Well, I’ve seen the new changes, but I think this one is a bit too hard to accept.
If I were you, I would have changed the declaration of juce_wchar to read:
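class juce_wchar
{
public:
    juce_wchar (const uint32 c) : ch (c) {}
    operator uint32& () { return ch; }   // lets a juce_wchar be used wherever a uint32 is expected
private:
    uint32 ch;   // "char" is a reserved word, so the member needs another name
};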

That way, it would no longer break methods that are overloaded on int and wchar_t.
Also, the underlying type could be changed later, without risk of clashing with the short / char overloads.
A good compiler will optimise the wrapper away anyway.

(I also thought about using a namespace for juce_wchar, like typedef SomeNamespace::CharType juce_wchar, but it doesn’t work).
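
To make the suggestion concrete, here is a minimal sketch of the idea, assuming a hypothetical wrapper called WideChar (it is not JUCE's actual juce_wchar, and the names Picked, overloadFor and nextCodePoint are made up for illustration). It only shows that a class type gives the compiler two overloads it can always tell apart, while arithmetic keeps working through the conversion operator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the wrapper proposed above; not JUCE's actual juce_wchar.
class WideChar
{
public:
    WideChar (std::uint32_t c) : ch (c) {}
    operator std::uint32_t() const { return ch; }   // read access as a plain integer

private:
    std::uint32_t ch;
};

enum class Picked { Int, Wrapper };

// Two overloads the compiler can always tell apart, because WideChar is a class type.
Picked overloadFor (int)      { return Picked::Int; }
Picked overloadFor (WideChar) { return Picked::Wrapper; }

// Arithmetic still works through the conversion operator.
std::uint32_t nextCodePoint (WideChar c) { return c + 1; }

void example()
{
    assert (overloadFor (65) == Picked::Int);
    assert (overloadFor (WideChar (65)) == Picked::Wrapper);
    assert (nextCodePoint (WideChar ('A')) == 'A' + 1u);
}
```

The constructor is left implicit here to stay close to the snippet above; marking it explicit would be a stricter variant.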


#2

Yeah, I thought about that, but I don’t trust the compiler to pack an array of juce_wchars correctly, and it’ll lead to other problems.
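
For what it's worth, the packing worry can be turned into a build-time check instead of a matter of trust. The following is only a sketch with a hypothetical WrappedChar type and a made-up helper strideInBytes(); it does not address the "other problems" mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct WrappedChar { std::uint32_t ch; };   // hypothetical wrapper type

// Fail the build, rather than trust the compiler, if the wrapper grows or realigns.
static_assert (sizeof (WrappedChar) == sizeof (std::uint32_t), "WrappedChar has padding");
static_assert (alignof (WrappedChar) == alignof (std::uint32_t), "WrappedChar has different alignment");

// Distance in bytes between two neighbouring array elements.
std::size_t strideInBytes()
{
    WrappedChar chars[2] = { { 0 }, { 0 } };
    const char* first  = reinterpret_cast<const char*> (&chars[0]);
    const char* second = reinterpret_cast<const char*> (&chars[1]);
    return static_cast<std::size_t> (second - first);
}

void example()
{
    assert (strideInBytes() == sizeof (std::uint32_t));
}
```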

If you were freely mixing juce_wchar and wchar_t then your code wasn’t correct anyway, and would have hit subtle problems handling UTF-16 characters that take more than one 16-bit unit (surrogate pairs). That’s why I made all these changes - the idea is that code which was correct should still compile, but code that was making unsafe assumptions about encoding types may now need to be changed (and that’s a good thing).
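
As an illustration of the UTF-16 problem, here is a small sketch of surrogate-pair decoding. The function names decodeUtf16 and countCodePoints are made up for this example and are not JUCE functions; the point is only that one character can occupy two 16-bit units, so treating each unit as a whole character goes wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decodes one code point from UTF-16 starting at index i, advancing i past it.
std::uint32_t decodeUtf16 (const char16_t* text, std::size_t length, std::size_t& i)
{
    const std::uint32_t first = text[i++];
    const bool isHighSurrogate = first >= 0xD800 && first <= 0xDBFF;

    if (isHighSurrogate && i < length)
    {
        const std::uint32_t second = text[i];

        if (second >= 0xDC00 && second <= 0xDFFF)
        {
            ++i;   // the pair uses two 16-bit units
            return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
        }
    }

    return first;   // BMP character, or an unpaired surrogate passed through unchanged
}

// Counts characters (code points), not 16-bit units.
std::size_t countCodePoints (const char16_t* text, std::size_t length)
{
    std::size_t count = 0;

    for (std::size_t i = 0; i < length; ++count)
        decodeUtf16 (text, length, i);

    return count;
}

void example()
{
    const char16_t text[] = { 0xD834, 0xDD1E, u'A' };   // U+1D11E followed by 'A'
    std::size_t i = 0;
    assert (decodeUtf16 (text, 3, i) == 0x1D11E && i == 2);
    assert (countCodePoints (text, 3) == 2);
}
```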


#3

Well, the BIG issue with a type like wchar_t is that, depending on the compiler, it’s just a typedef to unsigned short or unsigned int.
In that case, logically sound code such as the following does not do what you expect:

  String s;
  s += (wchar_t) L'S'; // Expecting to call s.operator+= (wchar_t), but calling s.operator+= (unsigned int) instead

So unless wchar_t is a distinct type rather than a plain old INTEGER typedef, there will always be such issues.
On GCC, wchar_t is a distinct built-in type, so there is no issue whatsoever.
On Windows, however, up to VS2008, wchar_t can be just “typedef unsigned short wchar_t;” (for instance when building with /Zc:wchar_t-).
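
The difference between a typedef and a distinct type can be sketched like this. AliasedChar, DistinctChar, Picked and append are hypothetical names used only for illustration (append stands in for the String operator+= overloads); the typedef mimics a compiler where wchar_t is just another name for unsigned short.

```cpp
#include <cassert>
#include <type_traits>

typedef unsigned short AliasedChar;              // hypothetical: a typedef'd wide char
struct DistinctChar { unsigned short value; };   // hypothetical: a wrapper type

static_assert (std::is_same<AliasedChar, unsigned short>::value, "a typedef is only another name");
static_assert (! std::is_same<DistinctChar, unsigned short>::value, "a struct is a new type");

enum class Picked { Int, UnsignedShort, Distinct };

Picked append (int)            { return Picked::Int; }
Picked append (unsigned short) { return Picked::UnsignedShort; }
Picked append (DistinctChar)   { return Picked::Distinct; }
// Picked append (AliasedChar) would be rejected as a redefinition of append (unsigned short).

void example()
{
    // The "wide char" silently lands in the integer overload...
    assert (append (static_cast<AliasedChar> (L'S')) == Picked::UnsignedShort);
    // ...whereas the wrapper type always selects its own overload.
    assert (append (DistinctChar { 83 }) == Picked::Distinct);
    assert (append (83) == Picked::Int);
}
```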

So, in order to have a version that works everywhere, I would use a struct wchar that acts like an int.
Also, you can use #pragma pack (push / pop) to ensure the struct size.
You can also use a union (its size is guaranteed to be at least that of its largest member, so put a single uint32 inside, and you’re done).
Like this:

union Wchar
{
    uint32 ch;
};

The con is that you need to write wchar.ch to access the member.
The pro is that if you put multiple chars inside (like 4 of them), you can probably simplify the bit operations, relying on the compiler to do the shift & mask by itself.
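
A minimal sketch of the union variant follows, with hypothetical helpers makeWchar and byteAt. One caveat worth stating: in standard C++, reading a union member other than the one last written is not defined behaviour (some compilers allow it as an extension), so this sketch keeps the single member and does the shift & mask explicitly rather than through a second member.

```cpp
#include <cassert>
#include <cstdint>

union Wchar
{
    std::uint32_t ch;
};

static_assert (sizeof (Wchar) == sizeof (std::uint32_t), "Wchar is larger than its only member");

Wchar makeWchar (std::uint32_t value)
{
    Wchar w;
    w.ch = value;
    return w;
}

// Explicit shift and mask; index 0 is the least significant byte.
std::uint8_t byteAt (Wchar c, unsigned index)
{
    return static_cast<std::uint8_t> ((c.ch >> (8u * index)) & 0xFFu);
}

void example()
{
    const Wchar c = makeWchar (0x11223344);
    assert (byteAt (c, 0) == 0x44);
    assert (byteAt (c, 3) == 0x11);
}
```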


#4

I’m not using wchar_t at all. The only rule I follow is to use the TRANS() macro for all user-visible text, and JUCE_T for internal text.
All the libraries I use deal with UTF-8 char* strings anyway, so I’m used to calling the toUTF8() and fromUTF8() members when exchanging text with external code.

In fact, I would say that if the String class used UTF-8 internally, I would be more than happy, but it’s too specific to my use case.


#5

Also, you can’t really remove the JUCE_T macro, as some (most) of the text-related classes take a String in their constructor, and passing a “char*” doesn’t work.
For example:

ValueTree tree;
tree.getProperty("some"); // error: can't convert from const char* to const Identifier &
tree.getProperty(String("some")); // Works, but it's more work than the former
tree.getProperty(T("some")); // Works too, less of a pain to write


#6

No… you’re talking nonsense there. The Identifier class actually has a constructor that takes a const char*, so getProperty ("xyz") will work just fine.

I never use JUCE_T myself, and always use bare const char* literals - I recommend doing the same.
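
To show why a const char* constructor is enough, here is a small sketch using a hypothetical Ident class and lookUp function as stand-ins (std::string plays the role of String; this is not the real juce::Identifier). A bare literal needs only one user-defined conversion to become an Ident. If the class had only the string constructor, the literal would need two user-defined conversions in a row, which C++ never applies implicitly, and that would produce the kind of error described in #5.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an identifier class; not the real juce::Identifier.
class Ident
{
public:
    Ident (const char* name) : text (name) {}          // lets bare literals convert directly
    Ident (const std::string& name) : text (name) {}
    const std::string& toString() const { return text; }

private:
    std::string text;
};

// Hypothetical stand-in for a getProperty-style function taking the identifier by reference.
std::string lookUp (const Ident& id) { return "value of " + id.toString(); }

void example()
{
    assert (lookUp ("xyz") == "value of xyz");                 // bare const char* literal
    assert (lookUp (std::string ("xyz")) == lookUp ("xyz"));   // explicit string also works
}
```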