String, setlocale, mbstowcs, wcstombs and the filesystem

photon · August 26, 2006, 2:15am

The documentation says that the String class stores its data internally either as Unicode (actually wchar_t, which is UCS-2 or UCS-4 depending on the OS) or as ASCII, controlled by the JUCE_STRINGS_ARE_UNICODE define. When the other representation is needed, it is converted from the internal one via mbstowcs or wcstombs. But these functions don’t convert between Unicode and ASCII; they convert between Unicode and the encoding of the current locale, which by default is the C locale. Only in that default case is it right to say that mbstowcs and wcstombs convert between Unicode and ASCII, because the C locale uses ASCII.
On Windows I didn’t have any problems with this, because JUCE and the filesystem parts of the WinAPI communicate in Unicode. But on Linux, JUCE uses these functions to produce the strings it passes to e.g. fopen. There is no problem until your paths contain non-ASCII characters like ü, ö, ä or ß; then JUCE fails to handle those paths correctly. It’s not JUCE’s fault after all. It took me several hours to figure out that I need to set the locale via

#include <locale.h>
...
setlocale(LC_CTYPE, "");
to the current system locale. In my case that locale uses UTF-8 as its encoding on my Ubuntu box, so fopen expects a path encoded in UTF-8 rather than ASCII. With the locale set to the system locale, wcstombs converts Unicode to UTF-8, and fopen and the other related functions are happy.

This leads to two suggestions I would make:

The documentation and the code should not refer to strings in the locale’s encoding as ASCII.
The documentation should suggest setting the locale to the system locale via setlocale(LC_CTYPE, "") if problems with paths or anything related to them occur on Linux.

jules · August 30, 2006, 10:31am

Thanks for digging that up for me - I’d missed it on Linux, which I tend not to use very heavily.

Ideally, it’d be better for Juce to set the locale to UTF-8 and always use that for filenames (all filenames on the Mac are UTF-8). I’ll brush up on my knowledge of locales and see what I can sort out.

(And you’re quite right in saying I shouldn’t say “ascii” - that’s a throwback to old code and I’ll do a search and tidy it up…)
