Cannot read CJK text from file and convert to String correctly on Windows

Hi,

I have a txt file containing Chinese text (see the link below for the file). According to Notepad, the file's encoding is ANSI, and Notepad displays it correctly.

When I use File::loadFileAsString to read that file into a String, the Chinese text comes out wrong.

Tracing through loadFileAsString, it looks like String::createStringFromData always calls getStringFromWindows1252Codepage, which in turn calls CharacterFunctions::getUnicodeCharFromWindows1252Codepage to decode the data from the file.

If I save the file with UTF-8 encoding, File::loadFileAsString works correctly. Using MultiByteToWideChar to convert the data loaded from the file and then converting the result to a String also works correctly.
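For reference, the MultiByteToWideChar workaround can be sketched roughly as below. This is only a minimal sketch: ansiToUtf8 is a hypothetical helper name (not a JUCE function), it assumes the file is in the system ANSI code page (CP_ACP), and on non-Windows platforms it simply returns the bytes unchanged so that it still compiles.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
  #include <windows.h>
#endif

// Hypothetical helper (not part of JUCE): converts bytes in the system
// ANSI code page (CP_ACP) to UTF-8 by going through UTF-16.
std::string ansiToUtf8 (const std::string& ansiBytes)
{
    if (ansiBytes.empty())
        return {};

#ifdef _WIN32
    const int srcLen = static_cast<int> (ansiBytes.size());

    // First call: ask how many UTF-16 code units are needed.
    const int wideLen = MultiByteToWideChar (CP_ACP, 0, ansiBytes.data(), srcLen, nullptr, 0);
    if (wideLen <= 0)
        return {};

    // Second call: do the ANSI -> UTF-16 conversion.
    std::wstring wide (static_cast<std::size_t> (wideLen), L'\0');
    MultiByteToWideChar (CP_ACP, 0, ansiBytes.data(), srcLen, &wide[0], wideLen);

    // Same two-step pattern for UTF-16 -> UTF-8.
    const int utf8Len = WideCharToMultiByte (CP_UTF8, 0, wide.data(), wideLen,
                                             nullptr, 0, nullptr, nullptr);
    if (utf8Len <= 0)
        return {};

    std::string utf8 (static_cast<std::size_t> (utf8Len), '\0');
    WideCharToMultiByte (CP_UTF8, 0, wide.data(), wideLen,
                         &utf8[0], utf8Len, nullptr, nullptr);
    return utf8;
#else
    // Assumption: there is no ANSI code page to convert from on this
    // platform, so the bytes are passed through untouched.
    return ansiBytes;
#endif
}
```

The raw bytes can be read with File::loadFileAsData, and the UTF-8 result can then be turned into a String with String::fromUTF8.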

  • Windows 10 (System language set to Traditional Chinese)
  • JUCE 4.3.0

I've tried the following, but the result is the same in each case:

  1. Changing JUCE_STRING_UTF_TYPE to 8, 16, or 32.
  2. Changing the Character Set project setting in Visual Studio to Unicode.

I will keep digging to see if there is a better way to improve the CJK handling in JUCE. Any suggestions would be appreciated. Thanks!

Hi sam,

The JUCE team is pretty busy at the moment, but we will get around to looking at this eventually.

JUCE does not support loading text files with ANSI encoding. See the documentation for File::loadFileAsString: the file must be in UTF-8 or UTF-16.
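If it helps, one way to catch this early is to check whether the file's bytes are well-formed UTF-8 before treating them as text. The following is only a minimal sketch in plain C++: looksLikeUtf8 is a hypothetical helper, not a JUCE API, and a legacy-encoded file could in rare cases still pass the check by coincidence.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a JUCE API): returns true if `bytes` is
// well-formed UTF-8. Empty input counts as valid.
bool looksLikeUtf8 (const std::string& bytes)
{
    const auto* p = reinterpret_cast<const unsigned char*> (bytes.data());
    const std::size_t n = bytes.size();

    for (std::size_t i = 0; i < n;)
    {
        const unsigned lead = p[i];
        std::size_t extra = 0;                    // continuation bytes expected
        unsigned firstMin = 0x80, firstMax = 0xBF; // range for the first one

        if (lead <= 0x7F)                      extra = 0;
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
        else return false;                     // stray continuation or invalid lead byte

        if (lead == 0xE0) firstMin = 0xA0;     // reject overlong 3-byte forms
        if (lead == 0xED) firstMax = 0x9F;     // reject UTF-16 surrogates
        if (lead == 0xF0) firstMin = 0x90;     // reject overlong 4-byte forms
        if (lead == 0xF4) firstMax = 0x8F;     // reject code points above U+10FFFF

        if (extra > n - i - 1)
            return false;                      // sequence is cut off at the end

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned c    = p[i + k];
            const unsigned minC = (k == 1) ? firstMin : 0x80u;
            const unsigned maxC = (k == 1) ? firstMax : 0xBFu;

            if (c < minC || c > maxC)
                return false;
        }

        i += extra + 1;
    }

    return true;
}
```

If the check fails and the data has no UTF-16 byte order mark, the file is probably in a legacy code page and needs converting to UTF-8 before it is loaded.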

@fabian,

You are right, I missed that part. Thank you!