Cannot read CJK text from file and convert to String correctly on Windows


#1

Hi,

I have a .txt file containing Chinese text (see the link below for the file). According to Notepad, the file's encoding is ANSI, and Notepad displays it correctly.

When I use File::loadFileAsString to read that file into a String, the Chinese text comes out incorrect.

Tracing through loadFileAsString, it looks like String::createStringFromData always calls getStringFromWindows1252Codepage, which in turn calls CharacterFunctions::getUnicodeCharFromWindows1252Codepage to decode the data from the file.
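To illustrate why that decoding path mangles the text, here is a minimal standalone sketch. It is not JUCE code: decodeAsWindows1252 is a hypothetical helper, and it assumes the file is Big5 (code page 950), which is what "ANSI" usually means on a Traditional Chinese system. Mapping undefined bytes to U+FFFD is also just a choice made for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of JUCE): treat every byte as one
// Windows-1252 character and return the resulting Unicode code points.
std::vector<char32_t> decodeAsWindows1252 (const std::string& bytes)
{
    // Code points for bytes 0x80..0x9F. 0xFFFD marks the five bytes that
    // Windows-1252 leaves undefined (a choice made for this sketch).
    static const char32_t highTable[32] = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
    };

    std::vector<char32_t> result;
    result.reserve (bytes.size());

    for (unsigned char b : bytes)
    {
        if (b >= 0x80 && b <= 0x9F)
            result.push_back (highTable[b - 0x80]);
        else
            result.push_back (b);   // ASCII and 0xA0..0xFF keep their byte value
    }

    return result;
}
```

In Big5 the character 中 is the two-byte sequence 0xA4 0xA4. Decoded one byte at a time as Windows-1252, it becomes two separate U+00A4 characters ("¤¤") rather than the single code point U+4E2D, which matches the kind of garbage I am seeing.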

If I save the file with UTF-8 encoding, File::loadFileAsString works correctly. Using MultiByteToWideChar to convert the data loaded from the file and then converting the result to a String also works correctly.

  • Windows 10 (System language set to Traditional Chinese)
  • JUCE 4.3.0

I’ve tried the following, but the result is the same in each case:

  1. Change JUCE_STRING_UTF_TYPE to 8, 16, or 32.
  2. Change the Character Set project setting in Visual Studio to Unicode.

I will keep digging to see whether there is a better way to improve CJK handling in JUCE. Any suggestions would be appreciated. Thanks!


#2

Hi sam,

The JUCE team is pretty busy at the moment, but we will get around to looking at this eventually.


#3

JUCE does not support loading text files with ANSI encoding. See the documentation for File::loadFileAsString: the file must be in UTF-8 or UTF-16.
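If you need to catch unsupported files before loading them, a rough pre-check along these lines may help. This is only a sketch with hypothetical helper names, not JUCE API: it looks for a UTF-16 byte-order mark and otherwise tests whether the bytes are structurally valid UTF-8. It does not reject overlong encodings or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers (not part of JUCE).

// True if the data starts with a UTF-16 little- or big-endian byte-order mark.
bool hasUtf16Bom (const std::string& d)
{
    if (d.size() < 2)
        return false;

    const unsigned char b0 = static_cast<unsigned char> (d[0]);
    const unsigned char b1 = static_cast<unsigned char> (d[1]);
    return (b0 == 0xFF && b1 == 0xFE) || (b0 == 0xFE && b1 == 0xFF);
}

// Structural UTF-8 check: every lead byte must be followed by the right
// number of continuation bytes (10xxxxxx).
bool isStructurallyValidUtf8 (const std::string& d)
{
    std::size_t i = 0;

    while (i < d.size())
    {
        const unsigned char lead = static_cast<unsigned char> (d[i]);
        std::size_t extra = 0;

        if (lead < 0x80)                 extra = 0;
        else if ((lead & 0xE0) == 0xC0)  extra = 1;
        else if ((lead & 0xF0) == 0xE0)  extra = 2;
        else if ((lead & 0xF8) == 0xF0)  extra = 3;
        else return false;               // stray continuation byte or invalid lead

        if (extra > d.size() - i - 1)
            return false;                // sequence is cut off at the end of the data

        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char> (d[i + k]) & 0xC0) != 0x80)
                return false;

        i += extra + 1;
    }

    return true;
}

bool looksLikeUtf8OrUtf16 (const std::string& d)
{
    return hasUtf16Bom (d) || isStructurallyValidUtf8 (d);
}
```

Treat the result as a hint rather than proof: a Big5 pair such as 0xA4 0xA4 fails the check, but some legacy double-byte sequences happen to be structurally valid UTF-8 and would pass. If the check fails, convert the file to UTF-8 or UTF-16 before handing it to File::loadFileAsString.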


#4

@fabian,

You are right, I missed that part. Thank you!