Thanks. String::createStringFromData needs to be updated as well.
    for (int i = 0; i < numChars; ++i)
        builder.write ((juce_wchar) ByteOrder::swapIfBigEndian (src[i]));
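Each UTF-16 code unit is converted to a juce_wchar on its own, so the two halves of a surrogate pair are never combined into a single code point. When the string is then encoded as UTF-8, each surrogate becomes its own 3-byte sequence, which is CESU-8 rather than UTF-8. CESU-8 seems to 'work' on macOS because it prints to the terminal OK.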
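To illustrate the difference, here is a minimal standalone sketch. It is not JUCE's code: `appendUtf8`, `utf16ToUtf8Naive` and `utf16ToUtf8` are names I made up, and it assumes the input has already been byte-swapped to native order. The naive version mirrors the loop above (one code unit in, one code point out); the other one combines a high/low surrogate pair before encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append one code point to `out` as UTF-8 (1 to 4 bytes).
inline void appendUtf8 (std::string& out, std::uint32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char> (cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char> (0xF0 | (cp >> 18));
        out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Naive conversion: every UTF-16 code unit is treated as a code point,
// so each surrogate is encoded separately (this yields CESU-8).
inline std::string utf16ToUtf8Naive (const std::vector<std::uint16_t>& src)
{
    std::string out;

    for (std::size_t i = 0; i < src.size(); ++i)
        appendUtf8 (out, src[i]);

    return out;
}

// Surrogate-aware conversion: a high surrogate followed by a low surrogate
// is combined into one code point first. Unpaired surrogates are passed
// through unchanged in this sketch.
inline std::string utf16ToUtf8 (const std::vector<std::uint16_t>& src)
{
    std::string out;

    for (std::size_t i = 0; i < src.size(); ++i)
    {
        std::uint32_t cp = src[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;

        if (isHigh && i + 1 < src.size()
             && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF)
        {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i; // the low surrogate has been consumed
        }

        appendUtf8 (out, cp);
    }

    return out;
}
```

For 🎵 (U+1F3B5, UTF-16 `D83C DFB5`) the naive version gives `ED A0 BC ED BE B5`, while the surrogate-aware one gives `F0 9F 8E B5`.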
My unit test looks like this:
    void testOverwriteWithTextUnicode()
    {
        beginTest ("Overwrite with Text - Unicode");

        auto tempFile = juce::File::getSpecialLocation (juce::File::tempDirectory)
                            .getChildFile ("gin_test_" + juce::String::toHexString (juce::Random::getSystemRandom().nextInt()) + ".txt");

        auto testText = juce::String::fromUTF8 ("Hello 世界! Emoji: 🎵");

        expect (overwriteWithText (tempFile, testText, true, true, nullptr),
                "Should write Unicode text successfully");
        expect (tempFile.existsAsFile(), "File should exist");

        juce::String readBack = tempFile.loadFileAsString();
        DBG (testText);
        DBG (readBack);
        expectEquals (readBack, testText, "Unicode text should be preserved");

        tempFile.deleteFile();
    }
In the terminal I see:
Hello 世界! Emoji: 🎵
Hello 世界! Emoji: 🎵
But when I inspect the strings in the debugger I see:
    (lldb) p a
    (juce::String) $0 = {
      text = (data = "Hello 世界! Emoji: \xed\xa0\xbc\xed\xbe\xb5")
    }
    (lldb) p b
    (juce::String) $1 = {
      text = (data = "Hello 世界! Emoji: 🎵")
    }
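Since the terminal hides the problem, a byte-level check is more reliable than eyeballing the output. A rough sketch, with `containsEncodedSurrogate` being a hypothetical helper of mine rather than anything in JUCE: a surrogate (U+D800..U+DFFF) encoded as three bytes always starts with `ED` followed by a byte in `A0..BF`, and that byte pair never occurs in well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Returns true if `utf8` contains a 3-byte sequence encoding a UTF-16
// surrogate, i.e. the lead byte 0xED followed by a byte in 0xA0..0xBF.
// Such sequences appear in CESU-8 but are not valid UTF-8.
inline bool containsEncodedSurrogate (const std::string& utf8)
{
    for (std::size_t i = 0; i + 1 < utf8.size(); ++i)
    {
        const auto b0 = static_cast<unsigned char> (utf8[i]);
        const auto b1 = static_cast<unsigned char> (utf8[i + 1]);

        if (b0 == 0xED && b1 >= 0xA0 && b1 <= 0xBF)
            return true;
    }

    return false;
}
```

In the unit test this could be run on something like `readBack.toStdString()` to catch the CESU-8 bytes directly.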