OutputStream::writeText breaks text if asUTF16 is set to true

        for (;;)
        {
            auto c = src.getAndAdvance();

            if (c == 0)
                break;

            if (! writeShort ((short) c))
                return false;
        }

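You can’t just cast a wchar_t to a short and call that UTF-16: any code point above U+FFFF has to be written as a surrogate pair, and the cast simply truncates it. There are UTF-16 conversion functions in JUCE that could be used.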
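For illustration, here is a minimal sketch of what a correct per-character UTF-16 encode looks like. It is plain standard C++, not JUCE's implementation and not the actual fix; `appendUtf16` and `checkMusicalNote` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends the UTF-16 code units for one code point.
// Returns false if the value is not a valid Unicode scalar value.
bool appendUtf16 (std::uint32_t codePoint, std::vector<std::uint16_t>& out)
{
    // Surrogate values and anything past U+10FFFF cannot be encoded.
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        return false;

    // Basic Multilingual Plane: one code unit, so a plain cast is fine here.
    if (codePoint < 0x10000)
    {
        out.push_back (static_cast<std::uint16_t> (codePoint));
        return true;
    }

    // Supplementary planes: split the 20-bit offset into a surrogate pair.
    const std::uint32_t offset = codePoint - 0x10000;
    out.push_back (static_cast<std::uint16_t> (0xD800 + (offset >> 10)));
    out.push_back (static_cast<std::uint16_t> (0xDC00 + (offset & 0x3FF)));
    return true;
}

// Usage: U+1F3B5 (the musical note emoji) needs two code units.
void checkMusicalNote()
{
    std::vector<std::uint16_t> units;
    const bool ok = appendUtf16 (0x1F3B5, units);
    assert (ok);
    assert (units.size() == 2 && units[0] == 0xD83C && units[1] == 0xDFB5);
}
```

A bare `(short) c` on U+1F3B5 would keep only the low 16 bits (0xF3B5), which is an unrelated private-use character.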

Thanks for reporting, that should be fixed here:

Thanks. String::createStringFromData needs to be updated as well.

            for (int i = 0; i < numChars; ++i)
                builder.write ((juce_wchar) ByteOrder::swapIfBigEndian (src[i]));

Each half of a UTF-16 surrogate pair is getting converted to its own juce_wchar, so when the string is encoded to UTF-8 it’s actually becoming CESU-8 instead. It seems that CESU-8 ‘works’ on macOS because it prints to the terminal OK.
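To make the difference concrete, here is a rough standalone sketch in plain C++ (not JUCE code, and not the actual fix) that contrasts the per-code-unit conversion with one that combines surrogate pairs first. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes one value as UTF-8. It deliberately does not reject surrogates,
// so the naive path below can reproduce the CESU-8 bytes.
void appendUtf8 (std::uint32_t cp, std::string& out)
{
    if (cp < 0x80)
    {
        out += static_cast<char> (cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char> (0xC0 | (cp >> 6));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char> (0xE0 | (cp >> 12));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char> (0xF0 | (cp >> 18));
        out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Naive: treats every UTF-16 code unit as a code point (yields CESU-8).
std::string naiveUtf16ToUtf8 (const std::vector<std::uint16_t>& units)
{
    std::string out;
    for (auto u : units)
        appendUtf8 (u, out);
    return out;
}

// Combines high/low surrogates into one code point before encoding.
std::string utf16ToUtf8 (const std::vector<std::uint16_t>& units)
{
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i)
    {
        std::uint32_t cp = units[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool nextIsLow = i + 1 < units.size()
                               && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF;
        if (isHigh && nextIsLow)
        {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            ++i; // the low surrogate has been consumed
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // unpaired surrogate: substitute the replacement character
        }
        appendUtf8 (cp, out);
    }
    return out;
}
```

For the pair 0xD83C 0xDFB5 (U+1F3B5), the naive version produces the six bytes ED A0 BC ED BE B5, while the combining version produces the four-byte UTF-8 sequence F0 9F 8E B5.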

My unit test looks like this:

    void testOverwriteWithTextUnicode()
    {
        beginTest ("Overwrite with Text - Unicode");

        auto tempFile = juce::File::getSpecialLocation (juce::File::tempDirectory)
            .getChildFile ("gin_test_" + juce::String::toHexString (juce::Random::getSystemRandom().nextInt()) + ".txt");

        auto testText = juce::String::fromUTF8 ("Hello 世界! Emoji: 🎵");

        expect (overwriteWithText (tempFile, testText, true, true, nullptr),
                "Should write Unicode text successfully");
        expect (tempFile.existsAsFile(), "File should exist");

        juce::String readBack = tempFile.loadFileAsString();
        DBG(testText);
        DBG(readBack);
        expectEquals (readBack, testText, "Unicode text should be preserved");

        tempFile.deleteFile();
    }

In the terminal I see:

Hello 世界! Emoji: 🎵
Hello 世界! Emoji: 🎵

But when I inspect the strings in the debugger I see:

(lldb) p a
(juce::String) $0 = {
  text = (data = "Hello 世界! Emoji: \xed\xa0\xbc\xed\xbe\xb5")
}
(lldb) p b
(juce::String) $1 = {
  text = (data = "Hello 世界! Emoji: 🎵")
}
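Since the terminal output hides the problem, a byte-level check is more reliable than eyeballing printed text. The following is only a sketch in plain C++, with a hypothetical helper name; it flags any three-byte sequence that encodes a surrogate (lead byte ED followed by A0 to BF), which valid UTF-8 never contains.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the byte string contains a
// UTF-8-style encoding of a surrogate (U+D800..U+DFFF), i.e. CESU-8 debris.
bool containsEncodedSurrogate (const std::string& bytes)
{
    for (std::size_t i = 0; i + 1 < bytes.size(); ++i)
    {
        const auto lead = static_cast<unsigned char> (bytes[i]);
        const auto next = static_cast<unsigned char> (bytes[i + 1]);

        // In valid UTF-8, ED may only be followed by 80..9F.
        if (lead == 0xED && next >= 0xA0 && next <= 0xBF)
            return true;
    }
    return false;
}

// Usage: the two byte sequences seen in the debugger output.
void checkDebuggerStrings()
{
    assert (containsEncodedSurrogate ("Emoji: \xED\xA0\xBC\xED\xBE\xB5"));
    assert (! containsEncodedSurrogate ("Emoji: \xF0\x9F\x8E\xB5"));
}
```

In a unit test, a check like this could be run on the raw UTF-8 bytes of each string (for a juce::String, presumably via toRawUTF8()) so the failure does not depend on how the terminal renders the text.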

Thanks again, that should be fixed here:
