String::createStringFromData is really slow for unicode


#1

Did this already come up? I thought somebody else already posted about this, but I couldn’t find the thread. This makes it much faster.

[code]// assume it’s 16-bit unicode
const bool bigEndian = (data[0] == (char)-2);
const int numChars = size / 2 - 1;

const uint16* const src = (const uint16*) (data + 2);
uint16* dst = new uint16[numChars];

if (bigEndian)
{
    for (int i = 0; i < numChars; i++)
        dst[i] = (juce_wchar)swapIfLittleEndian(src[i]);
}
else
{
    for (int i = 0; i < numChars; i++)
        dst[i] = (juce_wchar)swapIfBigEndian(src[i]);
}

String result((juce_wchar*)dst, numChars);

delete[] dst;
return result;[/code]

Also, shouldn’t this be the other way around for big endian?

    if (writeUnicodeHeaderBytes)
        write ("\x0ff\x0fe", 2);
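For reference on the header bytes: in UTF-16 the byte sequence FF FE marks little-endian data and FE FF marks big-endian data. Each `\x0ff`-style escape in the literal above is a single hex escape, so the string holds the two bytes FF FE, i.e. the little-endian marker. A minimal sketch of that check in standard C++ follows; `Utf16Order` and `detectUtf16Bom` are hypothetical names for illustration, not JUCE API.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch (not part of JUCE).
enum class Utf16Order { none, littleEndian, bigEndian };

// Hypothetical helper: inspects the first two bytes for a UTF-16 byte-order mark.
inline Utf16Order detectUtf16Bom (const unsigned char* data, std::size_t size)
{
    if (size >= 2)
    {
        if (data[0] == 0xFF && data[1] == 0xFE) return Utf16Order::littleEndian;
        if (data[0] == 0xFE && data[1] == 0xFF) return Utf16Order::bigEndian;
    }

    return Utf16Order::none;   // too short, or no recognisable marker
}
```

Working on `unsigned char` avoids the `(char)-2` style comparisons, which depend on whether plain `char` is signed.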

#2

Yes, something similar came up recently, I think.

In fact there’s a bit of a bug in your version: not all platforms use a 16-bit value for juce_wchar, so on Mac/Linux, where wide chars are typically 32 bits, casting the uint16 buffer to juce_wchar* could go a bit nasty. Try this:

[code]else if ((data[0] == (char)-2 && data[1] == (char)-1)
          || (data[0] == (char)-1 && data[1] == (char)-2))
{
    // assume it’s 16-bit unicode
    const bool bigEndian = (data[0] == (char)-2);
    const int numChars = size / 2 - 1;

    String result;
    result.preallocateStorage (numChars + 2);

    const uint16* const src = (const uint16*) (data + 2);
    tchar* const dst = const_cast <tchar*> ((const tchar*) result);

    if (bigEndian)
    {
        for (int i = 0; i < numChars; ++i)
            dst[i] = (tchar) swapIfLittleEndian (src[i]);
    }
    else
    {
        for (int i = 0; i < numChars; ++i)
            dst[i] = (tchar) swapIfBigEndian (src[i]);
    }

    dst [numChars] = 0;
    return result;
}[/code]
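The same idea can be sketched outside JUCE in standard C++. This is only an illustration under stated assumptions: `decodeUtf16WithBom` is a hypothetical name, it returns a `std::wstring` instead of a juce `String`, and like the code above it widens each 16-bit unit individually without combining surrogate pairs. Assembling each unit from two bytes means the result depends neither on the host's endianness nor on the width of `wchar_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not JUCE API): decodes BOM-prefixed UTF-16 bytes,
// widening each 16-bit unit to wchar_t one at a time.
inline std::wstring decodeUtf16WithBom (const unsigned char* data, std::size_t size)
{
    assert (data != nullptr || size == 0);

    std::wstring result;

    if (size < 2)
        return result;

    const bool bigEndian    = (data[0] == 0xFE && data[1] == 0xFF);
    const bool littleEndian = (data[0] == 0xFF && data[1] == 0xFE);

    if (! bigEndian && ! littleEndian)
        return result;   // no header bytes: this sketch simply gives up

    const std::size_t numChars = size / 2 - 1;   // skip the 2-byte header
    result.reserve (numChars);

    const unsigned char* src = data + 2;

    for (std::size_t i = 0; i < numChars; ++i, src += 2)
    {
        // Pick the high and low byte according to the declared byte order.
        const unsigned int hi = bigEndian ? src[0] : src[1];
        const unsigned int lo = bigEndian ? src[1] : src[0];
        result.push_back (static_cast<wchar_t> ((hi << 8) | lo));
    }

    return result;
}
```

Reading byte by byte also sidesteps the unaligned `uint16` read that the pointer cast could produce if `data` is not 2-byte aligned.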

The stuff in OutputStream is also wrong, because it makes the same assumptions about wide char size. (The fact that it’s always little-endian doesn’t make much difference, as long as the encoding is the same way round as the header bytes). I think this version should be more reliable:

[code]if (asUnicode)
{
    if (writeUnicodeHeaderBytes)
        write ("\x0ff\x0fe", 2);

    const juce_wchar* src = (const juce_wchar*) text;
    bool lastCharWasReturn = false;

    while (*src != 0)
    {
        if (*src == L'\n' && ! lastCharWasReturn)
            writeShort ((short) L'\r');

        lastCharWasReturn = (*src == L'\r');
        writeShort ((short) *src++);
    }
}[/code]
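To see what bytes that loop should produce, here is a rough standalone sketch in standard C++. It assumes the output is always little-endian with an FF FE header, as discussed above; `encodeUtf16LeWithCrLf` is a hypothetical name, and a `std::vector` stands in for the output stream.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not JUCE API): writes text as little-endian 16-bit
// units, optionally prefixed with the FF FE header, turning a bare '\n'
// into "\r\n" along the way.
inline std::vector<unsigned char> encodeUtf16LeWithCrLf (const std::wstring& text, bool writeHeader)
{
    std::vector<unsigned char> out;

    if (writeHeader)
    {
        out.push_back (0xFF);
        out.push_back (0xFE);
    }

    // Emits one 16-bit unit, low byte first, whatever the width of wchar_t.
    auto putUnit = [&out] (unsigned int unit)
    {
        out.push_back (static_cast<unsigned char> (unit & 0xFF));
        out.push_back (static_cast<unsigned char> ((unit >> 8) & 0xFF));
    };

    bool lastCharWasReturn = false;

    for (wchar_t c : text)
    {
        if (c == L'\n' && ! lastCharWasReturn)
            putUnit (static_cast<unsigned int> (L'\r'));

        lastCharWasReturn = (c == L'\r');
        putUnit (static_cast<unsigned int> (c));   // truncated to 16 bits, like the (short) cast
    }

    return out;
}
```

For example, `encodeUtf16LeWithCrLf (L"a\n", true)` yields the bytes FF FE 61 00 0D 00 0A 00, while an existing "\r\n" pair is passed through unchanged.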


#3