Character encoding in WavAudioFormat

I've been adding some code to read the ListInfoChunk in WavAudioFormat, since the current version in Juce only seems to write this chunk. Whilst testing it with a wav file generated by SoundForge, I noticed some character encoding issues. In my particular example, a copyright symbol '©' appears in the reader's metadata StringArray as a close bracket ')'.

The RIFF/Wav specification indicates that the default encoding for text should be ISO 8859/1 unless overridden by the presence of a CSET chunk. In my case the CSET chunk is not present, but even if it were, it can only specify alternative code pages (not UTF-8, for example). I doubt many applications bother to write it anyway.

The problem is that the WavAudioFormat class treats most text from the wav file as UTF-8 since it tends to call MemoryBlock::toString().
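To make the mismatch concrete: in ISO 8859/1 the '©' is the single byte 0xA9, but in UTF-8 a byte of the form 10xxxxxx is a continuation byte and cannot start a character, so a UTF-8 decoder cannot read it as '©'. The small standalone sketch below shows the bit test. The second helper, dropTopBit, is purely hypothetical: it illustrates that clearing the top bit of 0xA9 gives 0x29, which is ')', and that would be consistent with the symptom described above. It has not been checked against what the Juce String code actually does.

```cpp
#include <cstdint>

// True if the byte has the UTF-8 continuation pattern 10xxxxxx.
// Such a byte is only valid after a lead byte, never on its own.
bool isUtf8ContinuationByte (std::uint8_t byte)
{
    return (byte & 0xC0) == 0x80;
}

// Hypothetical illustration only (an assumption, not Juce's behaviour):
// what you get if a decoder simply discards the top bit of a byte.
char dropTopBit (std::uint8_t byte)
{
    return static_cast<char> (byte & 0x7F);
}
```

With these, isUtf8ContinuationByte (0xA9) is true and dropTopBit (0xA9) gives ')'.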

I've kludged in some dirtiness (below) for the time being, on the basis that assuming ISO 8859/1 (i.e. not bothering to check for a CSET chunk) is better than a kick up the proverbials. I've still to handle going back the other way (UTF-8 -> ISO 8859/1 in the file writer), and I'm not sure what the deal is with AIFF files yet.

Anyway, this is mainly to bring it to your attention. I'm not suggesting it should go into Juce in this manner; it's probably best in the String classes, or tucked away in the AudioFormat class where such dirt can do no harm!

 

// Attempts to parse the contents of the block as a zero-terminated ISO-8859-1 string.
// The returned String is built from the equivalent UTF-8 byte sequence.
String MemoryBlock::toStringFromISO8859() const
{
    // Scratch block for the UTF-8 output. Worst case, every input byte
    // expands to two bytes, plus one byte for the terminator.
    MemoryBlock utf8Text (size * 2 + 1);
    const uint8* pIn = static_cast<const uint8*> (getData());   // ISO-8859-1 in
    uint8* pOut = static_cast<uint8*> (utf8Text.getData());     // UTF-8 out

    // Check the bounds before dereferencing, so we never read past the end
    // of a block that has no terminator.
    for (size_t i = 0; i < size && *pIn != 0; ++i)
    {
        if (*pIn < 128)
        {
            // Plain ASCII maps straight across
            *pOut++ = *pIn++;
        }
        else
        {
            // 0x80..0xBF -> 0xC2 0x80..0xBF, 0xC0..0xFF -> 0xC3 0x80..0xBF
            *pOut++ = (uint8) (0xc2 + (*pIn > 0xbf));
            *pOut++ = (uint8) ((*pIn++ & 0x3f) + 0x80);
        }
    }

    // Null terminate
    *pOut = 0;
    return String (CharPointer_UTF8 (static_cast<const char*> (utf8Text.getData())));
}
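
For the direction the writer would need (UTF-8 -> ISO 8859/1), a rough standalone sketch might look like the following. It uses std::string, not the Juce classes, and the name utf8ToLatin1 is made up for illustration. It also assumes that anything which cannot be represented in ISO 8859/1 (code points above U+00FF, or malformed input) should simply become '?', which may not be the right policy for a real file writer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Juce): converts UTF-8 text to ISO 8859-1.
// Code points above U+00FF and malformed sequences are replaced with '?'.
std::string utf8ToLatin1 (const std::string& utf8)
{
    std::string out;
    out.reserve (utf8.size());

    for (std::size_t i = 0; i < utf8.size();)
    {
        const unsigned char lead = static_cast<unsigned char> (utf8[i]);

        if (lead < 0x80)
        {
            // Plain ASCII is identical in both encodings
            out += static_cast<char> (lead);
            ++i;
        }
        else if ((lead == 0xC2 || lead == 0xC3)
                 && i + 1 < utf8.size()
                 && (static_cast<unsigned char> (utf8[i + 1]) & 0xC0) == 0x80)
        {
            // Two-byte sequence encoding U+0080..U+00FF: rebuild the single byte
            const unsigned char trail = static_cast<unsigned char> (utf8[i + 1]);
            out += static_cast<char> (((lead & 0x03) << 6) | (trail & 0x3F));
            i += 2;
        }
        else
        {
            // Not representable (or malformed): emit one '?' and skip
            // any continuation bytes belonging to this sequence
            out += '?';
            ++i;

            while (i < utf8.size()
                   && (static_cast<unsigned char> (utf8[i]) & 0xC0) == 0x80)
                ++i;
        }
    }

    return out;
}
```

For example, the UTF-8 bytes 0xC2 0xA9 for '©' come back out as the single byte 0xA9, while a three-byte character such as the euro sign becomes a single '?'.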

Yeesh.. Thanks for the heads-up, we'll take a look at that!

I've had a look at the spec for AIFF and it only mentions plain ASCII. I haven't been able to test this with a SoundForge-generated file, because I can't find a way of entering any string metadata in SoundForge that it's willing to save in an AIFF.