I've been adding code to read the ListInfoChunk in WavAudioFormat, since the current version in Juce only seems to write this chunk. Whilst testing it with a WAV file generated by SoundForge, I noticed some character encoding issues. In my particular example, a copyright symbol '©' appears in the reader's metadata StringArray as a close bracket ')'.
The RIFF/WAV specification states that text defaults to ISO 8859/1 unless a CSET chunk overrides it. In my case there is no CSET chunk, and even if there were, it can only specify alternative code pages (not UTF-8, for example). I doubt many applications bother to write it anyway.
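Because ISO 8859/1 occupies exactly the first 256 Unicode code points, decoding it needs no lookup table: each byte value is the code point. For example, '©' is stored as the single byte 0xA9 and is U+00A9. A minimal sketch in standard C++, independent of JUCE (`decodeLatin1` is a hypothetical helper, not a JUCE function):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decodes an ISO-8859-1 buffer into UTF-32, stopping at the first zero byte
// or after numBytes bytes. Each ISO-8859-1 byte value is numerically equal to
// its Unicode code point (U+0000..U+00FF), so decoding is a plain widening.
std::u32string decodeLatin1 (const std::uint8_t* bytes, std::size_t numBytes)
{
    std::u32string result;

    if (bytes == nullptr)
        return result;

    for (std::size_t i = 0; i < numBytes && bytes[i] != 0; ++i)
        result.push_back (static_cast<char32_t> (bytes[i]));

    return result;
}
```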
The problem is that the WavAudioFormat class treats most text from the WAV file as UTF-8, because it generally reads it via MemoryBlock::toString().
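In ISO 8859/1 the '©' is the single byte 0xA9 (binary 10101001). In UTF-8 that bit pattern is a continuation byte and can never start a character, so a UTF-8 decoder has to reject or substitute it. One possible fallback when no CSET chunk is present, which is a heuristic and not something the RIFF specification describes, is to test whether the bytes are well-formed UTF-8 and decode them as ISO 8859/1 if they are not. A simplified sketch in standard C++ (`looksLikeUtf8` is a hypothetical name):

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if the bytes (stopping early at a zero byte) form structurally
// valid UTF-8. Heuristic: if this fails, the text is probably single-byte
// (e.g. ISO-8859-1). Simplified: rejects stray continuation bytes, 0xC0/0xC1
// and 0xF5+ lead bytes, and truncated sequences, but does NOT reject overlong
// 3/4-byte forms, surrogates, or code points above U+10FFFF.
bool looksLikeUtf8 (const std::uint8_t* bytes, std::size_t numBytes)
{
    if (bytes == nullptr)
        return true; // nothing to contradict UTF-8

    std::size_t i = 0;

    while (i < numBytes && bytes[i] != 0)
    {
        const std::uint8_t lead = bytes[i];
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                       extra = 0; // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
        else return false; // 0x80..0xBF (e.g. 0xA9) cannot start a character

        if (extra > numBytes - i - 1)
            return false; // sequence runs past the end of the buffer

        for (std::size_t k = 1; k <= extra; ++k)
            if ((bytes[i + k] & 0xC0) != 0x80)
                return false; // expected a 10xxxxxx continuation byte

        i += extra + 1;
    }

    return true;
}
```

The heuristic is not foolproof: some ISO 8859/1 text happens to be valid UTF-8 (the bytes C3 A9 are 'Ã©' in ISO 8859/1 but 'é' in UTF-8), so it can only guess in ambiguous cases.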
I've kludged in some dirtiness (below) for the time being, on the basis that assuming ISO 8859/1 (i.e. not bothering to check for a CSET chunk) is better than a kick up the proverbials. I still have to handle the other direction (UTF-8 -> ISO 8859/1 in the file writer), and I'm not sure what the situation is with AIFF files yet.
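Going the other way in the writer means decoding UTF-8 into code points and keeping only those that fit in one byte (U+0000 to U+00FF); anything else, such as '€' (U+20AC), has no ISO 8859/1 representation and must be substituted. A sketch in standard C++, assuming well-formed UTF-8 input and '?' as the substitute character (both are assumptions, and `utf8ToLatin1` is a hypothetical name, not a JUCE function):

```cpp
#include <cstddef>
#include <string>

// Converts UTF-8 to ISO-8859-1. Code points above U+00FF have no ISO-8859-1
// equivalent and are replaced by `replacement`. Assumes well-formed UTF-8:
// malformed sequences are not diagnosed (a stray continuation byte is copied
// through as-is, and a truncated sequence yields a partial code point).
std::string utf8ToLatin1 (const std::string& utf8, char replacement = '?')
{
    std::string out;
    out.reserve (utf8.size());

    for (std::size_t i = 0; i < utf8.size();)
    {
        const unsigned char lead = static_cast<unsigned char> (utf8[i]);
        std::size_t length = 1;
        char32_t codePoint = lead;

        // The lead byte determines the sequence length and its payload bits.
        if (lead >= 0xF0)      { length = 4; codePoint = lead & 0x07; }
        else if (lead >= 0xE0) { length = 3; codePoint = lead & 0x0F; }
        else if (lead >= 0xC0) { length = 2; codePoint = lead & 0x1F; }

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < length && i + k < utf8.size(); ++k)
            codePoint = (codePoint << 6) | (static_cast<unsigned char> (utf8[i + k]) & 0x3F);

        out.push_back (codePoint <= 0xFF ? static_cast<char> (codePoint) : replacement);
        i += length;
    }

    return out;
}
```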
Anyway, this is mainly to bring it to your attention. I'm not suggesting it should go into Juce in this form; it probably belongs in the String classes, or tucked away in the AudioFormat class where such dirt can do no harm!
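// Attempts to parse the contents of the block as a zero-terminated ISO-8859-1 string.
// The bytes are converted to UTF-8 before being handed to String.
// (As a MemoryBlock member, this also needs a matching declaration in the class.)
String MemoryBlock::toStringFromISO8859() const
{
    const size_t numBytes = getSize();

    // Worst case, every ISO-8859-1 byte expands to two UTF-8 bytes, plus a terminator.
    MemoryBlock utf8Text (numBytes * 2 + 1);

    const uint8* pIn = static_cast<const uint8*> (getData());      // ISO-8859-1 in
    uint8* const outStart = static_cast<uint8*> (utf8Text.getData());
    uint8* pOut = outStart;                                         // UTF-8 out

    if (pIn != nullptr)
    {
        for (size_t i = 0; i < numBytes && pIn[i] != 0; ++i)
        {
            const uint8 c = pIn[i];

            if (c < 0x80)
            {
                *pOut++ = c; // ASCII is identical in both encodings
            }
            else
            {
                // U+0080..U+00FF: lead byte 0xC2 or 0xC3, then a continuation byte
                *pOut++ = (uint8) (0xc0 | (c >> 6));
                *pOut++ = (uint8) (0x80 | (c & 0x3f));
            }
        }
    }

    // Null terminate
    *pOut = 0;

    // Pass the actual number of UTF-8 bytes written, not the buffer capacity
    return String::fromUTF8 (reinterpret_cast<const char*> (outStart), (int) (pOut - outStart));
}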
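To check the byte arithmetic outside JUCE, the same ISO 8859/1 to UTF-8 conversion can be written against std::string. This is a sketch for testing, not JUCE code; `latin1ToUtf8` is a hypothetical name. Bytes below 0x80 pass through, and each byte from 0x80 to 0xFF becomes a lead byte (0xC2 or 0xC3, from its top two bits) followed by a continuation byte carrying its low six bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standalone ISO-8859-1 -> UTF-8 conversion returning the UTF-8 bytes in a
// std::string. Stops at the first zero byte or after numBytes bytes.
std::string latin1ToUtf8 (const std::uint8_t* in, std::size_t numBytes)
{
    std::string out;

    if (in == nullptr)
        return out;

    out.reserve (numBytes * 2); // worst case: every byte needs two UTF-8 bytes

    for (std::size_t i = 0; i < numBytes && in[i] != 0; ++i)
    {
        const std::uint8_t c = in[i];

        if (c < 0x80)
        {
            out.push_back (static_cast<char> (c)); // ASCII is unchanged
        }
        else
        {
            out.push_back (static_cast<char> (0xC0 | (c >> 6)));   // 0xC2 or 0xC3
            out.push_back (static_cast<char> (0x80 | (c & 0x3F))); // continuation byte
        }
    }

    return out;
}
```

For example, the input bytes 'A', 0xA9, 0xFF produce the UTF-8 bytes 41 C2 A9 C3 BF.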
