I’m reading in some old text data (as C strings) that was apparently saved in MAC-IS format. (Maybe that’s also known as Roman?) I need to add these strings (preset names) to a StringArray so I can add an item list to a ComboBox.
I am encountering an ellipsis stored as the byte ‘\xc9’, and of course when I try to add it to a StringArray I hit this assertion:
“you’re trying to create a string from 8-bit data that contains values greater than 127.”
I found this online:
<code_set_name> MAC-IS
<comment_char> %
<escape_char> /
% version: 1.0
CHARMAP
[…snip…]
<U00AB> /xc7 LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
<U00BB> /xc8 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
<U2026> /xc9 HORIZONTAL ELLIPSIS
<U00A0> /xca NO-BREAK SPACE
<U00C0> /xcb LATIN CAPITAL LETTER A WITH GRAVE
<U00C3> /xcc LATIN CAPITAL LETTER A WITH TILDE
[…snip…]
How can I convert this automatically into UTF-8 or another format that juce::String will accept? I’ve tried various things but nothing’s working. I should mention I’m working on Mac right now…
What I ended up doing, not sure if it’s a great solution or bulletproof, was to manually convert the ellipsis to a 3-byte UTF8 ellipsis and handle any other weird characters by replacing with a space (so I can maybe handle them as well if I have to):
variables:
char theTempName[256];
StringArray menuList;
[snip...]
if (CharPointer_ASCII::isValidString (theTempName, std::numeric_limits<int>::max()))
    menuList.add (theTempName);
else
{
    char* c;
    size_t n, len = strlen (theTempName);
    for (n = 0, c = theTempName; n < len; n++, c++)
    {
        if ((unsigned char) *c == 0xc9) // ellipsis
        {
            // shift everything after this byte (including the terminating null) along by 2 bytes
            jassert (len < sizeof (theTempName) - 2); // enough room?
            memmove (theTempName + n + 3, theTempName + n + 1, len - n);
            // replace the 3 bytes with the UTF8 code for ellipsis
            *c = '\xe2'; c++;
            *c = '\x80'; c++;
            *c = '\xa6';
            len += 2; // we "inserted" 2 bytes
            n += 2;   // keep n in step with c, which advanced by 2 above
        }
        else if ((unsigned char) *c > 127)
        {
            *c = ' '; // replace others with a space
        }
    }
    menuList.add (CharPointer_UTF8 (theTempName));
}
Trying this:
menuList.add(CharPointer_UTF16(theTempName));
… just gives me an error, I tried different things, couldn’t make it work:
No matching conversion for functional-style cast from 'char [256]' to 'juce::CharPointer_UTF16'
CharPointer_UTF16 expects the source to be of 16bit width (uint16_t). I assumed UTF16 from your snippet, but it looks like it could be a non-standard UTF8 variant.
Unicode is a bit annoying; any of its encodings (UTF-8, UTF-16, UTF-32) can be stored in a char array, so the array type alone doesn’t tell you which one you have.
It’s most likely an old 8-bit format where 0-127 is ASCII and 128-255 is whatever they felt like.
UTF-8 only agrees with such formats for the values 0-127; bytes >= 128 mean something different there. You can either translate the values >= 128 from the original format into their UTF-8 equivalents, or strip them out.
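To illustrate what “UTF-8 equivalent” means here, this is a minimal sketch in plain C++ (no JUCE) of turning a Unicode code point into UTF-8 bytes. The name encodeUtf8 is made up for this example, and it assumes the caller passes a valid code point (no checks for surrogates or values above U+10FFFF).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes codePoint is valid; surrogates and out-of-range values are not rejected.
std::string encodeUtf8 (std::uint32_t codePoint)
{
    std::string out;

    if (codePoint < 0x80)            // 1 byte:  0xxxxxxx (plain ASCII)
    {
        out += static_cast<char> (codePoint);
    }
    else if (codePoint < 0x800)      // 2 bytes: 110xxxxx 10xxxxxx
    {
        out += static_cast<char> (0xC0 | (codePoint >> 6));
        out += static_cast<char> (0x80 | (codePoint & 0x3F));
    }
    else if (codePoint < 0x10000)    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    {
        out += static_cast<char> (0xE0 | (codePoint >> 12));
        out += static_cast<char> (0x80 | ((codePoint >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (codePoint & 0x3F));
    }
    else                             // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    {
        out += static_cast<char> (0xF0 | (codePoint >> 18));
        out += static_cast<char> (0x80 | ((codePoint >> 12) & 0x3F));
        out += static_cast<char> (0x80 | ((codePoint >> 6) & 0x3F));
        out += static_cast<char> (0x80 | (codePoint & 0x3F));
    }

    return out;
}
```

For the ellipsis, encodeUtf8 (0x2026) gives the three bytes E2 80 A6, which are the same bytes the hand-written workaround above inserts.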
That table you found actually has the mappings from MAC-IS to Unicode code points (the <Uxxxx> column, which for these characters matches their UTF-16 values), so you’re already halfway there: you just need to implement that table and output UTF-8 instead of UTF-16.
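A rough sketch of that table approach, in plain C++ so it doesn’t depend on JUCE. The function names macByteToUtf8 and macStringToUtf8 are invented for this example, and only the six charmap entries quoted above are filled in (with their UTF-8 bytes worked out from the <Uxxxx> values); the rest would have to come from the full charmap. Unmapped high bytes fall back to a space, as in the workaround earlier in the thread.

```cpp
#include <string>

// Hypothetical lookup: UTF-8 bytes for one high byte of the legacy encoding,
// or nullptr if the byte is not in this (deliberately partial) table.
const char* macByteToUtf8 (unsigned char b)
{
    switch (b)
    {
        case 0xC7: return "\xC2\xAB";      // U+00AB left-pointing double angle quotation mark
        case 0xC8: return "\xC2\xBB";      // U+00BB right-pointing double angle quotation mark
        case 0xC9: return "\xE2\x80\xA6";  // U+2026 horizontal ellipsis
        case 0xCA: return "\xC2\xA0";      // U+00A0 no-break space
        case 0xCB: return "\xC3\x80";      // U+00C0 Latin capital letter A with grave
        case 0xCC: return "\xC3\x83";      // U+00C3 Latin capital letter A with tilde
        default:   return nullptr;         // not covered by this sketch
    }
}

// Hypothetical converter: builds a new UTF-8 string from a null-terminated
// legacy string. ASCII bytes are copied, mapped bytes are translated, and
// anything else is replaced with `fallback`.
std::string macStringToUtf8 (const char* text, const char* fallback = " ")
{
    std::string out;

    for (const char* p = text; *p != '\0'; ++p)
    {
        const unsigned char b = static_cast<unsigned char> (*p);

        if (b < 0x80)
            out += *p;
        else if (const char* mapped = macByteToUtf8 (b))
            out += mapped;
        else
            out += fallback;
    }

    return out;
}
```

Because this writes into a fresh std::string, there is no shifting inside a fixed buffer and no size limit to assert on. On the JUCE side, something like menuList.add (String::fromUTF8 (macStringToUtf8 (theTempName).c_str())) should then accept the result.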